RDL Ecosystem
News

Choosing a License for Your Research Data: CC BY, CC0, and Beyond

Without a license, your dataset is legally ambiguous and hard to reuse. This guide compares CC BY, CC0, and other Creative Commons options so you can pick the right one for open research data.

Why an unlicensed dataset is a problem

When you publish data without an explicit license, potential reusers are left in a legal grey zone: they cannot tell what they are allowed to do with it, so many will not risk using it at all. A clear license is what turns "available" data into genuinely reusable data, and it is a core part of making data FAIR: the Reusable principle asks for data to be released with a clear and accessible usage license. The most widely used licenses for research data come from Creative Commons (CC).
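One practical way to make the license explicit is to state it in the metadata that ships with the data, using a standard identifier that both people and tools can read. The sketch below assumes a hypothetical, minimal metadata record (not the schema of any particular repository) and uses the SPDX short identifiers `CC0-1.0` and `CC-BY-4.0`; the function name `dataset_metadata` and the example title are illustrative only.

```python
import json

# SPDX short identifiers mapped to the Creative Commons license URLs.
# The record layout built below is a hypothetical minimal example, not the
# schema of any particular repository.
LICENSES = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def dataset_metadata(title: str, spdx_id: str) -> dict:
    """Build a minimal metadata record that states the license explicitly."""
    if spdx_id not in LICENSES:
        raise ValueError(f"unsupported license identifier: {spdx_id}")
    return {
        "title": title,
        "license": {"spdx": spdx_id, "url": LICENSES[spdx_id]},
    }


if __name__ == "__main__":
    # Hypothetical dataset title, for illustration only.
    record = dataset_metadata("Example lake temperature readings", "CC0-1.0")
    print(json.dumps(record, indent=2))
```

Recording both the identifier and the URL means nobody has to guess which license, or which version of it, applies.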

The two most relevant options for data

CC0 — public domain dedication

CC0 waives as many rights as legally possible, effectively placing the work in the public domain. Reusers can copy, adapt, and build on the data, even commercially, without asking permission or giving attribution. CC0 is widely recommended for datasets, and institutions such as ETH Zurich explicitly advocate CC0 or a public-domain dedication for research data. It removes friction and maximises reuse.

CC BY — attribution required

CC BY allows all the same uses, including commercial ones, but reusers must credit you. It is a natural choice when attribution matters to you. For large aggregated datasets, however, attribution stacking can become impractical: a dataset assembled from hundreds of CC BY sources must carry a credit for every one of them. This is part of why CC0 is often preferred for pure data.
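Attribution stacking is easiest to see by counting. The sketch below uses a hypothetical `Source` record and a `required_attributions` helper, under the simplified assumption that every CC BY-family source needs its own credit line and CC0 sources need none; the dataset names and creators are made up.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One input dataset in an aggregation (hypothetical structure)."""
    name: str
    creator: str
    license: str  # SPDX identifier, e.g. "CC0-1.0" or "CC-BY-4.0"


def required_attributions(sources: list[Source]) -> list[str]:
    """Collect the credit lines an aggregated dataset must carry.

    Simplified model: every source under a CC BY-family license needs its
    own credit line, while CC0 sources need none.
    """
    return [
        f"{s.name} by {s.creator}, licensed under {s.license}"
        for s in sources
        if s.license.startswith("CC-BY")
    ]


if __name__ == "__main__":
    # 500 hypothetical CC BY station datasets plus one CC0 baseline.
    sources = [Source(f"station-{i}", f"Team {i}", "CC-BY-4.0") for i in range(500)]
    sources.append(Source("baseline-grid", "Example Agency", "CC0-1.0"))
    print(len(required_attributions(sources)))  # 500
```

The credit list grows linearly with the number of CC BY inputs, while the CC0 input adds nothing to it.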

Other CC variants

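Creative Commons also offers NC (non-commercial), ND (no derivatives), and SA (share-alike) elements, which are combined with BY (for example, CC BY-NC 4.0). Each adds a restriction: NC rules out commercial reuse, ND prevents sharing adapted versions, and SA requires adaptations to be released under the same (or a compatible) license. They are sometimes appropriate, but they reduce how freely data can be combined with other data, so use them deliberately, not by default.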
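To make the effect of these elements concrete, here is a deliberately simplified sketch that answers two common questions (may I use this commercially, and may I share an adapted version?) from a license's SPDX identifier. The function `allowed` is hypothetical, and it ignores attribution and every other condition in the actual license texts, so treat it as an illustration rather than a compliance check.

```python
def allowed(license_id: str, commercial: bool, share_adaptation: bool) -> bool:
    """Answer two common reuse questions for a CC license (simplified).

    license_id is an SPDX identifier such as "CC-BY-NC-4.0".
    - NC: commercial use is not permitted.
    - ND: adapted versions may not be shared.
    - SA: adaptations are allowed but must be shared under the same (or a
      compatible) license; that obligation is not modelled here.
    Attribution and all other license conditions are ignored.
    """
    elements = set(license_id.upper().split("-"))
    if commercial and "NC" in elements:
        return False
    if share_adaptation and "ND" in elements:
        return False
    return True


if __name__ == "__main__":
    # A company wants to publish an enriched version of each dataset.
    for lic in ["CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0", "CC-BY-ND-4.0"]:
        print(lic, allowed(lic, commercial=True, share_adaptation=True))
```

In this scenario only the NC and ND variants block the reuse outright, which is why they deserve a deliberate decision.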

Two practical rules

1. Prefer version 4.0. For new work, choose version 4.0 of the CC licenses. It is designed for international use without country-specific ports, and it explicitly covers sui generis database rights, which is exactly what datasets need.
2. Decide carefully, because the choice is durable. Once you apply a CC license, it lasts for the life of the copyright (or database right) and cannot be revoked: you can stop distributing the data, but anyone who already received it under the license keeps those rights. That permanence is a feature (reusers can rely on it), but it means the choice deserves a moment's thought.

The bottom line

For most open research data, CC0 maximises reuse, and CC BY 4.0 is the go-to when you want credit. Whatever you choose, choose something — an explicit license is the difference between data that sits unused and data that travels.