
by Daniel Spichtinger (Ludwig Boltzmann Gesellschaft / University of Vienna / independent researcher)

Creative Commons licences underpin the reusability of research data and are a key enabler of Open Science, yet the rise of large-scale AI training is testing their limits. This article, based on and updating a recent book chapter [1], examines the legal landscape in the European Union and the United States.

CC Stickers, Illustration by Kristina Alexanderson on Flickr.

Licensing plays a central role in ensuring reusability: without clear, machine-readable licences, research outputs risk becoming technically interoperable but legally unusable, undermining the reproducibility and trust that are at the core of Open Science. In the context of publicly funded research, Creative Commons (CC) licences provide a standardised framework that enables predictable and legally secure reuse. Research funders such as the European Commission therefore require CC licences as a funding condition (e.g. in Horizon Europe), and consequently many researchers release their research outputs under CC licences (most commonly CC BY).

Litigate or legislate - US and EU perspectives 
CC licences were designed before the recent wave of commercial AI systems built on large language models emerged. The question that has informed my work is therefore: can CC-licensed research output legally be used to train AI systems? The answer is further complicated by the different legal regimes that govern the use of such data for AI training in the United States and the European Union. These developments raise important questions for Open Science, as legal uncertainty around data use may affect the willingness of researchers to share data openly.

In the US, the fair use doctrine may permit the use of copyrighted material for AI training if the use is deemed “transformative.” Because Creative Commons licences are part of the copyright system, the doctrine applies to CC-licensed material as well. By invoking fair use, companies could therefore disregard CC licence conditions, such as non-commercial restrictions. In 2019, for example, IBM researchers used over 1 million CC-licensed Flickr photos to train a facial recognition system without notifying the photographers [2].

Whether the use of data by AI companies really constitutes fair use is legally contested, with several court cases still pending. It appears, however, that companies that have invoked fair use prefer to settle out of court rather than wait for a definitive court decision. For example, the New York Times has signed a licensing deal with Amazon for AI use of its content.

The EU takes a more structured approach. The 2019 Directive on Copyright in the Digital Single Market [3] introduced text and data mining (TDM) exceptions: Article 3 permits TDM for non-commercial scientific research, while Article 4 allows commercial TDM provided that rights holders have not opted out. Both of these options could therefore also be used to circumvent CC licensing provisions. The EU’s AI Act [L1], which entered into force in August 2024, builds on this framework. Article 53(1)(c) requires providers of general-purpose AI models to identify and respect opt-out reservations, even if they are based outside Europe.

From August 2025, AI providers must have copyright compliance policies in place. A General-Purpose AI Code of Practice [L2], published in July 2025, offers further guidance, including standardised machine-readable opt-out mechanisms. However, the EU’s AI Office has stated that during the Code’s first year (until August 2026) it will not consider providers to have broken their commitments, and will not adopt measures against them, if they do not fully implement all commitments immediately after signing the Code [L3].
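To make the idea of a machine-readable opt-out more concrete, the following is a minimal sketch of how a crawler might check such a reservation before collecting content. It assumes that the opt-out is expressed in a robots.txt file, which is only one of several possible signals; whether a given signal counts as a valid legal reservation is a separate question not settled here. The crawler name `ExampleAIBot`, the sample robots.txt content, the URL and the helper `may_crawl` are all hypothetical and chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a research data repository: it reserves its
# content against one (invented) AI crawler and allows all other agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/dataset.csv"
    for agent in ("ExampleAIBot", "OtherBot"):
        print(agent, may_crawl(ROBOTS_TXT, agent, url))
```

A check of this kind only reads the signal; it says nothing about the CC licence attached to the content, which would have to be recorded and evaluated separately.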

While these frameworks aim to balance innovation and rights, they also introduce complexity that may challenge the straightforward reuse of openly licensed research outputs.

Can AI-Generated Research Output Be CC-Licensed?
A related question is whether AI-generated outputs, including those produced by researchers, can themselves be copyrighted and thus CC-licensed. In 2023, the US Copyright Office ruled on Zarya of the Dawn, a graphic novel with AI-generated images: while the human-authored text was copyrightable, the AI-generated images were not, due to insufficient human creative involvement. Similarly, under EU law, originality requires a work to reflect the author’s own intellectual creation. Outputs generated without sufficient human input fall outside copyright protection and effectively enter the public domain, which means they cannot be CC-licensed. Researchers using AI tools should therefore carefully document their creative contributions to maintain any claim to authorship. This has implications for Open Science, as the status of AI-generated outputs affects their reuse, attribution, and integration into open research workflows.

Emerging responses and outlook
Several initiatives are emerging to address Creative Commons licensing in the age of AI. Most notably, Creative Commons itself has developed “preference signals”, machine-readable metadata tags indicating whether CC-licensed content may be used for AI training. The EU AI Act’s extraterritorial reach could turn these signals into globally binding indicators, much as the GDPR has influenced personal data protection standards around the world.
Meanwhile, the US remains without comprehensive federal AI legislation. Executive Order 14179 (January 2025) and America’s AI Action Plan (July 2025) emphasise deregulation and competitiveness, while proposals requiring copyright disclosure in AI training datasets remain stalled in Congress.
CC licences remain important for ensuring that research outputs are not only technically interoperable but also legally reusable, a cornerstone of Open Science. Adapting them for AI will be key to preserving the culture of Open Science while safeguarding creators’ rights.

Links:
[L1] https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
[L2] https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai
[L3] https://www.cliffordchance.com/insights/resources/blogs/ip-insights/2025/10/copyright-compliance-under-the-eu-ai-act-for-gpai-model-providers.html

References:
[1] D. Spichtinger, “Fit for Purpose? Creative Commons Licensing for Research Data in the Age of Artificial Intelligence,” in Data Quality Matters. London, U.K.: IntechOpen, 2025. doi: 10.5772/intechopen.1013402.
[2] R. Merkley, “Use and fair use: Statement on shared images in facial recognition AI,” Creative Commons Blog, Mar. 13, 2019. [Online]. Available: https://creativecommons.org/2019/03/13/statement-on-shared-images-in-facial-recognition-ai/ 
[3] European Parliament and Council of the European Union, “Directive (EU) 2019/790 of 17 April 2019 on copyright and related rights in the Digital Single Market,” Official Journal of the European Union, L 130, pp. 92–125, May 17, 2019. [Online]. Available: https://eur-lex.europa.eu/eli/dir/2019/790/oj/eng

Please contact:
Daniel Spichtinger
Ludwig Boltzmann Gesellschaft/University of Vienna/Independent Researcher, Austria
