Possible Alternatives for Addressing the Contested Issue of Training Generative AI Models on Copyrighted Material

Recently, the United States District Court for the Northern District of California handed Meta and Anthropic significant legal victories by finding that their unauthorized use of copyrighted materials to develop generative AI constituted fair use.[i] Interestingly, both decisions largely echoed the U.S. Copyright Office’s (USCO) pre-publication report in concluding that using copyrighted materials to train generative AI models may constitute fair use in limited instances, while cautioning that such inquiries are fact-intensive. In a previous post, we summarized the key findings of the USCO’s pre-publication report and the potential effects if Taiwan were to adopt similar standards (see here). Despite these favorable outcomes for generative AI providers, the judges in both cases also clearly indicated that not every use of copyrighted materials to train generative AI models would satisfy the fair use doctrine. Because courts’ judgments may vary with the fact pattern and with how judges weigh the fair use factors, examining alternatives to the fair use doctrine may give a clearer picture of how Taiwan can better address the contested issue of copyrighted materials in generative AI training, thereby solidifying its position as a pioneer in protecting creator rights while promoting AI innovation.

This article explores two prominent alternatives: AI-focused text and data mining (TDM) exceptions and collective licensing. The former tends to benefit AI developers, while the latter tends to benefit copyright holders.

Direct government intervention: TDM exceptions

Given that copyright infringement concerns may suppress AI innovation, and given the uncertainty over which new circumstances qualify as fair use because of the doctrine’s ad hoc application, some governments have taken an alternative route. For instance, Japan has released a non-binding interpretation extending its existing TDM exceptions to AI applications, while Singapore has enacted TDM exceptions with AI development in mind (albeit not explicitly).[ii] The EU’s AI Act, on the other hand, explicitly incorporates TDM exceptions by referencing a 2019 directive that provides for such exceptions.[iii]

The EU defines TDM as an “automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.”[iv] Historically, certain TDM practices have been exempted from copyright infringement because they may benefit social and cultural development more broadly. In the context of generative AI, some jurisdictions seeking to encourage its development regard generative AI as a mere variant of existing TDM and therefore extend TDM exceptions to cover AI development. However, to ensure that these intended benefits do not come at the expense of copyright holders’ interests, such an extension should be qualified with explicit prerequisites or limitations. Examples that have been adopted include lawful access to the copyrighted data (Singapore, EU); use of the copyrighted data only for specific purposes (Singapore, and specific provisions of the 2019 EU directive); and creators’ right to opt out of having their copyrighted data used (most countries that adopt TDM exceptions).[v] Japan additionally will not recognize the TDM exception where (1) the purpose of the exploitation is to enjoy the thoughts or sentiments expressed in the work (such as when a person derives pleasure from reading a book); or (2) the exploitation would unreasonably prejudice the interests of the copyright owner.[vi]

Still, TDM exceptions have drawn criticism. The first criticism, which applies even to more conventional TDM practices, is that exempting such practices from copyright infringement violates the Berne Convention’s requirement that, among other things, excepted uses must not “unreasonably prejudice the legitimate interests of the author.”[vii] Some claim that AI-applicable TDM exceptions likely violate this requirement by allowing copyrighted works to be used in developing a potential competitor.[viii] Second, critics argue that the opt-out provisions in TDM exceptions are fundamentally at odds with the heart of copyright law, which is characterized as an opt-in system: the rightsholder retains exclusive rights unless it explicitly assigns or licenses those rights to another party.[ix] Opt-out provisions also raise practical issues, such as the inability to remove a work’s contribution from a model once the work has already been used in training.[x]

In short, while TDM exceptions offer clear benefits for AI developers, critics argue that they are not an ideal solution for rightsholders.

A streamlined approach: collective licensing

To help copyright holders enforce their rights and collect compensation, the creative industries commonly use collective licensing, under which a collective management organization (CMO) administers the exclusive rights of a large number of rightsholders and conducts licensing transactions on their behalf. This alternative approach to exploiting copyrighted materials provides ready access to numerous works through offerings such as blanket licenses, and it secures rightsholders’ remuneration by streamlining the collection of royalties and their distribution back to rightsholders.[xi]

In the context of AI training, several reproduction rights organizations (a subset of CMOs), such as the Copyright Agency in Australia and the Copyright Licensing Agency in the U.K., have already begun offering limited collective licensing options for the use of copyrighted materials in AI development.[xii] While the USCO has floated similar concepts, concerns about administrative burdens and anticompetitive effects ultimately led the office to seek further input before making a clear recommendation on adopting such a model for AI-related uses.[xiii]

That said, at least from a rightsholder’s point of view, collective administration of rights clearly has the potential to support successful and sustainable commercialization of works, because it lowers the barrier for rightsholders (particularly those not affiliated with large, often financially sophisticated industry players such as record labels or publishing companies) to license their works and enforce their rights against users. Although hurdles remain, such as those identified above, guardrails such as non-compulsory licensing, prohibitions on exclusivity, and equal treatment of similarly situated licensees may be employed to maintain market order and encourage market activity.[xiv] Note, however, that whether AI training data will be available through collective licensing (and, if so, in what quantity) will likely depend on various factors, such as the fee at which a collective license is offered and the operational burden that obtaining a collective license may impose on an AI developer.

Taiwan’s path moving forward

Given how rapidly AI training methods evolve and the uses of copyrighted materials expand, there is still room to supplement and reinforce Taiwan’s current legal standards, which rely on the fair use doctrine and therefore, as discussed above and in our previous post, invite ambiguity. At present, no single supplemental approach has been universally accepted. Most countries that have addressed this issue appear to have adopted some variant of a TDM exception, but others, such as the United States, are hesitant to follow suit. Collective licensing, meanwhile, although discussed in academic circles and among policymakers, does not yet appear to have gained the support needed to prevail. Amid this constantly evolving landscape, rightsholders and AI developers alike are watching closely, and we will likewise continue to monitor these developments.

Because this article covers information from many countries, omissions are possible despite our efforts to verify it. Please consult and interpret the original sources independently when citing any information from this article.

For inquiries on how best to comply with current copyright law standards, we welcome you to reach out to Ling-ying Hsu at lhsu@winklerpartners.com.

Written July 22, 2025, by Ling-ying Hsu, Chi-Hsien Nieh, and Nicole Evangelista.

[i] Megan K. Bannigan et al., Anthropic and Meta Decisions on Fair Use, Debevoise & Plimpton (June 26, 2025) (https://www.debevoise.com/insights/publications/2025/06/anthropic-and-meta-decisions-on-fair-use).

[ii] European Alliance for Research Excellence, Singapore’s new Text and Data Mining exception will support innovation in the digital economy (Jul. 20, 2021) (https://eare.eu/hello-world-11/); Bryan Tan, Text and data mining in Singapore (Feb. 5, 2024) (https://www.reedsmith.com/en/perspectives/ai-in-entertainment-and-media/2024/02/text-and-data-mining-in-singapore); Legal Subcommittee under the Copyright Subdivision of the Cultural Council, General Understanding on AI and Copyright in Japan, 5-9 (May 2024) (https://www.bunka.go.jp/english/policy/copyright/pdf/94055801_01.pdf).

[iii] Directive (EU) 2019/790 of the European Parliament and of the Council of 17 Apr. 2019 on Copyright and Related Rights in the Digital Single Market and Amending Council Directives 96/9/EC and 2001/29/EC, art. 3, 2019 O.J. (L 130) 92 (https://eur-lex.europa.eu/eli/dir/2019/790/oj/eng).

[iv] Id. art. 2(2).

[v] Norton Rose Fulbright, Infringement risk relating to training a generative AI system, Norton Rose Fulbright (May 2025) (https://www.nortonrosefulbright.com/en-hk/knowledge/publications/ef8d8cce/infringement-risk-relating-to-training-a-generative-ai-system); Matthew Sag & Peter K. Yu, The Globalization of Copyright Exceptions for AI Training, Emory L.J. (forthcoming 2025) (manuscript at 21-28) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4976393); Joel Bock et al., AI and intellectual property rights, Dentons (Jan. 28, 2025) (https://www.dentons.com/en/insights/articles/2025/january/28/ai-and-intellectual-property-rights).

[vi] See supra note ii, Legal Subcommittee under the Copyright Subdivision of the Cultural Council, at 5-9 (May 2024).

[vii] Berne Convention for the Protection of Literary and Artistic Works art. 9(2) (https://www.law.cornell.edu/treaties/berne/9.html).

[viii] See supra note v, Matthew Sag & Peter K. Yu, at 14; U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training, at 83 (pre-publication version, May 2025) (https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf) (citing Professional Photographers of America Initial Comments at 10; CISAC Initial Comments at 3-4; International Authors Forum Initial Comments at 1).

[ix] U.S. Copyright Office, supra note viii at 101-02, 105.

[x] Id.

[xi] Int’l Confederation of Societies of Authors and Composers (CISAC), The History of Collective Management (2015) (https://www.cisac.org/Newsroom/expert-articles/history-collective-management).

[xii] Anita Huss-Ekerhult & Antonios Baris, Pro-Copyright, Pro-AI: The Power of Collective Licensing, 48(4) Columbia J.L. & Arts 430-33 (2025) (https://journals.library.columbia.edu/index.php/lawandarts/article/view/13923).

[xiii] U.S. Copyright Office, supra note viii at 103-04.

[xiv] See id. at 104 (finding that compulsory licensing within collective licensing would harm rightsholders); see also United States v. Am. Soc’y of Composers, Authors & Publishers, No. 41-1395 (WCC), 2001 WL 1589999, at *3-8 (S.D.N.Y. June 11, 2001) (https://www.westlaw.com/Document/I8becdd8b53e611d9b17ee4cdc604a702/View/FullText.html?transitionType=Default&contextData=(sc.Default)&VR=3.0&RS=cblt1.0) (providing requirements for CMOs to limit potential overreach, abuse, or discriminatory practices).