Frankfurt

The biggest data theft in history is underway

At the Frankfurter Buchmesse, efforts are under way to take a stand against generative artificial intelligence, which is accused of committing huge copyright violations both to acquire its capabilities and to produce the answers requested

17 October 2024

The president of the German Publishers & Booksellers Association, Karin Schmidt-Friderichs, at the Frankfurt Book Fair. (Photo by Kirill KUDRYAVTSEV / AFP)

3 min read

"The capabilities of artificial intelligence systems are made possible by the greatest data theft in history!" said Karin Schmidt-Friderichs, president of the German Publishers & Booksellers Association, at the opening press conference of the Frankfurter Buchmesse, the world's most important book fair. "Creativity must be remunerated," she continued. "Yet copyrighted texts and images have been used, and continue to be used, millions of times over as training material for artificial intelligence, without royalties being paid and without the authors' consent. This is unacceptable. Clear rules are needed. The European AI Act is a starting point but does not yet address many issues. We are only at the beginning of a long debate that we must have not only as an industry but also as a society."

The threat posed by artificial intelligence (AI) to publishing, printing and the other creative industries is one of the recurring themes of this fair, which is hosting several conferences on the subject. One of them, organised by the International Publishers Association (IPA), featured Silke von Lewinski, of the Max Planck Institute for Innovation and Competition in Munich, who is also a lecturer at the University of Zagreb and a legal expert for the European Commission, and Scott Zebrak, an American copyright lawyer and founding partner of Oppenheim + Zebrak, LLP ("O+Z").

Von Lewinski pointed out that, although no one knows precisely how LLMs (large language models) work, recently published studies show that generative AI systems make continuous use of protected material, reproducing it countless times both when downloading it for training and when reprocessing it to produce the answers requested. She went on to explain that the European AI Act is only a first step towards regulating the sector and that it is not effective in protecting copyright.

"It is designed to protect the safety of products that use AI, not to protect copyright," she said. Although the Act does touch on copyright, she explained, it does so incompletely, and the subject has not been adequately debated. One example is the "text and data mining" exception, which many experts say would not apply to generative artificial intelligence: such systems are not automated techniques for analysing digital data, that is, for extracting statistics and information useful for research, but systems that use texts to produce similar texts.

Zebrak explained that about 25 major lawsuits involving large language models are currently pending in the US. These models, he said, copy the content they are "fed" many times over: when they "ingest" it, when they process and select it, and when they reassemble it to provide a response. One example is the lawsuit brought by the New York Times against OpenAI and Microsoft for copyright infringement involving ChatGPT and Copilot; the first rulings are expected by the end of the year or the beginning of 2025.

Zebrak went on to emphasise how important the 2023 Supreme Court ruling is for anyone who wants to understand how the American legal system is taking shape: the court held that Andy Warhol infringed photographer Lynn Goldsmith's copyright when he created a series of silkscreen images based on her photo of the musician Prince. "Adding expressions to a work is not changing it," Zebrak explained.

Moreover, the fact that these tools can be asked to write a text in the style of a particular writer, compose music in the style of a particular musician, or produce a photograph in the style of a particular photographer would clearly show that what is being copied is also the style, the expression.

Both experts agreed that the legal framework is currently unclear and that publishers should lobby to defend their interests. One way to clarify the situation would be to file lawsuits, forcing the courts to rule. Another, which they strongly recommended, is to try to sell usage licences covering every stage at which LLMs use texts (input, training and output), because doing so demonstrates that a market does exist. In their view, it is important that authors and publishers enforce their rights as soon as possible, because the more time passes, the harder it becomes.

Lara Ricci, deputy section editor and curator of the literature and poetry pages
Location: Milan and Geneva
Languages spoken: fluent English and French, school-level German
Topics: literature, poetry, science, human rights
Awards: Voltolino, Piazzano, Laigueglia, Quasimodo