Fintech

Artificial intelligence: can everything on the web be taken freely?

Microsoft's CEO of artificial intelligence, Mustafa Suleyman, posed the question. Here is the copyright defence of the generative AI giants.

by Alessandro Longo

4 min read

Everything on the open web can be taken freely: "anyone can copy it, use it for new creations and reproductions". A phrase worthy of a 1990s 'free web' utopian, one that might now also bring a smile to the lips of a young visionary start-up founder or an anarcho-socialist. What strikes the expert community and the copyright industry is that the person saying it is Microsoft's CEO of artificial intelligence, Mustafa Suleyman.

Note that OpenAI and the other companies that built AI models have long done this implicitly: they treated web content as no man's land for training their algorithms.


But now this behaviour has been made explicit as declared ideology. And it is all the more striking that the speaker is not a start-up (such as OpenAI) but a company that has been the legal face of innovation for the past twenty years, collaborating with the music industry in the fight against piracy, for example, and always at the forefront of compliance with local regulations (in cloud data processing, for instance).

Suleyman then goes so far as to say that it may be permissible to use that data even when publishers explicitly object. "There is a separate category where a website, or a publisher, or a news organisation has explicitly said, 'do not scrape or crawl me for any other reason than indexing me, so that other people can find this content'. That's a grey area, and I think that will work its way through the courts." It is emblematic that OpenAI is now ignoring this expressed will of publishers (the opt-out option) after having granted it, as emerged a few days ago from a Business Insider investigation.
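The opt-out Suleyman refers to is typically expressed through the Robots Exclusion Protocol: a plain-text robots.txt file at a site's root tells crawlers what they may fetch, on a per-bot basis. A minimal sketch of what a publisher's opt-out might look like (GPTBot is the crawler OpenAI has publicly declared for training; the paths here are purely illustrative):

```
# robots.txt — allow search indexing, refuse AI training crawlers
User-agent: Googlebot      # search-engine indexing: allowed
Disallow:

User-agent: GPTBot         # OpenAI's declared training crawler: blocked entirely
Disallow: /

User-agent: *              # all other bots: keep out of the archive (example path)
Disallow: /archive/
```

Compliance with such a file is entirely voluntary on the crawler's part, which is precisely why ignoring it lands in the legal grey area the article describes.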

One might think that Suleyman is a high-flying visionary and that his statements, made in an interview with the American broadcaster CNBC, do not represent the company, even though he is formally the head of AI at Microsoft. Suleyman was co-founder and CEO of Inflection AI before joining Microsoft. Previously, he was one of the founders of DeepMind, a leading artificial intelligence company, and vice president of AI at Google. He published a visionary and optimistic book on AI (The Coming Wave: Artificial Intelligence and Power in the 21st Century, Garzanti 2024).

But that thinking, however radical it sounds, does not, on closer inspection, contradict the actions of Microsoft and Google (not just of start-ups, in short). Already last year, Microsoft declared that it would provide free legal protection to commercial (non-consumer) users of Copilot in copyright infringement lawsuits over content generated with these AI services. Google did the same shortly afterwards and went further, extending protection to lawsuits over possible copyright infringement in the data used for AI training (for the products Duet AI in Workspace, Duet AI in Google Cloud, Vertex AI Search, Vertex AI Conversation, Vertex AI Text Embedding API/Multimodal Embeddings, Visual Captioning/Visual Q&A on Vertex AI, and the Codey API).

Both companies exclude legal protection for intentional copyright infringements, which might emerge, for instance, in the prompts used or if customers remove the protection filters present in the services by default.

This legal protection means the companies are reasonably certain either of winning lawsuits or that any settlements will cost less than the profits to be made by reassuring potential customers, especially with the larger copyright holders, the most fearsome in court, already signing licensing agreements for the use of their data.

In any case, it means that big tech thinks it is winning this game. And, in essence, that Suleyman's thinking is correct: that the world (the courts, case law, legislation) will come to accept the concept of free training on open web content.

Yet on the other side of the fence there is certainty too. The many publishers who have sued AI companies, and ultimately all the record companies, are convinced that training on data obtained through unauthorised scraping is theft.

Suleyman says, like OpenAI and other AI companies, that the legal basis they can rely on is fair use, a concept with no equivalent in Europe. In any case, "claiming that fair use can come into play to justify this legal interpretation is extremely difficult, especially if the purpose of the use is the declared profit of the companies," explains Alfredo Esposito, of the eponymous law firm specialising in copyright and digital law. "Among the criteria for applying fair use is the purpose and character of the use, which is clearly interpreted more generously in cases of study, research and dissemination," he adds.

And copyright is not the only issue: according to some, including Italy's data protection authority (the Garante), scraping also violates privacy regulations, because the data taken include personal data. So in May, in a formal measure, the Garante asked site and platform operators to shield the personal data they process from third-party bots, indicating certain techniques "which, although not exhaustive in either method or result, may contain the effects of scraping aimed at training generative artificial intelligence algorithms".

All in all, generative AI seems to have brought us to a breaking point between innovation and the legal protection of competing interests, with positions now squarely opposed on the legal interpretation. The outcome of the clash is hard to predict, but it seems unlikely that generative AI will be stopped in the courts. Perhaps a meeting point will be found, leading to an international licensing system: no longer entrusted to free bargaining (as is the case now) but with rules coordinated by standards (a bit like what Europe wants to do with fair compensation).

We will find out in the coming months.

Copyright reserved ©