Italy, Europe and the rest of the world on the hunt for nationalist chatbots
It's not just the US and China: from Italia 9B to Mistral AI, here are the large open-source language models that have sprung up on our continent.
3' min read
Taking work away from humans, turning manual tasks into automated processes that take a few seconds, even reducing creativity to a simple text prompt. Among the many negative consequences of generative artificial intelligence, one is often forgotten: linguistic flattening. Anyone who has been using tools such as ChatGPT or Gemini for a while will easily recognise text produced with their patterns: bulleted lists and short, often repetitive sentences. Translating from one language to another flattens the content even further, producing adaptations that often lack national character: impersonal and, in short, uninteresting. This is why many countries, as well as institutions and universities, have set out to build their own native Large Language Models (LLMs). These are tailor-made to reflect, in every respect, the customs and habits of a people: a sort of superhuman aware of the time and place in which it lives.
We were not the first, but neither were we the last. A few days ago 'Italia' became available. The model was developed by iGenius and 'trained' by Cineca on a local dataset, i.e. one composed of Italian text. It has 9 billion parameters and a vocabulary of 50,000 tokens, and it was trained on more than 1,000 billion (one trillion) words. Is that a little or a lot? For comparison, the older GPT-3 has 175 billion parameters. GPT-4 is credited with far more, although the widely circulated figure of 100 trillion has never been confirmed by OpenAI.
Computing power matters more than the dataset
Having an enormous number of parameters clearly matters less than the ability to perform inference, i.e. to turn data into logical sequences. That process has to run on hardware, which underpins the algorithm or cluster of algorithms. Before launching Grok on X, Musk prudently bought himself a large batch of Nvidia GPUs. Microsoft, for its part, unveiled Azure Maia 100 and Azure Cobalt 100 in late 2023: its first two in-house chips designed for AI-powered cloud infrastructure. It is as if to say: we have built the car and we have the drivers, but the workshops are scarce or missing, and it is the workshops that would truly turn AI 'sovereignty' into reality.
Italia is made
Released as open source, Italia aims to be a tool for advancing research and business across the country. It can be downloaded from the iGenius website and from other AI development platforms. Italia is trained on a dataset of Italian text and code drawn from a variety of sources, including Wikipedia, books, journal articles and source code.
It can be used through a web interface or through an API. The web interface is simple to use and requires no programming knowledge. The API is more complex but offers greater flexibility and control. Editoriale Nazionale is the first partner to contribute to Italia's training, opening up its historical archive of articles, and others are expected to join in the future.
