How does ChatMinerva respond? The challenge of building an Italian ChatGPT
The important thing is to judge each LLM for what it is, not for what it could never be: a Silicon Valley giant.
If a brick weighs a kilo plus half a brick, how much does a brick weigh? The riddle is an old one, and many experts used it to test the first generative AI models, GPT-3 and the like, which reliably failed it. ChatMinerva, a newly launched Italian chatbot, has this undoubted advantage: it takes us back to the days when we could make fun of generative AI. "The weight of that tile will be exactly 1 kg + 0.5 × 1 kg = 1.5 kg (or 1 500 g). In other words, it has the same weight as twice its mass!" it replies, confidently (note the exclamation mark), when by now even stones, or bricks, know the right answer: two kilos. ChatGPT Instant (a faster version of the current GPT 5.5 model) answers correctly and also gives the equation that gets there: X = 1 + X/2, so X = 2 (kg).
What hurts most is perhaps the linguistic slip ("that tile"), which does no credit to a model whose main distinguishing feature should be that it is trained on and for our language, Italian. That is how its creators presented it to the world: the Sapienza NLP research group at La Sapienza University of Rome, led by Professor Roberto Navigli, in collaboration with Babelscape, an academic spin-off founded ten years ago.
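For readers who want to check the brick riddle themselves, it reduces to a single linear equation. The snippet below is a minimal sketch in Python; the function name `brick_weight` and its parameters are illustrative choices of ours and do not come from ChatMinerva, ChatGPT or any other tool mentioned here.

```python
from fractions import Fraction


def brick_weight(fixed_kg: Fraction, brick_share: Fraction) -> Fraction:
    """Solve x = fixed_kg + brick_share * x for x, the brick's weight in kg."""
    if brick_share >= 1:
        raise ValueError("no positive solution when brick_share >= 1")
    # Rearranged: x * (1 - brick_share) = fixed_kg
    return fixed_kg / (1 - brick_share)


# The riddle: a brick weighs one kilo plus half a brick.
answer = brick_weight(Fraction(1), Fraction(1, 2))
print(answer)  # 2

# ChatMinerva's quoted reply instead computed 1 + 0.5 * 1,
# treating "half a brick" as half a kilo.
mistaken = Fraction(1) + Fraction(1, 2) * Fraction(1)
print(mistaken)  # 3/2
```

The second calculation reproduces the 1.5 kg figure from the quoted reply: the unknown weight was replaced by the one known kilo, so the equation was never actually solved.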
Nor is it fair to be so harsh on a creation that "was built with more passion than budget, thanks to the unceasing work of dozens of researchers, PhD students and collaborators who believe in the possibility of creating Italian AI technology from which to build competitive products," as Navigli put it.
It is a bit like a homegrown hatchback, built by a small but capable team. You cannot put it on the track against Formula One cars such as GPT, Claude or Gemini. Unfortunately, those are what we are used to, so the comparison is inevitable.
"It is not surprising that ChatMinerva cannot solve the brick riddle, which no one gets wrong nowadays. We're talking about a model with a number of parameters (connections) several orders of magnitude lower than GPT and the like," says Antonio Cisternino, an experienced AI researcher at the University of Pisa. ChatMinerva is the direct evolution of Minerva 7B, the large language model launched earlier by the same Sapienza NLP group, with 7 billion parameters, "very few now", says Cisternino. Navigli has announced a further version, with 20 billion parameters, for the autumn. GPT-3, launched in 2020, had 175 billion. OpenAI has not disclosed parameter counts since then, but independent analyses (by SemiAnalysis) point to almost 2 trillion parameters, of which the model now uses only a small part for each response, thanks to efficiency techniques.
ChatMinerva's responses suffer from these limitations. "They are more prone to errors (hallucinations) or to not following the given instructions," says Cisternino. In our tests, when we asked it to write an article on a topic, it did not do so and summarised a news item instead. When we asked it to summarise a news story, it gave us a few lines and did not elaborate on them when asked.

