Artificial intelligence: the code of life in the hands of algorithms
The Evo2 model treats the genome as a language and generates sequences that never existed, but creating functioning organisms remains a huge challenge
Artificial intelligence is now trying to write the code of life. No longer just analysing or modifying existing DNA, but generating entire genomes from scratch. This is the leap suggested by Evo2, a genomic language model described in Nature that opens up a prospect hitherto confined to science fiction: designing artificial organisms from sequences created by an algorithm.
Evo2 works much like the language models used for human language, but instead of words it processes nucleotides, the letters of DNA. It has been trained on around 9 trillion nucleotides from 128,000 genomes of different species, living and extinct, and can read, interpret and generate DNA, RNA and protein sequences. "Our development of Evo1 and Evo2 represents a key moment in the emerging field of generative biology," explains Patrick Hsu, co-founder of the Arc Institute, a non-profit research organisation based in Palo Alto, California, that focuses on accelerating biomedical discovery. "Machines are beginning to read, write and think in the language of nucleotides."
The model was developed by the Arc Institute and Nvidia, in collaboration with Stanford University and the University of California, Berkeley. Unlike its predecessor, which was trained mainly on single-celled organisms, Evo2 also includes genomes of multicellular organisms, including humans and plants. It can also process sequences up to a million nucleotides long, which allows it to capture relationships between very distant parts of the same genome.
The potential extends beyond synthetic biology. In tests on variants of the BRCA1 gene, which is associated with breast cancer, the system achieved over 90% accuracy in distinguishing benign mutations from potentially harmful ones. Tools like this could speed up research into genetic diseases and reduce the number of experiments needed on cells or animals.
The scientists also used Evo2 to design several complete genome sequences, including one modelled on the bacterium Mycoplasma genitalium, which has one of the smallest known genomes of any organism. Simulations indicate that about 70% of the predicted genes appear plausible. But that is not enough. "You can't design life at 70 per cent," observes synthetic biologist Nico Claassens of Wageningen University in the Netherlands. "You can do it on a computer, but it won't work in a cell." The problem is that a genome only works if every element is in the right place: a single essential gene that is missing or incorrectly organised is enough for the entire biological system to collapse. "Assessing whether a genome looks correct and checking whether it actually works are two different things," points out Maciej Wiatrak of the University of Cambridge.

