
Unlocking the Genome with Generative AI

Summary

– AI systems have successfully predicted protein structures and designed functional proteins, but these efforts focus only on proteins and amino acids.
– Biological changes originate at the nucleic acid level, where non-coding sequences and redundancy add complexity, so it was unclear how much genome organization reveals about protein function.
– Training AI on bacterial genomes can help predict novel proteins that differ from known examples.
– Bacterial genomes often cluster genes with related functions together, allowing efficient control of biochemical pathways.
– Researchers developed Evo, a genomic language model trained on bacterial genomes to predict DNA sequences and generate novel outputs based on prompts.

Generative AI is now advancing in genomics, moving beyond protein structure prediction to directly interpret and generate functional DNA sequences. The shift matters for computational biology because it targets the fundamental code of life rather than just its molecular products. While previous AI systems excelled at forecasting protein shapes and designing useful proteins, they operated one level removed from biology’s core mechanism: genetic instructions encoded in nucleic acids.

Biology doesn’t create new proteins directly; instead, alterations must originate within the DNA sequence before manifesting as functional proteins. The genome itself presents considerable complexity, filled with non-coding regions, redundant elements, and inherent flexibility. These characteristics made it uncertain whether an AI could effectively learn genomic organization to produce viable proteins.

Recent developments, however, indicate that training AI on bacterial genomes can yield systems capable of predicting proteins, including some with completely novel structures unlike any known examples.

The foundation of this breakthrough lies in a distinctive trait of bacterial genetics: functional gene clustering. Bacteria frequently group genes responsible for related tasks, such as nutrient uptake, sugar metabolism, or amino acid synthesis, close together on the chromosome. In numerous instances, these clustered genes are transcribed as a single, lengthy messenger RNA molecule. This arrangement allows bacteria to regulate entire biochemical pathways simultaneously, optimizing metabolic efficiency.
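To make the clustering idea concrete, here is a minimal sketch of how neighboring genes might be grouped into candidate clusters. It relies on a simple, commonly used heuristic (same strand, small gap between genes) that is an assumption for illustration and not the method described in the source. The `Gene` type, the `group_into_clusters` function, the `max_gap` threshold, and all gene names and coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene on the chromosome
    end: int     # last base of the gene on the chromosome
    strand: str  # "+" or "-"


def group_into_clusters(genes: List[Gene], max_gap: int = 100) -> List[List[str]]:
    """Group adjacent genes that share a strand and sit within max_gap bases.

    This is only a rough proxy for co-transcribed gene clusters; the
    threshold is an illustrative assumption, not a biological constant.
    """
    clusters: List[List[str]] = []
    previous = None
    for gene in sorted(genes, key=lambda g: g.start):
        same_cluster = (
            previous is not None
            and gene.strand == previous.strand
            and gene.start - previous.end <= max_gap
        )
        if same_cluster:
            clusters[-1].append(gene.name)
        else:
            clusters.append([gene.name])
        previous = gene
    return clusters


# Hypothetical coordinates: three tightly packed genes, then a distant one.
example = [
    Gene("geneA", 100, 1000, "+"),
    Gene("geneB", 1020, 2000, "+"),
    Gene("geneC", 2050, 2600, "+"),
    Gene("geneD", 5000, 6000, "-"),
]
print(group_into_clusters(example))
```

In this toy input the first three genes fall into one group and the last stands alone, which mirrors the idea that genes for a shared pathway tend to sit side by side.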

Leveraging this genomic architecture, a Stanford research team built what they describe as a “genomic language model” named Evo. They trained it on an extensive database of bacterial genomes using an approach comparable to that of large language models in natural language processing. During training, Evo received sequences and learned to predict the next DNA base, with accurate predictions reinforced. As a generative model, Evo can take a starting DNA prompt and produce new sequences with a controlled degree of randomness, so identical prompts can yield multiple distinct outputs. This capability opens pathways for designing synthetic genetic elements with tailored functions.
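The two ideas in this paragraph, next-base prediction and prompted generation with adjustable randomness, can be illustrated with a toy model. The sketch below is not Evo and does not reflect its architecture. It assumes a simple count-based model over short contexts and a "temperature" setting to control variability. The function names, the context length `k`, and the training string are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

BASES = "ACGT"


def train_next_base_model(sequences, k=3):
    """Count which base follows each k-base context in the training data."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts


def next_base_probs(counts, context, temperature=1.0, pseudocount=1.0):
    """Return probabilities for A, C, G, T given a context.

    Low temperature sharpens the distribution (less variety);
    high temperature flattens it (more variety).
    """
    ctx_counts = counts.get(context, {})
    logits = [math.log(ctx_counts.get(b, 0) + pseudocount) / temperature
              for b in BASES]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def generate(counts, prompt, length, k=3, temperature=1.0, rng=None):
    """Extend a DNA prompt one sampled base at a time."""
    rng = rng or random.Random()
    seq = prompt
    for _ in range(length):
        probs = next_base_probs(counts, seq[-k:], temperature)
        seq += rng.choices(BASES, weights=probs, k=1)[0]
    return seq


# Toy training data; a real model would see whole genomes.
model = train_next_base_model(["ACGTACGTACGT"], k=3)
# The same prompt can yield different continuations on each call.
print(generate(model, "ACG", 12, temperature=1.0))
print(generate(model, "ACG", 12, temperature=1.0))
```

Sampling from the predicted distribution, rather than always taking the most likely base, is what lets one prompt produce several different sequences.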

(Source: Ars Technica)
