AI Tools Still Struggle With Biology Basics

Summary
– Predicting gene activity changes is complex, as altering one gene can affect only its own mRNA or, if it regulates others, dozens of genes.
– When two genes are altered, effects can be additive if unrelated, or show unexpected changes if their functions overlap.
– Researchers use CRISPR and RNA sequencing (Perturb-seq) to study gene alteration effects and gather data for AI training.
– AI foundation models were trained on data from 100 single-gene and 62 double-gene activations to predict resulting gene activity changes.
– The AI’s predictions were compared to simple models that assumed no changes or purely additive effects.

Despite rapid advances in artificial intelligence, current models still struggle with fundamental biological problems such as predicting how gene activity changes when a gene is altered. The complexity of genetic interactions continues to outpace even sophisticated machine learning systems, exposing gaps in how well they capture cellular processes.
When scientists modify a single gene using tools like CRISPR, the outcomes can range from straightforward to wildly unpredictable. Sometimes only that gene’s messenger RNA changes, while other times regulatory proteins trigger cascading effects across dozens of genes. Cellular metabolism might shift entirely, creating ripple effects throughout the genetic network. The possibilities multiply when two genes are altered together: their combined effects can simply add up, amplify, cancel out, or produce entirely new patterns of gene expression.
Researchers recently tested AI’s predictive capabilities using data from Perturb-seq, a method that combines CRISPR modifications with RNA sequencing to measure the resulting changes in gene activity. Teams trained foundation models on experimental data covering 100 single-gene activations and 62 paired activations. The AI systems then attempted to forecast outcomes for an additional 62 gene pairs, competing against two deliberately simplistic benchmarks. One predicted no change whatsoever; the other assumed purely additive effects, where the effect of activating two genes together equals the sum of their individual effects.
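The two benchmarks are simple enough to sketch in a few lines of Python. This is a minimal illustration, not the researchers’ code: the expression values are invented, the function names are hypothetical, and the use of Euclidean distance as the error measure is an assumption made here for the example.

```python
from math import sqrt

# Invented log-scale expression values for four genes (illustrative only, not real data).
control = [5.0, 3.0, 7.0, 2.0]       # unperturbed cells
gene_a_only = [6.0, 3.0, 7.5, 2.0]   # gene A activated alone
gene_b_only = [5.0, 4.0, 6.5, 2.0]   # gene B activated alone
observed_ab = [6.0, 4.5, 7.0, 3.0]   # A and B activated together (the prediction target)


def no_change_baseline(control):
    """Predict that the double activation leaves expression at control levels."""
    return list(control)


def additive_baseline(control, single_a, single_b):
    """Predict control expression plus the sum of the two single-gene shifts."""
    return [c + (a - c) + (b - c) for c, a, b in zip(control, single_a, single_b)]


def l2_error(predicted, observed):
    """Euclidean distance between predicted and observed expression (assumed metric)."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))


no_change_pred = no_change_baseline(control)
additive_pred = additive_baseline(control, gene_a_only, gene_b_only)

print(f"no-change error: {l2_error(no_change_pred, observed_ab):.3f}")  # 2.062
print(f"additive error:  {l2_error(additive_pred, observed_ab):.3f}")   # 1.118
```

In this toy example the additive baseline is exact for the first and third genes but misses the second and fourth, where the paired activation produces a shift that neither single activation predicts. Those residuals are the kind of interaction a learned model would need to capture to beat the benchmark.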
The results exposed fundamental limitations in how current AI handles biological complexity. The models failed to outperform the simple additive benchmark, and their predictions frequently missed non-additive interactions, the cases where two genes together produce an unexpected response. This suggests current architectures struggle to capture the context-dependent nature of cellular systems, where genes operate not in isolation but within dynamic, interconnected networks. Biological systems resist the pattern recognition that serves AI well in other domains, and modeling them accurately will demand more sophisticated approaches.
(Source: Ars Technica)