
AI matches or beats doctors in two new medical studies

Summary

– Two AI systems, Mira and Amie, matched or beat doctors in diagnostic accuracy and treatment planning, but were tested on simulated, not real, patients.
– Mira achieved roughly 87% diagnostic accuracy across more than 500 simulated emergency cases, compared with 78% for a panel of six doctors, and performed best on conditions with clear test results.
– Amie matched UK GPs on clinical reasoning and produced treatment plans more aligned with official guidelines, performing better on tricky medication decisions.
– The results are oversold because tests used clean, text-only case notes without physical exams, and Amie’s guidelines-based scoring favored the AI over doctors.
– Researchers compare the AI to an aircraft autopilot, handling routine work while ultimate responsibility remains with physicians, as AI is already being integrated into real health systems.

Two artificial intelligence systems have matched, and in certain cases outperformed, doctors when diagnosing patients and planning their treatment. There is one major caveat, however: none of the patients were real. The findings, published this week in Nature, represent some of the most compelling evidence to date that specialist medical AI is approaching the skill level of human clinicians. They also serve as a textbook example of why a flashy headline does not necessarily translate to real-world medical practice.

What the studies found

The first system, called Mira, was developed by academic researchers in Germany. When provided with a simulated medical record, it can select from over 85,000 possible actions, including tests, prescriptions, and even hospital admissions. Across more than 500 emergency department cases, it achieved a diagnostic accuracy of roughly 87 percent, compared to 78 percent for a panel of six doctors. It performed best on conditions with clear test results, such as pancreatitis and appendicitis.

The second system, Amie, is a product of Google and runs on its Gemini model. Tested against 21 UK general practitioners across 100 multi-visit cases, it matched them on clinical reasoning and produced treatment plans that adhered more closely to official guidelines. On a benchmark for complex medication decisions, it actually came out ahead.

Why the headline oversells it

Reading the fine print quickly reveals a more nuanced picture. Both systems were tested on simulated patients using clean, text-only case notes. There were no physical exams, no scans, and no reading of a patient’s tone or body language, all elements that real doctors depend on every day. Independent experts also raised additional concerns. Amie was rewarded for following guidelines, but doctors are not strictly bound to those guidelines, which makes the comparison somewhat uneven. Mira, meanwhile, ordered roughly twice as many tests as the doctors, and ordering more tests can artificially inflate an accuracy score.

One caveat cuts the other way: the models are already outdated. The versions tested are roughly two years old, and the researchers themselves note that this arguably makes them weaker than what exists today, not stronger.

Autopilot, not replacement

The researchers are careful not to overstate what this means. Jakob Kather, a co-developer of Mira, compared the AI to an aircraft’s autopilot. It can take over routine tasks, he said, but “ultimate responsibility will always remain with the physicians.” That is the likely future, and it is already arriving. AI is being integrated into real health systems to ease workforce shortages, reduce administrative burdens, and serve patients as consumer health advisers. The Nature studies do not prove that doctors are obsolete. They show that, in a simulator, a machine can now reason like a doctor, which is genuinely impressive but still a long way from a real hospital.

(Source: The Next Web)
