New AI Debugging Tool Reveals How LLMs Think

▼ Summary
– Goodfire aims to make AI model building more scientific, addressing the lack of understanding about how large language models like ChatGPT work.
– CEO Eric Ho describes a dominant industry focus on scaling compute and data for AGI, which Goodfire challenges, arguing that understanding how models work is a better path.
– Goodfire is among the pioneers of mechanistic interpretability, a technique that maps a model's neurons and pathways to reveal how it operates internally.
– The company uses its tools to reduce model flaws like hallucinations, and packages these techniques into a product called Silico.
– Silico automates interpretability work with agents, though a researcher notes it adds precision to an alchemical process rather than achieving true engineering.
A small startup is challenging the black-box nature of today’s most powerful AI models, arguing that understanding how large language models (LLMs) actually think is the key to building safer, more reliable systems. Goodfire, a company focused on mechanistic interpretability, is releasing a new tool called Silico that aims to turn the trial-and-error process of model development into something closer to precision engineering.
“We saw this widening gap between how well models were understood and just how widely they were being deployed,” says Eric Ho, Goodfire’s CEO, in an exclusive interview with MIT Technology Review. “I think the dominant feeling in every single major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI [artificial general intelligence] and nothing else matters. And we’re saying no, there’s a better way.”
Goodfire joins a small but influential group of players, including Anthropic, OpenAI, and Google DeepMind, pioneering mechanistic interpretability. This technique attempts to map the neurons and pathways inside an AI model to understand exactly what happens when it performs a task. MIT Technology Review recently named mechanistic interpretability one of its 10 Breakthrough Technologies of 2026.
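For readers unfamiliar with the technique, the basic move is to look inside a network at its intermediate activations rather than only at its inputs and outputs. The sketch below is a generic, minimal illustration in PyTorch, using a toy model and a forward hook; the model, layer choices, and variable names are assumptions for illustration only and do not represent Goodfire's or Silico's tooling.

```python
# Minimal sketch of the core idea behind inspecting a model's internals:
# record the activations of a hidden layer while a small network processes
# an input, so individual "neurons" can be examined. Toy model, not a real LLM.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),   # hidden layer whose activations we will capture
    nn.ReLU(),
    nn.Linear(16, 4),
)

captured = {}

def hook(module, inputs, output):
    # Store the activations flowing out of this layer.
    captured["hidden"] = output.detach()

# Attach the hook to the output of the ReLU (the hidden activations).
model[1].register_forward_hook(hook)

x = torch.randn(1, 8)
model(x)

# Inspect which hidden units fired for this input.
print(captured["hidden"])
```

Interpretability research builds on recordings like these, looking for units or pathways that correspond to recognizable concepts or behaviors.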
Where many interpretability efforts focus on auditing models after they are trained, Goodfire wants to use the approach earlier in the process. The goal is to design models with greater intentionality from the start.
“We want to remove the trial and error and turn training models into precision engineering,” says Ho. “And that means exposing the knobs and dials so that you can actually use them during the training process.”
The company has already used its techniques to modify LLM behavior, such as reducing the frequency of hallucinations. With Silico, Goodfire is packaging those internal methods into a commercial product. The tool relies on AI agents to automate much of the complex interpretability work.
“Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” Ho explains. “That was kind of the gap that needed to be bridged before this was actually a viable platform that customers could use themselves.”
Not everyone is convinced that Goodfire’s approach represents a true paradigm shift. Leonard Bereska, a researcher at the University of Amsterdam who has studied mechanistic interpretability, acknowledges that Silico looks useful but pushes back on the company’s broader claims.
“In reality, they are adding precision to the alchemy,” says Bereska. “Calling it engineering makes it sound more principled than it is.”
(Source: MIT Technology Review)




