OpenAI’s New Coding Model Skips Nvidia, Runs on Cerebras Chips

Summary
– OpenAI released its GPT-5.3-Codex-Spark coding model, its first production AI model to run on non-Nvidia hardware, using chips from Cerebras.
– The new model is designed for speed, generating code at over 1,000 tokens per second, which is roughly 15 times faster than its predecessor.
– It is a text-only model tuned specifically for coding tasks and is available as a research preview to ChatGPT Pro subscribers through various interfaces.
– OpenAI claims Spark outperforms its older GPT-5.1-Codex-mini on software engineering benchmarks while completing tasks much faster, though these claims lack independent validation.
– This release marks a significant speed leap over OpenAI’s fastest models running on Nvidia hardware, such as GPT-4o, which deliver far fewer tokens per second.

OpenAI has introduced a new coding model that departs from its reliance on Nvidia hardware. The company’s latest release, GPT-5.3-Codex-Spark, runs on chips from Cerebras and is reported to generate code at more than 1,000 tokens per second, roughly 15 times faster than its predecessor. For comparison, Anthropic’s Claude Opus 4.6, a larger and more capable model, reaches about 2.5 times its standard speed in a new premium fast mode, or approximately 170 tokens per second.
Sachin Katti, OpenAI’s head of compute, highlighted the partnership, stating that Cerebras has been an excellent engineering collaborator and that the company is enthusiastic about adding fast inference as a new platform capability. The model is currently available as a research preview to ChatGPT Pro subscribers, who pay $200 per month. Access is provided through the Codex application, a command-line interface, and a Visual Studio Code extension. OpenAI is also granting API access to a select group of design partners. At launch, the model has a 128,000-token context window and handles text only.
This new release is based on the full GPT-5.3-Codex model that OpenAI introduced earlier in the month. While the comprehensive version tackles complex, agentic coding tasks, Spark has been specifically optimized for raw speed rather than depth of knowledge. It was built as a text-only system and fine-tuned for coding purposes, distinguishing it from the general-purpose capabilities of the larger GPT-5.3 model.
According to OpenAI’s internal testing, Spark performs strongly on established software engineering benchmarks. The company reports that on evaluations such as SWE-Bench Pro and Terminal-Bench 2.0, the new model outperforms the older GPT-5.1-Codex-mini while completing tasks in significantly less time. These figures have not been independently validated in public. Historically, Codex’s speed has been a point of criticism; in a previous test where multiple AI coding agents were tasked with building Minesweeper clones, Codex took nearly twice as long as Anthropic’s Claude Code to produce a functional game.
The introduction of this model intensifies the ongoing competition among coding agents. The reported throughput of 1,000 tokens per second is a major advance over what OpenAI has previously delivered. Independent benchmarks from Artificial Analysis indicate that the company’s fastest models operating on Nvidia hardware fall well below this mark: GPT-4o delivers around 147 tokens per second, o3-mini reaches about 167, and GPT-4o mini operates at approximately 52.
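To put those throughput figures in perspective, the short Python sketch below turns them into wall-clock generation times and speed ratios. The tokens-per-second values are the ones quoted above; the 2,000-token response length and the function names are assumptions chosen only for illustration, and Spark's rate is treated as exactly 1,000 even though it is reported as "over 1,000". Real latency also depends on factors such as time to first token, which this arithmetic ignores.

```python
# Tokens-per-second figures as quoted in the article.
# Spark is reported as "over 1,000", so 1000 is a conservative lower bound.
THROUGHPUT_TPS = {
    "GPT-5.3-Codex-Spark": 1000,
    "Claude Opus 4.6 (fast mode)": 170,
    "o3-mini": 167,
    "GPT-4o": 147,
    "GPT-4o mini": 52,
}


def seconds_to_generate(tokens: int, tps: float) -> float:
    """Time to stream `tokens` tokens at a steady `tps` tokens per second."""
    return tokens / tps


def speedup_over(baseline_tps: float, fast_tps: float = 1000) -> float:
    """How many times faster `fast_tps` is than `baseline_tps`."""
    return fast_tps / baseline_tps


if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response length, not from the article
    for model, tps in THROUGHPUT_TPS.items():
        secs = seconds_to_generate(response_tokens, tps)
        ratio = speedup_over(tps)
        print(f"{model:30s} {secs:6.1f} s  (Spark is {ratio:.1f}x faster)")
```

Under these assumptions, a 2,000-token response takes about 2 seconds at Spark's reported rate and roughly 13.6 seconds at GPT-4o's 147 tokens per second.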
(Source: Ars Technica)





