OpenAI and Broadcom unveil new chip for large-scale AI inference

▼ Summary
– OpenAI and Broadcom announced a new chip called Jalapeño, designed specifically for large language model inference in data centers.
– The chip is an ASIC built from scratch for LLM inference, based on insights from OpenAI researchers and informed by OpenAI’s future model roadmap.
– The design and production of Jalapeño took nine months, and the companies describe it as the first generation in a long-term project.
– The chip is more specialized for current LLM needs than existing data center inference systems.
– Early testing suggests Jalapeño will deliver substantially better performance per watt than current state-of-the-art, with a detailed technical report expected in the coming months.
OpenAI, the creator of ChatGPT and Codex, has partnered with semiconductor veteran Broadcom to introduce Jalapeño, a new chip engineered for large-scale AI inference in data centers. This custom silicon is the first in a planned series of chips, with both companies emphasizing it marks the beginning of a long-term development cycle.
Broadcom describes Jalapeño as an Application-Specific Integrated Circuit (ASIC) built from the ground up for large language model inference. The design process drew on “detailed insights” from OpenAI’s research team and was aligned with the company’s own product and model roadmaps. Development and production were completed in just nine months.
The chip is tailored to meet the specific computational demands of modern LLMs, offering a more specialized solution than the general-purpose hardware currently used in most inference systems. According to OpenAI, “early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art.” However, the company cautions that performance measurements are still ongoing, and a “detailed technical report will be presented in the coming months.”
This announcement signals a strategic push to optimize AI infrastructure, with Broadcom and OpenAI aiming to refine the chip across future generations for increasingly efficient data center operations.
(Source: Ars Technica)




