OpenAI and Broadcom Unveil Jalapeño: First Custom AI Inference Chip Targets 50% Cost Reduction
On June 24, 2026, OpenAI and Broadcom officially unveiled Jalapeño, OpenAI's first custom-designed AI inference chip. The chip went from design to tape-out in nine months, with TSMC handling manufacturing and Celestica handling board assembly. It is OpenAI's most consequential infrastructure move since its founding.
Key details:
• Jalapeño is an Application-Specific Integrated Circuit (ASIC) designed exclusively for LLM inference workloads, delivering substantially better performance per watt than general-purpose GPUs
• Analysts have described the nine-month development cycle as possibly the fastest ever achieved for a high-performance ASIC of this complexity
• Engineering samples are already running live workloads in OpenAI's labs, serving the GPT-5.3-Codex-Spark model at target clock frequencies
• Initial datacenter deployments with Microsoft Azure and other partners are targeted for late 2026, with volume scaling planned for 2027 to support gigawatt-scale AI infrastructure
• The chip aims to reduce inference costs by approximately 50% compared to equivalent NVIDIA GPU deployments
• Broadcom contributed its silicon implementation expertise and its Tomahawk high-performance networking chips
At OpenAI's inference scale (billions of queries a day across ChatGPT, Codex, and its API), a 50% reduction in inference cost would translate into billions of dollars in annual operating savings and significantly better unit economics ahead of the company's IPO.
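The scale argument is simple arithmetic: annual spend is queries per day × tokens per query × 365 × cost per token, and the savings are that spend multiplied by the cost reduction. The Python sketch below works through it with hypothetical inputs. The query volume, average tokens per query, and $5-per-million-token serving cost are illustrative assumptions, not figures reported by OpenAI or Broadcom; only the 50% reduction comes from the announcement.

```python
# Hypothetical back-of-envelope model. Every input in the __main__ block is an
# illustrative placeholder, not a figure reported by OpenAI or Broadcom.


def annual_inference_cost(queries_per_day: float, tokens_per_query: float,
                          cost_per_million_tokens: float) -> float:
    """Annual serving cost in dollars at a given cost per million tokens."""
    tokens_per_year = queries_per_day * tokens_per_query * 365
    return tokens_per_year / 1_000_000 * cost_per_million_tokens


def annual_savings(baseline_cost: float, reduction: float) -> float:
    """Dollars saved per year when cost falls by `reduction` (0.5 means 50%)."""
    return baseline_cost * reduction


if __name__ == "__main__":
    baseline = annual_inference_cost(
        queries_per_day=2e9,          # assumed: "billions of queries daily"
        tokens_per_query=1_000,       # assumed average prompt + completion length
        cost_per_million_tokens=5.0,  # assumed GPU serving cost in USD
    )
    saved = annual_savings(baseline, reduction=0.5)  # the targeted ~50% cut
    print(f"baseline: ${baseline / 1e9:.2f}B/yr, savings: ${saved / 1e9:.2f}B/yr")
```

With these assumed inputs the baseline comes to about $3.65 billion a year, so a 50% cut saves roughly $1.8 billion. Because every term is multiplicative, doubling any assumed input doubles the savings.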
Why It Matters: This signals the end of NVIDIA's near-monopoly on AI compute. Token prices will decline significantly as custom silicon reaches production deployment in 2027, making currently uneconomical use cases — 24/7 AI monitoring, high-volume document processing, real-time agentic workflows — economically viable.
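Whether a use case becomes "economically viable" comes down to whether its monthly token bill fits a budget. The Python sketch below checks a hypothetical always-on monitoring workload at two per-token prices. The token rate, both prices, and the budget are illustrative assumptions; the second price is simply half the first to mirror the targeted 50% reduction.

```python
# Hypothetical viability check for an always-on workload. Token rate, prices,
# and budget are illustrative assumptions, not published figures.

HOURS_PER_MONTH = 24 * 30  # simplified 30-day month of continuous operation


def monthly_cost(tokens_per_hour: float, price_per_million: float) -> float:
    """Dollar cost of running the workload around the clock for one month."""
    return tokens_per_hour * HOURS_PER_MONTH / 1_000_000 * price_per_million


def is_viable(tokens_per_hour: float, price_per_million: float, budget: float) -> bool:
    """True when the monthly cost fits within the monthly budget."""
    return monthly_cost(tokens_per_hour, price_per_million) <= budget


if __name__ == "__main__":
    tokens_per_hour = 2_000_000  # assumed: 24/7 monitoring agent
    budget = 10_000.0            # assumed monthly budget in USD
    for price in (10.0, 5.0):    # assumed price per million tokens, before and after a 50% cut
        cost = monthly_cost(tokens_per_hour, price)
        viable = is_viable(tokens_per_hour, price, budget)
        print(f"${price:.2f}/M tokens -> ${cost:,.0f}/month, viable: {viable}")
```

At these assumed numbers the workload costs $14,400 a month at $10 per million tokens and $7,200 at $5, so it fits under the $10,000 budget only after the price cut.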