OpenAI and Broadcom Unveil Jalapeño: First Custom AI Inference Chip Targets 50% Cost Reduction
On June 24, 2026, OpenAI and Broadcom officially unveiled Jalapeño—OpenAI's first custom-designed AI inference chip, built with TSMC manufacturing and Celestica board assembly. This represents OpenAI's most consequential infrastructure move since its founding: building its own silicon for a vertically integrated AI compute stack independent of NVIDIA.
Key points:
• Jalapeño is an Application-Specific Integrated Circuit (ASIC) designed exclusively for LLM inference workloads, architected around the memory access patterns, compute requirements, and networking characteristics specific to serving large language models
• The nine-month development cycle—from initial design to tape-out—is described as potentially the fastest advanced semiconductor development ever achieved for a high-performance ASIC of this complexity
• Engineering samples are already running live workloads in OpenAI's labs, serving GPT-5.3-Codex-Spark at target clock frequencies
• Initial datacenter deployments with Microsoft Azure and other partners target late 2026, with volume scaling planned for 2027
• The chip aims to reduce inference costs by approximately 50% compared to equivalent NVIDIA GPU deployments for LLM serving workloads
• Broadcom contributed silicon implementation expertise, Tomahawk high-performance networking chips, and connectivity architecture
At OpenAI's scale—billions of queries daily across ChatGPT, Codex, and its API—a 50% inference cost reduction translates to billions in annual operating cost savings and improved unit economics ahead of the IPO.
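The arithmetic behind the "billions in annual savings" claim can be sketched roughly as follows. This is a back-of-the-envelope illustration, not a reported figure: the query volume and the per-query cost below are hypothetical placeholders (the announcement gives only "billions of queries daily" and the roughly 50% reduction target), and the function name `annual_savings` is made up for this example.

```python
def annual_savings(queries_per_day: float,
                   cost_per_query_usd: float,
                   reduction: float = 0.5) -> float:
    """Estimate yearly savings (USD) from cutting per-query inference cost.

    `reduction` is the fractional cost cut; 0.5 mirrors the ~50% target
    cited for Jalapeño. All other inputs are caller-supplied assumptions.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    baseline_annual_cost = queries_per_day * cost_per_query_usd * 365
    return baseline_annual_cost * reduction


if __name__ == "__main__":
    # Hypothetical inputs: 2 billion queries/day at $0.004 per query.
    # Neither number comes from OpenAI or Broadcom.
    savings = annual_savings(queries_per_day=2e9, cost_per_query_usd=0.004)
    print(f"Estimated savings: ${savings / 1e9:.2f}B per year")
```

Under these assumed inputs the baseline spend is about $2.9 billion a year, so halving it saves roughly $1.5 billion. The estimate scales linearly with both query volume and per-query cost, which is why the savings only reach the billions at very high query volumes.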
Why It Matters: Jalapeño signals the end of NVIDIA's near-monopoly on AI compute. Token prices will decline significantly as custom silicon reaches production deployment in 2027, potentially making currently uneconomical use cases—24/7 AI monitoring, high-volume document processing, real-time agentic workflows—economically viable.