OpenAI doubles frontier token pricing as DeepSeek open-weights a Mixture-of-Experts architecture at one-ninth the cost
The comfortable middle tier for coding agents has vanished, forcing developers to route between a $30 integrated enterprise bundle and a $3.48 commodity infrastructure layer.
A nine-to-one divergence in output pricing across twenty-four hours has fractured the frontier model market into two distinct routing paths. On April 23, OpenAI shipped GPT-5.5 at $30 per million output tokens, doubling the rate of its predecessor while claiming an 82.7 percent score on Terminal-Bench 2.0. The next day, DeepSeek released the open-weightAn artificial intelligence model whose trained parameters (weights) are publicly released, allowing anyone to download, run, and modify the model locally, even if the underlying training data remains private. V4-Pro, hitting 80.6 percent on SWE-bench, at $3.48 per million output tokens. The comfortable middle tier that most coding agents previously defaulted to has vanished.
The gap is structural, not promotional. OpenAI is pricing an integrated enterprise product, where the model, agent harness, and expanded computer use are bundled under a single safety review and billing line. The $30 output rate is designed to fund the next training run without diluting the premium positioning of a closed, single-vendor ecosystem. DeepSeek, conversely, is pricing text intelligence as a commodity. V4-Pro achieves its economics through a Mixture-of-Experts architecture that activates only 49 billion of its 1.6 trillion parameters per token. By combining compressed sparse attention with a reduced key-value cache, the model drops the inferenceThe process of running live data through a trained artificial intelligence model to generate an output or prediction. It is the operational phase that follows a model's initial training. compute required to serve a one-million-token context windowThe maximum amount of text, audio, or image data a model can hold in its working memory at one time to inform its next output. to a fraction of the frontier baseline.
This architectural efficiency enables a secondary hardware decoupling. While the OpenAI stack remains bound to NVIDIA silicon, DeepSeek’s V4 models launched with full inferenceThe process of running live data through a trained artificial intelligence model to generate an output or prediction. It is the operational phase that follows a model's initial training. support for Huawei’s Ascend supernodes. The V4-Flash variant, a smaller 284-billion parameter model priced at $0.28 per million output tokens, was reportedly trained partially on the Ascend stack. High-end open-weightAn artificial intelligence model whose trained parameters (weights) are publicly released, allowing anyone to download, run, and modify the model locally, even if the underlying training data remains private. inference can now be adapted to non-Western foundries, a shift that drove Chinese contract manufacturers SMIC and Hua Hong Semiconductor up ten and fifteen percent respectively in Hong Kong trading.
The winners are the infrastructure providers and mid-size development teams who can now run near-frontier intelligence on local multi-GPU clusters using the permissive MIT-licensed V4 weights. The losers are the mid-tier model providers attempting to sell proprietary API access in the $10–$15 range, a tier that is now squeezed entirely between OpenAI’s enterprise bundle and DeepSeek’s open infrastructure. Any vendor selling intelligence without an integrated application layer is now competing against an open weight that costs less than four dollars per million tokens.
What this forecloses is the assumption of a smooth price-performance curve where developers simply pick a point on the slope. What it opens is a bifurcated architecture for agentic systems—an expensive, closed loop for the final enterprise action, and a vastly cheaper open-weightAn artificial intelligence model whose trained parameters (weights) are publicly released, allowing anyone to download, run, and modify the model locally, even if the underlying training data remains private. routing layer for the thousands of background reasoning steps required to get there.
