OpenAI doubles frontier token pricing as DeepSeek open-weights a Mixture-of-Experts architecture at one-ninth the cost

The comfortable middle tier for coding agents has vanished, forcing developers to route between a $30 integrated enterprise bundle and a $3.48 commodity infrastructure layer.

By Emil Vossen · filed from Northern Virginia

A roughly nine-to-one divergence in output pricing, opened within twenty-four hours, has fractured the frontier model market into two distinct routing paths. On April 23, OpenAI shipped GPT-5.5 at $30 per million output tokens, doubling the rate of its predecessor while claiming an 82.7 percent score on Terminal-Bench 2.0. The next day, DeepSeek released the open-weight V4-Pro, which scored 80.6 percent on SWE-bench (a different benchmark), at $3.48 per million output tokens. The comfortable middle tier that most coding agents previously defaulted to has vanished.
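The arithmetic behind the "nine-to-one" framing can be checked in a few lines. This is only a sketch using the two output prices quoted above; the variable and function names are illustrative, and the predecessor's rate is inferred from the "doubling" claim rather than quoted directly.

```python
# Output prices in USD per million tokens, as quoted in the article.
GPT_55_OUTPUT = 30.00
V4_PRO_OUTPUT = 3.48


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more expensive one output rate is than another."""
    return expensive / cheap


# Ratio between the two launch prices (about 8.6, rounded up to nine in the headline).
ratio = price_ratio(GPT_55_OUTPUT, V4_PRO_OUTPUT)

# Inferred, not quoted: if $30 is double the predecessor's rate, that rate was $15.
implied_predecessor_output = GPT_55_OUTPUT / 2

print(f"GPT-5.5 vs V4-Pro output price ratio: {ratio:.2f}x")
print(f"Implied predecessor rate: ${implied_predecessor_output:.2f} per million")
```

The implied predecessor rate sits at the top of the $10–$15 band the article describes as the squeezed middle tier.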

The gap is structural, not promotional. OpenAI is pricing an integrated enterprise product, where the model, agent harness, and expanded computer use are bundled under a single safety review and billing line. The $30 output rate is designed to fund the next training run without diluting the premium positioning of a closed, single-vendor ecosystem. DeepSeek, conversely, is pricing text intelligence as a commodity. V4-Pro achieves its economics through a Mixture-of-Experts architecture that activates only 49 billion of its 1.6 trillion parameters per token. By combining compressed sparse attention with a reduced key-value cache, the model drops the inference compute required to serve a one-million-token context window to a fraction of the frontier baseline.
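To make the sparsity claim concrete, the sketch below computes the share of parameters active per token from the two figures above, then shows generic top-k expert gating, the usual mechanism by which a Mixture-of-Experts layer runs only a few experts per token. The gating function is a textbook illustration, not DeepSeek's published routing scheme; the expert count, logits, and `k` are hypothetical.

```python
import math

# Figures quoted in the article.
TOTAL_PARAMS = 1.6e12   # 1.6 trillion parameters in V4-Pro
ACTIVE_PARAMS = 49e9    # 49 billion activated per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in one token's forward pass."""
    return active / total


def top_k_route(gate_logits: list[float], k: int) -> dict[int, float]:
    """Generic top-k gating (an assumption, not V4's confirmed design).

    Picks the k experts with the highest gate scores and softmax-normalises
    their weights; every other expert is skipped for this token.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.2%} of parameters")

# Hypothetical gate scores for eight experts; only two are run.
weights = top_k_route([0.3, 2.1, -0.5, 1.7, 0.0, -1.2, 0.9, 0.4], k=2)
print(weights)
```

On the article's figures, about 3 percent of the weights do work on any given token, which is where the per-token compute saving comes from; the attention and cache compression mentioned above are separate savings that this sketch does not model.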

This architectural efficiency enables a secondary hardware decoupling. While the OpenAI stack remains bound to NVIDIA silicon, DeepSeek’s V4 models launched with full inference support for Huawei’s Ascend supernodes. The V4-Flash variant, a smaller 284-billion-parameter model priced at $0.28 per million output tokens, was reportedly trained partially on the Ascend stack. High-end open-weight inference can now run on silicon from non-Western foundries, a shift that drove shares of Chinese contract chipmakers SMIC and Hua Hong Semiconductor up ten and fifteen percent, respectively, in Hong Kong trading.

The winners are the infrastructure providers and mid-size development teams, who can now run near-frontier intelligence on local multi-GPU clusters using the permissively MIT-licensed V4 weights. The losers are the mid-tier model providers selling proprietary API access in the $10–$15 range, a tier now squeezed between OpenAI’s enterprise bundle and DeepSeek’s open infrastructure. Any vendor selling intelligence without an integrated application layer is now competing against an open-weight model that costs less than four dollars per million output tokens.

What this forecloses is the assumption of a smooth price-performance curve where developers simply pick a point on the slope. What it opens is a bifurcated architecture for agentic systems: an expensive, closed loop for the final enterprise action, and a vastly cheaper open-weight routing layer for the thousands of background reasoning steps required to get there.
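A rough cost model shows why the split routing is attractive. The sketch below uses only the two output prices quoted in the article; the step count, tokens per step, tier names, and the rule "final step goes to the closed model" are hypothetical, and input-token charges are ignored because the article does not quote them.

```python
# USD per million output tokens, as quoted in the article.
PRICE_PER_MILLION = {
    "closed_frontier": 30.00,  # GPT-5.5
    "open_weight": 3.48,       # DeepSeek V4-Pro
}


def cost_usd(tier: str, output_tokens: int) -> float:
    """Output-token cost for one call on the given tier."""
    return PRICE_PER_MILLION[tier] * output_tokens / 1_000_000


def route(is_final_step: bool) -> str:
    """Hypothetical routing rule: only the final action uses the closed model."""
    return "closed_frontier" if is_final_step else "open_weight"


def task_cost(background_steps: int, tokens_per_step: int, split: bool = True) -> float:
    """Total output cost of a task: N background steps plus one final step."""
    total = 0.0
    for step in range(background_steps + 1):
        is_final = step == background_steps
        tier = route(is_final) if split else "closed_frontier"
        total += cost_usd(tier, tokens_per_step)
    return total


# Assumed workload: 1,000 background steps of 500 output tokens each, plus one final step.
split_cost = task_cost(1000, 500, split=True)
all_closed_cost = task_cost(1000, 500, split=False)
print(f"split routing: ${split_cost:.3f}  all closed: ${all_closed_cost:.3f}")
```

Under these assumed numbers the background steps dominate the bill, so the overall saving approaches the raw price ratio between the two tiers; a workload with a heavier final step would narrow the gap.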

Sources (1)

https://thenewstack.io/disappearing-ai-middle-class/

filed by Emil Vossen · drawn from 1 source · April 26, 2026
