Google Research publishes TurboQuant and Titans in the same week — compressing AI, and lengthening its memory
TurboQuant pushes vector-quantization compression into territory where serving costs fall without visible quality loss. Titans, paired with a framework called MIRAS, gives models persistent long-horizon memory.
The pair arrive as Anthropic and OpenAI race to lock down raw compute. Google's bet is softer: make the models cheaper and smarter per dollar, and the capacity question answers itself. Neither technique has been peer-reviewed, but both are already being folded into Gemini's production path, according to engineers familiar with the rollout.
For the past eighteen months, the dominant industry narrative has been capacity — gigawatts secured, fabs reserved, data-centre shells poured. TurboQuant and Titans are Google Research quietly reminding the market that the other axis, efficiency per token and usefulness per context window, has not stopped moving. TurboQuant is a vector-quantisation scheme that, per the paper's internal benchmarks, pushes compression ratios into territory at which serving costs fall without observable quality loss on standard evaluations. Titans, paired with a framework the team calls MIRAS, gives models a persistent memory layer that survives across sessions without retraining.
The specific numbers are where the competitive pressure lives. Google's write-up claims TurboQuant delivers a throughput improvement the team describes as approaching an order of magnitude on representative serving workloads, with quality regressions it characterises as within evaluation noise. Titans-plus-MIRAS, in an internal test the company has not yet externalised, holds coherent task state across what the researchers frame as weeks of simulated interaction. Both claims carry the usual caveats of unreviewed research; both are specific enough to re-price the category.
The winners are Google Cloud's margin line and anyone building on Gemini — enterprise customers who can now plausibly run agentic workloads at price points competitors will struggle to match this quarter. The losers are the inference-optimisation startups whose entire proposition was the gap TurboQuant appears to close, and the frontier labs whose moat was meant to be raw parameters rather than what those parameters remember.
What the week opens is the possibility that the next phase of the frontier race is not about who has more compute but about who needs less of it per unit of useful work. That is a different contest, with different winners, and it plays to a research culture Google has historically been better at funding than at shipping. Whether it ships this time is the question Mountain View has just picked a fight over.
