The AI Subsidy Era is Over: Avoiding the Multi-Million Pound Mistakes of Microsoft and Uber
If unpredictable SaaS costs are a source of discomfort for the C-suite, AI costs are a legitimate crisis. There is a joke circulating in finance departments:
Cloud: "Surprise! I am going to blow your opex budgets."
AI: "Hold my beer."
Dark humour aside, surprise AI costs are becoming a reality, much as unexpected cloud costs did in the early days of cloud. For example, Microsoft, a primary investor in the AI sector, recently cancelled internal Claude Code licenses for its Experiences + Devices division. This decision, reported by The Verge, is a direct response to token-based billing costs that are untenable even for a company with vast cloud resources.
Uber faces similar challenges. An internal memo from the CTO revealed the company exhausted its entire 2026 AI budget in just four months. The culprit is not a lack of utility, but the unchecked consumption of expensive tokens. As Techmeme notes, the "AI subsidy" is ending. Frontier model labs are passing the true cost of compute to the enterprise. They are moving on from absorbing losses to gain market share, a pattern we also saw in the early days of cloud.
There are lessons to learn from Microsoft's experience, so let's go beyond critique and look at how other organisations can avoid the same situation in practice. We will diagnose the problem first, and then work through a prescription for preventing it in your organisation.
The End of the AI Subsidy
For the past eighteen months, enterprises have operated in a false economy. AI labs offered flat-rate experiments or heavily subsidised API credits to encourage adoption. This era is over. Anthropic, OpenAI, and Google have all raised effective prices within the last six months. This marks the transition from experimental AI pricing to infrastructure economics, where growth in hidden token consumption is outpacing the discounts from falling list prices.
American AI software prices are up between 20% and 37%, according to data from Digg. This increase is a logical consequence of the unit economics of inference. Replacing human labour at scale requires inference costs to collapse, yet the trend is currently moving in the opposite direction.

Why Token-Based Billing is an Architecture Problem
The transition from seat-based licenses to usage-based billing is fundamentally an architecture problem. In traditional SaaS, a CFO can budget for a fixed number of seats. In the agentic AI era, however, costs are tied to token consumption, which is inherently unpredictable.
When every product decision or automated workflow routes through an AI layer, the volume of tokens is difficult to scope. This difficulty is compounded by the Jevons paradox: as AI tools become more efficient and easier to build, organisations run more of them. This leads to "agent duplication", where overlapping agents run in parallel and drive spending up dramatically.
GitHub's shift to usage-based billing, effective from June 2026, confirms this industry-wide transition from pilot AI pricing to infrastructure economics. The goal for vendors is to align their revenue with their primary cost driver: compute. For the enterprise, this means the end of predictable budgeting.
Effective AI costs on Anthropic, OpenAI, and Google models are rising despite falling sticker prices. This is partly due to agent duplication and agentic loops, and partly due to hidden reasoning tokens and increased output volumes. This "token pricing crisis" has put enterprise budgets under pressure. It has prompted some organisations to roll back adoption of intensive tools like Claude Code, which is what we see in the case of Microsoft.
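To make the arithmetic concrete, here is a minimal sketch of how the cost of a task can rise while the sticker price per token falls. Every number below is illustrative rather than taken from any vendor's price list, and the names (TaskProfile, cost_per_task) are hypothetical. It also assumes that reasoning tokens are billed at the output rate, which you should check against your own contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Token footprint of one task. All fields are illustrative assumptions."""
    input_tokens: int           # prompt and context sent per call
    visible_output_tokens: int  # output the user actually sees, per call
    reasoning_tokens: int       # hidden reasoning, assumed billed as output
    calls_per_task: int         # 1 for a single-shot call, more for an agentic loop


def cost_per_task(profile: TaskProfile,
                  input_price_per_m: float,
                  output_price_per_m: float) -> float:
    """Cost of one task, given prices per million tokens."""
    billed_output = profile.visible_output_tokens + profile.reasoning_tokens
    per_call = (profile.input_tokens * input_price_per_m
                + billed_output * output_price_per_m) / 1_000_000
    return per_call * profile.calls_per_task


# Yesterday: a single-shot call at a higher sticker price.
single_shot = TaskProfile(input_tokens=2_000, visible_output_tokens=500,
                          reasoning_tokens=0, calls_per_task=1)
old_cost = cost_per_task(single_shot, input_price_per_m=10.0, output_price_per_m=30.0)

# Today: an agentic loop at a lower sticker price, with hidden reasoning tokens.
agentic = TaskProfile(input_tokens=8_000, visible_output_tokens=1_000,
                      reasoning_tokens=4_000, calls_per_task=12)
new_cost = cost_per_task(agentic, input_price_per_m=3.0, output_price_per_m=15.0)

print(f"single-shot: {old_cost:.3f} per task, agentic: {new_cost:.3f} per task")
```

In this made-up scenario the per-token price drops by more than half, yet the cost per task grows by more than thirty times, because the loop count and the hidden output dominate the bill.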
The Technical Inevitability of Power and Hardware
Inference costs are largely a reflection of power and hardware expenses. Until there is a path for significantly cheaper electricity through, say, nuclear small modular reactors or other breakthroughs, the cost of running frontier models remains high.
Data centre cooling is one area where marginal gains are possible. Research published by MDPI indicates that AI-optimised cooling can save between 10% and 21% on energy costs. While these savings are real, they are insufficient to offset the increasing compute intensity of each new model generation.
The primary driver for future cost reduction is hardware efficiency. Platforms such as Nvidia's Vera Rubin promise 10x higher inference throughput per watt. However, these gains are often absorbed by the increased complexity of the models themselves. Hyperscaler capital expenditure continues to scale, ensuring the absolute compute bill stays high even as per-token costs fall.

Strategies for Cost Containment and Data Fluency
To avoid the mistakes of Microsoft and Uber, enterprise leaders need to move from experimentation and AI pilots to strict governance. This transition requires a high level of data fluency, and it involves some of the less flashy work: understanding the relationship between data architecture, model selection, and the bottom line.
1. Internal Benchmarking and Model Routing
The assumption that a "frontier" model is required for every task is a costly error. Many organisations are discovering they can achieve 80% of the value at a reduced cost by using alternative models.
For example, DeepSeek and Qwen variants provide impressive results in agentic harnesses for a fraction of the price of GPT-4 or Claude 3.5. A measurement study shows that intelligent routing can lower costs for models like Qwen3-32B by as much as 37.8%.
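As a rough illustration of what routing on internal benchmarks can look like, the sketch below picks the cheapest model whose internal benchmark score clears the bar for a given task. It is a simple rule-based router under stated assumptions, not the method used in the study cited above. The model names, prices, and scores are hypothetical placeholders for your own benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    price_per_m_tokens: float  # blended price per million tokens (assumed)
    benchmark_score: float     # score on your own internal benchmark, 0 to 1


def route(models: list[ModelOption], required_score: float) -> ModelOption:
    """Return the cheapest model that meets the required score.

    If no model meets the bar, fall back to the highest-scoring one.
    """
    eligible = [m for m in models if m.benchmark_score >= required_score]
    if not eligible:
        return max(models, key=lambda m: m.benchmark_score)
    return min(eligible, key=lambda m: m.price_per_m_tokens)


# Hypothetical catalogue; replace with your own measured scores and prices.
catalogue = [
    ModelOption("small-open-model", price_per_m_tokens=0.5, benchmark_score=0.78),
    ModelOption("mid-tier-model", price_per_m_tokens=3.0, benchmark_score=0.88),
    ModelOption("frontier-model", price_per_m_tokens=15.0, benchmark_score=0.95),
]

routine_choice = route(catalogue, required_score=0.75)  # routine data transformation
complex_choice = route(catalogue, required_score=0.90)  # complex multi-step reasoning
print(routine_choice.name, complex_choice.name)
```

The design choice worth noting is that the threshold comes from the task, not from the model. That forces each team to state how good "good enough" is before any tokens are spent.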
2. Architectural Planning
If your AI strategy is based on "giving everyone a seat," you are already behind. Cost management must be built into the application architecture. This involves:
- Implementing token quotas at the team and project level.
- Using smaller, task-specific models for routine coding or data transformation.
- Monitoring for agent duplication to ensure multiple systems are not performing identical tasks.
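Two of the controls above, team-level token quotas and duplicate-agent monitoring, can be sketched in a few lines. This is a minimal in-memory illustration rather than a production design: the class and function names are hypothetical, and treating two agents as duplicates when they share the same task and data source is an assumed, deliberately crude definition.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks token spend against a fixed quota per team."""

    def __init__(self, limits: dict[str, int]):
        self.limits = dict(limits)
        self.used: dict[str, int] = defaultdict(int)

    def try_spend(self, team: str, tokens: int) -> bool:
        """Record the spend and return True if it fits, else return False."""
        if team not in self.limits:
            raise KeyError(f"no quota configured for team {team!r}")
        if self.used[team] + tokens > self.limits[team]:
            return False  # caller should queue, downgrade the model, or escalate
        self.used[team] += tokens
        return True


def find_duplicate_agents(agents: dict[str, tuple[str, str]]) -> list[list[str]]:
    """Group agent names that share the same (task, data_source) fingerprint."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for name, fingerprint in agents.items():
        groups[fingerprint].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


budget = TokenBudget({"platform": 1_000})
first = budget.try_spend("platform", 600)   # fits within the quota
second = budget.try_spend("platform", 500)  # would exceed the quota, so rejected

registry = {
    "support-summariser": ("summarise", "tickets"),
    "cx-digest-bot": ("summarise", "tickets"),
    "ticket-classifier": ("classify", "tickets"),
}
duplicates = find_duplicate_agents(registry)
print(first, second, duplicates)
```

In practice the quota check would sit in a gateway in front of the model API and the registry would come from an agent inventory, but the logic stays this simple.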
3. Hardware Efficiency and Accelerators
As organisations scale their internal AI factories, the choice of hardware becomes a strategic differentiator. Groq-style accelerators and liquid-cooling rollouts are necessary for high-volume inference. These improvements are the only way to bend the cost curve back in favour of the enterprise.

A Pragmatic Path Forward
The numbers are no longer working for the unprepared. It is time to step back and make real improvements to your inference infrastructure.
The situation Microsoft recently navigated is a warning for all enterprise procurement departments. When the bill for a competitor's tool exceeds the perceived productivity gain, the license is cancelled. That is a failure of financial planning. Avoiding it requires detailed financial planning and procurement discipline for agentic AI decisions, as well as measurable and trustworthy AI KPIs for the CFO.
The way forward is through competition and efficiency, as well as the basics, such as asking the hard questions while others are focused on the superficial glamour of AI. As international models improve, pricing pressure will increase. Until then, CFOs must treat AI as a high-consumption utility. AI is not a standard software purchase; it is more reminiscent of the early cloud computing decisions.
Organisations that prioritise data fluency and architectural rigour will survive the end of the subsidy era. Those that rely on vendor subsidies will find themselves with evaporated budgets and unfinished projects.
If you are looking to build a sustainable AI strategy that avoids these pitfalls, Jen Stirrup Consulting is here to help. I specialise in helping organisations bridge the gap between technical potential and financial reality. My focus is on strategic planning and avoiding the planning fallacy that often plagues large-scale AI implementations.
AI value is easy to demo, but cost is harder to contain. If you can’t trace usage to outcome, you have exposure. Fix the architecture before the bill forces you to. This is your Monday check: Do you know what your AI is costing you, and why? If AI usage doubled tomorrow, would value double - or just the bill?
One thing to fix this week: Remove or redesign one AI workflow where cost scales with usage but value doesn’t. That’s the problem to solve this week.
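One way to find that workflow is to compare cost per outcome across two periods and flag anything where the cost of each outcome went up. The sketch below assumes you can attribute both spend and a countable outcome (tickets resolved, pull requests merged, reports delivered) to each workflow. The workflow names and figures are hypothetical.

```python
def cost_per_outcome(cost: float, outcomes: int) -> float:
    """Spend divided by outcomes; infinite when nothing was delivered."""
    return float("inf") if outcomes == 0 else cost / outcomes


def flag_workflows(
    records: dict[str, tuple[tuple[float, int], tuple[float, int]]]
) -> list[str]:
    """Return workflows whose cost per outcome rose between two periods.

    records maps a workflow name to ((cost_prev, outcomes_prev),
    (cost_now, outcomes_now)).
    """
    flagged = []
    for name, (previous, current) in records.items():
        if cost_per_outcome(*current) > cost_per_outcome(*previous):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical monthly figures: (spend, outcomes) for last month and this month.
usage = {
    "ticket-triage": ((100.0, 50), (200.0, 100)),    # spend doubled, value doubled
    "meeting-summaries": ((100.0, 50), (220.0, 55)),  # spend doubled, value barely moved
}
to_review = flag_workflows(usage)
print(to_review)
```

A workflow that lands on this list is not necessarily wasteful, but it is the first place to look when usage grows and value does not.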


