The bill is coming due - AI coding vendor lock-in
Relying on subsidized token pricing is risky
If you’ve been using AI coding tools over the past two years, you’ve been getting a great deal. Frontier model access embedded in your IDE, powering your agents, running in your CI pipelines — for prices that don’t actually cover what it costs to serve you.
That’s not a bug. It’s a strategy. And strategies change.
This post is about what happens when the economics of AI-assisted coding get repriced, why that repricing is likely in the next 12 to 24 months, and what your engineering organization should be doing right now before the bill arrives.
The Free Ride Won’t Last
According to internal projections reported by the Wall Street Journal, OpenAI does not expect to reach profitability until 2030. Anthropic projects reaching positive free cash flow by 2027 or 2028. Both companies are growing revenue at extraordinary rates. Anthropic recently reported annualized revenue exceeding $30 billion. But revenue and profit are not the same thing, and right now these companies are a long way from turning one into the other.
The structural problem is inference compute. OpenAI spent roughly 50% of its revenue on inference costs alone in recent years, with training costs pushing total expenditure well above what comes in. Every token you generate costs real money in GPU time. The pricing you see at the API does not reflect what it actually costs providers to serve those tokens.
Open-weight models of comparable capability are anywhere from 17x to 18x cheaper than Anthropic's API pricing, and they are served by providers that cover their costs and make a margin. That gap isn't an indictment of Anthropic's business model; part of it reflects real differences in model capability, trust, tooling maturity, and enterprise positioning.
The narrative you’ll hear most often is that inference costs will keep falling and everything will work out. That narrative has been repeated for three years. Inference costs for frontier models have not followed the curve that optimists projected, partly because each new model generation is larger and more capable than the last, which resets the compute baseline. Lower prices on last-generation models don’t help you if you need current-generation capability.
The current pricing environment is a competitive land-grab. It’s not sustainable.
The IPO Pressure Cooker
Both Anthropic and OpenAI are moving toward public markets. Anthropic has engaged IPO counsel and is reportedly discussing an offering as early as Q4 2026, targeting a raise exceeding $60 billion. OpenAI is targeting a similar timeline at a valuation approaching $1 trillion.
Private investors fund growth stories and tolerate long paths to profitability. Institutional fund managers running discounted cash flow models do not. The S-1 filing will contain actual unit economics for the first time. Analysts will model gross margins. Price-to-earnings ratios will matter in a way they don’t when you’re raising from VCs.
This creates a specific incentive. Both companies have strong motivation to show margin improvement before listing, not after. The levers available are cost reduction (hard, because compute costs are driven by usage and model scale) and price increases (a decision that can be made in an afternoon).
There’s also the Inference Trap: You build the best model, usage surges, inference compute explodes, and you face a forced choice between throttling users, raising prices, or cannibalizing the training compute you need to stay competitive. Anthropic experienced five major platform outages in a single month in early 2026. Claude Code users reported burning through usage allocations far faster than the pricing implied. That’s the Inference Trap operating in real time.
The combination of IPO pressure and Inference Trap dynamics makes a repricing event not just plausible but structurally likely. The question isn’t whether it happens. It’s whether you’re ready when it does.
What You’ve Actually Built On
Most engineering teams believe they have less AI vendor lock-in than they actually do. The assumption is: swap the API key, update the model name, done. In practice there is a lot more to it than that, and it gets worse the deeper into agentic workflows you go.
The lock-in profile varies significantly by use case:
IDE-embedded tools like Copilot, Cursor, and Claude Code represent the shallowest lock-in. You could switch IDEs or model backends with little effort. But don't underestimate soft stickiness. Developer muscle memory, .cursorrules customizations, team-shared system prompts, and workflow integrations all add switching friction. A price increase here hits developer productivity budgets, which are visible and politically sensitive.
Agentic coding workflows are where real lock-in begins. Agentic systems don't just call a model; they build scaffolding around it. System prompts are tuned to a specific model's personality and failure modes. Tool-calling schemas are optimized for how that model interprets them. Retry logic and output parsing are calibrated to observed behavior. When you switch models, that scaffolding doesn't transfer cleanly. You're not changing a config parameter. You're running a re-evaluation campaign against your own codebase. Industry data suggests that when provider lock-in forces a move, migration costs average over $315,000 per project, and that figure reflects situations where teams already had some abstraction in place.
CI/CD and automated pipelines carry the highest risk. These are production systems with determinism requirements. Prompts optimized for one model may produce subtly different outputs on another. Those outputs can look similar enough to pass manual inspection but break downstream parsers and validation steps. Model version pinning provides a false sense of stability because providers deprecate models with 90 days’ notice, and there is no guarantee of behavioral equivalence between versions. The fundamental problem is that you cannot treat an LLM call in a production pipeline the same way you treat a deterministic function call. When you switch models, you have to prove the pipeline still works. You cannot assume it.
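To make that concrete, here is a minimal sketch of what "proving the pipeline still works" looks like in practice: validating model output against an explicit contract before it reaches downstream steps, rather than trusting that a similar-looking response parses the same way across models or versions. The JSON contract and key names are illustrative, not from any real pipeline.

```python
import json

# Assumed contract for a lint-style pipeline step (illustrative keys).
REQUIRED_KEYS = {"file", "line", "severity"}

def validate_review_output(raw: str) -> list[dict]:
    """Parse and validate an LLM code-review response; fail loudly on drift."""
    try:
        findings = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of findings")
    for item in findings:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"finding missing keys: {sorted(missing)}")
    return findings

# A response that looks fine on manual inspection but renames a key
# is rejected here instead of silently breaking a downstream parser.
good = '[{"file": "app.py", "line": 10, "severity": "warn"}]'
findings = validate_review_output(good)
```

The point is not this particular schema; it is that every LLM call feeding a deterministic system needs an explicit, machine-checked boundary around it.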
Open Source is a real option, but it has gaps
The obvious response to pricing risk is to use open source models, self-host, and pay for compute instead of markup. That path is more viable than it was 18 months ago but has real gaps that tend to be underestimated.
The capability gap has largely closed on many dimensions. Open models now match or surpass closed models on knowledge benchmarks, mathematical reasoning, and graduate-level science. The gap that remains is concentrated where it matters most for coding: production-level agentic tasks, multi-step software engineering, and complex tool use. On SWE-bench Verified, the most practically meaningful coding benchmark, the best open models are within a few points of frontier closed models. That gap is still an issue at the tail of task complexity.
The price differential is big. DeepSeek V3.2 is available at roughly $0.28 per million input tokens. Claude Opus 4.7 is $5.00 per million input tokens. That's nearly an 18x difference. For high-volume workloads the economics are compelling even accounting for operational overhead.
But here’s what the open source advocates undersell: switching models is not the same as switching model providers. The scaffold matters as much as the model. Real-world benchmarks show a 22-point swing on the same task with the same model when you change the agent scaffold and tooling. Switching models requires re-validating your entire system, not just verifying the model output looks reasonable.
The operational burden of self-hosting is a real cost transfer. Inference infrastructure, model serving with tools like vLLM or Text Generation Inference, GPU provisioning, update cadence, and security patching all fall on your team. For most organizations without dedicated ML infrastructure experience, this isn’t a savings. It’s a new operational surface area.
There’s also a geopolitical dimension worth naming directly. The strongest open models right now (DeepSeek, Qwen, Kimi) are Chinese-developed. For organizations with data sovereignty requirements, government contracts, or security-sensitive codebases, the lineage of a model matters. This isn’t a reason to dismiss these models outright, but it’s a factor that belongs in your architecture decision.
The Protocol Layer Is Your Best Friend
The most practical near-term lever against lock-in isn’t switching to open source. It’s building an architecture that makes switching possible.
Model Context Protocol (MCP) is the most significant structural development here. Originally developed by Anthropic, it was later donated to the Agentic AI Foundation (AAIF), a foundation co-founded by Anthropic, Block, and OpenAI. MCP has achieved something rare: genuine cross-industry adoption. OpenAI abandoned their proprietary Assistants API and adopted MCP. Google DeepMind, Microsoft, and AWS are all on board. When direct competitors converge on a shared infrastructure standard, it signals inevitability.
MCP decouples the agent-tool connection layer from the model layer. Your integrations with databases, APIs, filesystems, and external services are built once against the MCP standard and survive a model swap. That’s the right layer to standardize at.
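For illustration, here is the shape of a typical MCP client configuration using the widely adopted mcpServers convention. The filesystem server shown is one of the official reference servers; the path is a placeholder. Note that nothing in this config names a model vendor, which is exactly the decoupling described above.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```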
Pair that with an LLM Gateway such as LiteLLM or Portkey, middleware that abstracts provider-specific API differences behind a single interface, and you get a system where the model backend is genuinely swappable without rebuilding your application logic. The marginal complexity cost of adding this abstraction early is low. The switching optionality it creates is high.
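The gateway pattern itself is simple enough to sketch in a few lines. This is a hypothetical stand-in, not LiteLLM's actual internals: application code depends on one interface, provider adapters live behind it, and which provider answers is a configuration choice.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Completion:
    text: str
    provider: str

# Each backend adapts one provider's API to the shared interface.
# Real gateways ship these adapters; here they are stubbed out.
def anthropic_backend(prompt: str) -> Completion:
    return Completion(text=f"[anthropic] {prompt}", provider="anthropic")

def open_weight_backend(prompt: str) -> Completion:
    return Completion(text=f"[deepseek] {prompt}", provider="deepseek")

BACKENDS: dict[str, Callable[[str], Completion]] = {
    "anthropic": anthropic_backend,
    "deepseek": open_weight_backend,
}

def complete(prompt: str, backend: str = "anthropic") -> Completion:
    """Application code calls this; swapping providers is a config change."""
    return BACKENDS[backend](prompt)
```

With a real gateway the swap is similarly thin: LiteLLM, for example, routes on provider-prefixed model strings, so moving a workload is a change to that string rather than a rewrite of application logic.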
Be honest, though, about what MCP doesn't solve. The protocol handles tool integration, not model behavior. When you swap models, your prompts still need re-validation. MCP servers can also consume 40-50% of the available context window before any actual work begins, which creates real production tradeoffs. Standards help. They don't eliminate the work.
What You Should Do Today
The cost of acting on this now is low. The cost of acting after a pricing shock is high.
For IDE tools: Evaluate whether your current tooling is model-agnostic or model-bundled. Prefer tools that let you swap backends. Baseline your developer productivity metrics now. You need a measurement baseline before any changes hit, not after.
For agentic workflows: Add an LLM Gateway from the start of any new project. Keep your agent orchestration layer architecturally separate from your model API calls. This is the single highest-leverage structural decision you can make. Build evaluation suites against your own codebase, not generic benchmarks. Generic benchmarks tell you how a model performs in the abstract. Your eval suite tells you whether you can safely swap models in your specific system.
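A codebase-specific eval suite can start very small. The sketch below uses invented task prompts and checkers; the idea is that each case pairs a real task from your repository with a programmatic pass/fail check, so a candidate model is scored against your system rather than a leaderboard.

```python
from typing import Callable

# Illustrative cases: (task prompt, checker over the model's output).
# In a real suite these come from your own codebase and conventions.
EVAL_CASES = [
    ("Write a regex matching our ticket IDs like PROJ-1234",
     lambda out: "PROJ" in out),
    ("Name the config key for retry limits in our service",
     lambda out: "max_retries" in out),
]

def run_evals(model: Callable[[str], str]) -> float:
    """Return the pass rate of a candidate model on the suite."""
    passed = sum(1 for prompt, check in EVAL_CASES if check(model(prompt)))
    return passed / len(EVAL_CASES)

# A stub standing in for a real model API call during a swap evaluation.
def stub_model(prompt: str) -> str:
    return "use max_retries; IDs look like PROJ-1234"
```

Run the same suite against your incumbent model and each candidate, and the pass-rate delta becomes the actual switching-cost signal.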
For CI/CD pipelines: Treat every LLM call as a third-party dependency with explicit versioning, SLA monitoring, and a tested fallback path. Design for graceful degradation. What does the pipeline do when the model endpoint is slow, unavailable, or has been updated? This should be a documented decision, not an untested assumption.
Across all use cases: Audit your current AI spend and its concentration across providers. Most teams have no idea what this number is. Monitor the IPO timelines. The S-1 filings will be the first time the public sees actual unit economics from these companies, and they will move the conversation. Build internal familiarity with at least one open-weight model family. Even if you never deploy it in production, that knowledge reduces the information asymmetry in any future pricing negotiation.
A Strategy Note, Not a Panic Note
The goal here is not to abandon frontier models. They are genuinely better at certain tasks, the tooling ecosystem around them is more mature, and for many use cases the productivity gains justify whatever they end up costing.
The goal is not to be surprised. More specifically, the goal is to avoid being in the position of needing to move urgently with no alternatives evaluated and no time to build them.
Engineering organizations that have done the architecture work to reduce switching costs will have options when prices move. They’ll be able to make a deliberate choice between absorbing the increase, substituting a capable alternative, or negotiating from a position of real leverage. Organizations that haven’t done this work will face a different situation: urgent need, unknown switching cost, and a vendor who knows it.
The bill is coming. The amount is unknown. The only variable you control is how ready you are to pay someone else instead.