The Cost and Compliance Trap

Plan for it now or pay for it later

Jun 01, 2026

In December 2025, the Google Cloud CTO office published a post-mortem on what they called the “reliability gap” in agentic AI systems. The core finding was not subtle: the move from deterministic software to probabilistic agentic workflows introduces failure modes that most teams only discover after architecture decisions are locked. The paper described multi-step agent chains operating without transaction coordination, creating non-atomic failures that can produce irreversible side effects and data corruption mid-operation.

That is not a model quality problem. It is a structural design problem. And it is expensive in two ways that most AI cost models don’t account for.

The bill arrives after the architect has left

LLM inference costs scale with volume and context length. Everyone knows this. Where it gets more complicated is when you chain agents together.

Every step in a multi-agent workflow triggers at least one model call, and potentially retries on top of that. Context lookups trigger more. Add a reflection layer, an orchestration agent, and a validation pass, and your per-transaction cost can explode.
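To make the compounding concrete, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `Step` type, the step names, the token counts, the retry rates, and the per-1k-token prices are hypothetical placeholders, not figures from any vendor. It also assumes each retry is an independent attempt, which real systems may not honor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of a hypothetical agent chain."""
    name: str
    input_tokens: int         # prompt plus retrieved context sent per call
    output_tokens: int        # tokens generated per call
    retry_rate: float = 0.0   # assumed chance that an attempt fails and is retried


def expected_attempts(retry_rate: float) -> float:
    """Expected number of calls when each attempt fails independently."""
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    return 1.0 / (1.0 - retry_rate)


def cost_per_transaction(steps, usd_per_1k_input, usd_per_1k_output):
    """Sum the expected model spend across every step in the chain."""
    total = 0.0
    for step in steps:
        per_call = (
            (step.input_tokens / 1000) * usd_per_1k_input
            + (step.output_tokens / 1000) * usd_per_1k_output
        )
        total += per_call * expected_attempts(step.retry_rate)
    return total


# Illustrative chain: token counts and prices are placeholders, not vendor quotes.
chain = [
    Step("orchestrate", 2000, 300),
    Step("retrieve_context", 6000, 200, retry_rate=0.1),
    Step("act", 4000, 800, retry_rate=0.2),
    Step("reflect", 5000, 400),
    Step("validate", 3000, 100),
]

per_txn = cost_per_transaction(chain, usd_per_1k_input=0.003, usd_per_1k_output=0.015)
print(f"per transaction: ${per_txn:.4f}")
print(f"per million transactions: ${per_txn * 1_000_000:,.0f}")
```

Swap in your own token counts and contract prices. The point of running it before launch is that the per-million figure is the one that shows up on the invoice.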

The problem stays hidden because the costs don’t start to mount until the system goes into production, by which point the builders have moved on. By the time you see the bill, it’s too late.

The vendors selling agent frameworks are not incentivized to surface this. Their pricing models are based on consumption. A platform that makes it trivially easy to chain five agents together is probably not going to ship a cost impact calculator so you know what you are getting into. You discover the problem at the quarterly cloud invoice review.

Right-sizing is not hard to figure out: use deterministic execution for stable, high-volume paths, and reserve probabilistic reasoning for genuinely ambiguous edge cases where it creates real value. Many teams reach for an agent because the demo was impressive, not because the task required reasoning under uncertainty.
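One way to picture that split is a dispatcher that tries plain code first and only falls through to an agent when no rule applies. This is a sketch under stated assumptions: the request kinds, the handler functions, the 50-unit refund limit, and the `stub_agent` stand-in are all hypothetical, and a real agent call would replace the stub.

```python
from typing import Callable, Optional, Tuple


# Hypothetical deterministic handlers for stable, high-volume request types.
def handle_address_change(request: dict) -> Optional[dict]:
    if "customer_id" in request and "new_address" in request:
        return {"action": "update_address", "customer_id": request["customer_id"]}
    return None  # incomplete input: not a case this rule covers


def handle_small_refund(request: dict) -> Optional[dict]:
    amount = request.get("amount")
    if isinstance(amount, (int, float)) and 0 < amount <= 50:
        return {"action": "refund", "amount": amount}
    return None  # above the assumed limit: leave it to the fallback


DETERMINISTIC_HANDLERS = {
    "address_change": handle_address_change,
    "refund": handle_small_refund,
}


def dispatch(request: dict, agent: Callable[[dict], dict]) -> Tuple[str, dict]:
    """Try the deterministic path first; call the agent only when no rule applies."""
    handler = DETERMINISTIC_HANDLERS.get(request.get("kind"))
    if handler is not None:
        result = handler(request)
        if result is not None:
            return "deterministic", result
    return "agent", agent(request)


def stub_agent(request: dict) -> dict:
    """Stand-in for a model call, which is where the inference cost would land."""
    return {"action": "escalate", "reason": "needs reasoning"}


print(dispatch({"kind": "refund", "amount": 20}, stub_agent))
print(dispatch({"kind": "refund", "amount": 5000}, stub_agent))
print(dispatch({"kind": "complaint", "text": "..."}, stub_agent))
```

Counting how often each branch fires in production traffic is a cheap way to learn whether the agent is handling edge cases or quietly doing the routine work.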

Regulation does not grade on a curve

The EU AI Act imposes documentation, transparency, and auditability obligations on high-risk AI systems, with phased enforcement running through 2027. At its most basic level: a system subject to these rules must be able to reconstruct its decision logic. It must be able to answer, on demand, why a particular output was produced.

A live LLM agent in the execution loop can’t answer that question. The reasoning is not stored. The attention weights are not auditable. The path from input to output runs through billions of parameters and produces no human-readable trace of its logic.

A deterministic workflow that was generated by an AI can answer that question. The code exists. The execution path is traceable. The logic is inspectable.
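As a rough illustration of what "traceable" can mean in practice, the sketch below runs a decision through named rules and records which rule fired and what it returned. The rules, thresholds, and field names are invented for the example, and the SHA-256 digest is just one assumed way to make a stored record tamper-evident. None of this is a statement of what any regulation specifically requires.

```python
import hashlib
import json


def income_at_least_30000(app: dict) -> bool:
    return app["income"] >= 30000


def debt_ratio_below_40_percent(app: dict) -> bool:
    return app["debt"] / app["income"] < 0.4


# Hypothetical rule set, evaluated in order; the first failure stops evaluation.
RULES = [
    ("income_at_least_30000", income_at_least_30000),
    ("debt_ratio_below_40_percent", debt_ratio_below_40_percent),
]


def decide(application: dict) -> dict:
    """Return the decision together with a replayable record of how it was reached."""
    trace = []
    approved = True
    for name, rule in RULES:
        passed = rule(application)
        trace.append({"rule": name, "passed": passed})
        if not passed:
            approved = False
            break
    record = {"input": application, "trace": trace, "approved": approved}
    # Canonical JSON so the same decision always hashes to the same digest.
    canonical = json.dumps(record, sort_keys=True)
    record["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


print(json.dumps(decide({"income": 50000, "debt": 10000}), indent=2))
```

Because the same input always produces the same trace and digest, answering "why was this output produced" becomes a lookup or a replay.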

This distinction matters beyond Europe. Any organization operating in healthcare, finance, insurance, or legal already faces sector regulation with equivalent requirements. HIPAA audit trails. SEC recordkeeping. Insurance claim adjudication documentation. These obligations don’t go away because you replaced a rules engine with an agent. They get stronger, because the regulator now wants to understand the AI component specifically.

The compliance teams asking these questions are not being obstructionist. They are pointing at a real architectural gap that the engineering team may not have modeled as a risk.

Trust is already declining

Augment Code’s 2025 developer survey found that trust in AI-generated code accuracy dropped from 40% to 29% in a single year. That is not a rounding error. Adoption grew over the same period, reaching 62% of developers, but confidence in the output fell sharply.

More developers are using AI. Fewer of them trust what it produces. Developers are shipping AI-assisted work while privately applying a larger mental discount to its reliability. That discount has a cost. It shows up as increased review time, more defensive testing, slower iteration on the critical paths where the stakes are highest.

Enterprise teams that have moved to hybrid architectures report a different pattern. When agents set goals and orchestrate tasks but critical computations run inside deterministic modules, the LLM surface area at runtime shrinks. The auditability problem gets smaller because the probabilistic component is doing less. Teams using this approach report cutting manual effort by over 80% while maintaining the traceability their compliance functions require.

What the architecture decision actually is

Treating atomicity as an infrastructure requirement instead of a prompting challenge changes what you build. It means designing agent chains with transaction boundaries, rollback logic, and explicit failure modes before you write the first prompt. It means asking, for every step in the workflow, whether this step requires reasoning under uncertainty or whether it requires reliable execution of a known procedure.
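A common way to get transaction boundaries across steps that do not share a database is a compensation (saga-style) runner: each step is paired with an undo action, and a failure undoes the completed steps in reverse order. The sketch below is one minimal version of that idea, not the approach described in the Google Cloud paper. The runner name, the inventory and payment steps, and the in-memory `state` are hypothetical.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


def run_with_compensation(steps: List[Tuple[str, Action, Action]]) -> Tuple[bool, List[str]]:
    """Run (name, do, undo) steps in order; on failure, undo completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Action]] = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"undone:{done_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"done:{name}")
    return True, log


# Hypothetical two-step workflow over in-memory state.
state = {"stock": 5, "charged": 0}


def reserve_inventory() -> None:
    state["stock"] -= 1


def release_inventory() -> None:
    state["stock"] += 1


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated mid-chain failure


def refund_card() -> None:
    state["charged"] = 0


ok, log = run_with_compensation([
    ("reserve", reserve_inventory, release_inventory),
    ("charge", charge_card, refund_card),
])
print(ok, log, state)
```

Compensation is weaker than a true atomic transaction: an undo can itself fail, and some side effects (a sent email, a wire transfer) have no undo at all. That is an argument for putting irreversible steps last or behind an explicit approval gate, which is a design decision rather than a prompt.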

Most enterprise AI architectures are not designed this way. They are grown from demos. The agent that worked in the proof of concept becomes the agent in production, with more context, higher volume, and a compliance audit scheduled for Q3.

The Google Cloud paper put it directly: non-atomic failure modes create irreversible side effects. That is an infrastructure problem. Changing the system prompt does not fix it.

The question I keep coming back to is whether the hybrid architecture pattern will be adopted proactively or reactively. Some teams will design for it from the start. Most will discover the need for it after the first incident that cannot be explained to a regulator, or the first quarter where inference costs exceed the value the system generated.

Steve Whittle
