Enterprise AI's Two Decisions: The Model, and the Architecture Around It


Two decisions now define enterprise AI: the model you run, and the architecture that decides what it costs and what stays yours. The 2026 Corporate Buyers' Guide to Enterprise AI Platforms helps you make both.

In late June, Coinbase said it had nearly halved its AI bill even as its token usage kept climbing. It did not get there by capping engineers. It rewired what sits underneath them. An internal gateway now defaults routine work to cheaper open-weight models like GLM and Kimi, routes each request to the cheapest model that can do the job, and caches so aggressively that the hit rate on one internal tool jumped from 5% to 60%. Frontier models are still on hand for the hard problems. Most work no longer reaches them.
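The routing-plus-caching pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Coinbase's gateway. The model names, capability tiers, and prices are hypothetical placeholders. `call_model` stands in for whatever provider SDK a real gateway would wrap. The cache is a simple exact-match cache on a normalized prompt, where production systems often add semantic or prefix caching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical model identifier
    capability: int           # illustrative tier: higher handles harder work
    usd_per_1k_tokens: float  # illustrative price, not a real quote


class Gateway:
    """Routes each request to the cheapest adequate model and caches repeats."""

    def __init__(self, models: List[ModelOption]):
        # Sort once by price so the first adequate model is also the cheapest.
        self.models = sorted(models, key=lambda m: m.usd_per_1k_tokens)
        self.cache: Dict[Tuple[str, int], str] = {}
        self.hits = 0
        self.requests = 0

    def route(self, required_capability: int) -> ModelOption:
        for model in self.models:
            if model.capability >= required_capability:
                return model
        raise ValueError("no model meets the required capability")

    def handle(self, prompt: str, required_capability: int,
               call_model: Callable[[str, str], str]) -> str:
        self.requests += 1
        # Exact-match cache keyed on a whitespace- and case-normalized prompt.
        key = (" ".join(prompt.split()).lower(), required_capability)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        answer = call_model(self.route(required_capability).name, prompt)
        self.cache[key] = answer
        return answer

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


gateway = Gateway([
    ModelOption("frontier", 3, 0.02),
    ModelOption("open-weight-small", 1, 0.0005),
    ModelOption("open-weight-large", 2, 0.002),
])
fake_call = lambda name, prompt: f"{name}: {prompt}"  # stand-in for a provider SDK
print(gateway.handle("Summarize this ticket", 1, fake_call))   # open-weight-small: Summarize this ticket
print(gateway.handle("summarize  this TICKET", 1, fake_call))  # cache hit, same answer
print(gateway.route(3).name, gateway.hit_rate)                 # frontier 0.5
```

Sorting by price once means the first model that clears the capability bar is always the cheapest adequate one. Frontier models only see requests that no cheaper option can handle.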

Harvey, the legal AI platform used by a large share of the Am Law 100, arrived at the same place from the opposite side. It went multi-model, routing drafting, research, and pre-trial work across OpenAI, Anthropic, and Google, after its own benchmarks showed that no single model wins every task. There is no longer one best model, the company said, only an array of strong ones suited to different jobs.
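Harvey has not published how its routing works, so the following is only a generic sketch of the idea that benchmarks should pick the model for each task. The task names echo the paragraph above. The model names and scores are invented placeholders for the output of an internal evaluation.

```python
# Hypothetical internal benchmark results: task -> {model: score in [0, 1]}.
# Model names and scores are illustrative placeholders, not published results.
SCORES = {
    "drafting": {"model_a": 0.82, "model_b": 0.79, "model_c": 0.74},
    "research": {"model_a": 0.71, "model_b": 0.80, "model_c": 0.77},
    "pretrial": {"model_a": 0.68, "model_b": 0.70, "model_c": 0.76},
}


def build_routing_table(scores):
    """Map each task to the model that scored highest on it."""
    return {task: max(results, key=results.get) for task, results in scores.items()}


def single_model_gap(scores, model):
    """How far one model falls short of the per-task best, task by task."""
    return {
        task: round(max(results.values()) - results[model], 2)
        for task, results in scores.items()
    }


print(build_routing_table(SCORES))
# {'drafting': 'model_a', 'research': 'model_b', 'pretrial': 'model_c'}
print(single_model_gap(SCORES, "model_a"))
```

If one model topped every row, the table would collapse to a single entry. The gap report shows what standardizing on one model would give up on each task.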

Different businesses, same conclusion. Choosing the model is one decision. Building the system around it is the other: the gateway, the routing, the caching, the fine-tuning, the contract that lets you move. That is where Coinbase found its savings and Harvey found its edge.

That is enterprise AI in 2026. Two choices now determine the outcome: which models you run, and the architecture you run them on. The first you can research. The second you have to design, and it is where the cost, the contract, and the control live.

We evaluate these platforms for a living, and the pattern holds across the companies we advise. Teams pick a strong model, then lose the advantage in the parts they never negotiated or designed for.

The Bill You Cannot Forecast

Coinbase engineered its way out of the bill. Most companies are still inside it. Uber burned through its 2026 AI coding budget in four months. A healthcare enterprise ran up more than $6 million in unplanned annual token cost before finance had any visibility into it. Meta had no central view of its AI spend until usage crossed 73.7 trillion tokens in a single month. Priceline renewed one coding tool at four to five times the prior price. The common thread: pricing shifted from seats you can count to tokens you cannot, and usage compounds faster than unit prices fall, so the total climbs even as the price per token drops.
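The arithmetic behind that last sentence is simple. Spend is tokens times price per token. If usage grows by a fraction g each period and the unit price falls by a fraction d, spend is multiplied by (1 + g)(1 - d) every period, and it rises whenever that product exceeds 1. The growth and price-decline rates below are illustrative assumptions, not figures from any of the companies named.

```python
def project_spend(tokens, usd_per_million, usage_growth, price_decline, periods):
    """Return spend per period, rounded to cents, as usage grows and price falls.

    usage_growth=0.5 means token volume rises 50% per period;
    price_decline=0.2 means the price per token falls 20% per period.
    """
    spend = []
    for _ in range(periods):
        spend.append(round(tokens / 1_000_000 * usd_per_million, 2))
        tokens *= 1 + usage_growth
        usd_per_million *= 1 - price_decline
    return spend


# Illustrative inputs: 1 billion tokens at $10 per million tokens.
print(project_spend(1_000_000_000, 10.0, usage_growth=0.5, price_decline=0.2, periods=4))
# [10000.0, 12000.0, 14400.0, 17280.0]
```

Over four periods the price per million tokens falls from $10 to about $5.12, yet spend rises about 73%, because the per-period factor 1.5 × 0.8 = 1.2 compounds.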

This is the market, not an accident of scale. Google now processes trillions of tokens a month, roughly seven times last year's volume, and Sundar Pichai told Google I/O that companies were already blowing through their annual budgets, and it was only May. Goldman Sachs projects consumption will rise roughly 24 times again by 2030.

The cruel part is which programs get cut. When an AI program works, usage and cost climb together, and that cost is the easiest line for finance to see. So the successful program draws the scrutiny while the quiet pilot survives. A spend cap helps, but only if you can see the meter move. That is a contract problem and an architecture problem at once.
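A spend cap only works if someone sees the meter before the cap bites. The sketch below is a hypothetical in-process meter. It assumes a single budget and per-request costs already computed from token counts. A real deployment would more likely persist spend in a shared store and route these alerts to finance daily.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpendMeter:
    """Tracks spend against one budget: warns at a soft threshold, blocks at the cap."""
    budget_usd: float
    alert_fraction: float = 0.8   # illustrative: warn once spend passes 80% of budget
    spent_usd: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def record(self, team: str, cost_usd: float) -> bool:
        """Record one request's cost; return False (spending nothing) if it would breach the cap."""
        if self.spent_usd + cost_usd > self.budget_usd:
            self.alerts.append(f"BLOCKED {team}: ${cost_usd:.2f} would exceed the cap")
            return False
        threshold = self.alert_fraction * self.budget_usd
        crossed = self.spent_usd < threshold <= self.spent_usd + cost_usd
        self.spent_usd += cost_usd
        if crossed:
            self.alerts.append(
                f"WARN {team}: spend ${self.spent_usd:.2f} passed {self.alert_fraction:.0%} of budget"
            )
        return True


meter = SpendMeter(budget_usd=1000.0)
print(meter.record("search", 700.0))   # True
print(meter.record("search", 150.0))   # True, and logs a warning
print(meter.record("coding", 200.0))   # False: blocked at the cap
print(meter.alerts)
```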

The Contract That Governs a Moving Target

New models land almost every month, and the ones you built on get retired almost as fast. In early 2026, OpenAI pulled GPT-4o, GPT-4.1, and o4-mini from ChatGPT, then retired GPT-4.5 in June. Anthropic ran Claude Opus 3 through its first formal retirement in January and shut down Claude 3.7 Sonnet in May. Google retired Mistral's Codestral and Mistral Large models from Vertex AI in January. Support windows that used to run 18 to 24 months now run 6 to 12. A workflow tuned to one model can hit a dead endpoint before it reaches production.
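One practical defense is to keep an inventory of which workflow depends on which model, and to check it against announced retirement dates on a schedule. This is a hypothetical sketch. The workflow names, model names, and dates are placeholders; in practice the dates would come from each vendor's published deprecation notices.

```python
from datetime import date

# Hypothetical inventory: workflow -> (model it depends on, announced retirement date).
# Names and dates are placeholders, not real vendor announcements.
DEPENDENCIES = {
    "claims-triage": ("vendor-model-x", date(2026, 9, 30)),
    "contract-review": ("vendor-model-y", date(2027, 6, 30)),
    "support-summaries": ("vendor-model-z", None),  # no retirement date announced yet
}


def retiring_soon(deps, today, horizon_days=90):
    """List (days_left, workflow, model) for models retiring within the horizon, soonest first.

    A negative days_left means the model is already past its retirement date.
    """
    at_risk = []
    for workflow, (model, retires_on) in deps.items():
        if retires_on is None:
            continue
        days_left = (retires_on - today).days
        if days_left <= horizon_days:
            at_risk.append((days_left, workflow, model))
    return sorted(at_risk)


print(retiring_soon(DEPENDENCIES, today=date(2026, 8, 1)))
# [(60, 'claims-triage', 'vendor-model-x')]
```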

You sign two- and three-year contracts anyway. Most enterprise agreements fix a price and a term, then let model behavior, pricing, and availability change on the vendor's schedule with no trigger to renegotiate. The contract is long and fixed. What it governs turns over every quarter. So the real question is what stays yours when the models change and the meter keeps moving.

What Compounds Is What You Own

Some of the stack is swappable by design. Foundation models, chat interfaces, token pricing, and raw compute all have credible substitutes, and their price falls as providers compete. Choose them well, and keep the freedom to change them.

The value sits one layer up, in what your system accumulates as it runs: the workflows you tuned over months, the evaluation sets built on your own operations, the orchestration layer that routes each task to the right model, the telemetry that improves the longer it runs, and the feedback loops where every correction sharpens the system. Keep those on your side and you can change the model beneath them without losing what you built. Let them live on the vendor's platform and your advantage compounds on their balance sheet instead of yours.
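An evaluation set is one of the easiest of these assets to keep on your side. It can live as plain files you control and run against any model. The sketch below rests on several assumptions: JSON Lines as the portable format, a hypothetical invoice-extraction task, and case-insensitive substring matching as the scoring rule. Real evaluation sets usually need richer graders.

```python
import json

# Hypothetical evaluation set kept as JSON Lines: one case per line, plain text,
# readable by any tool and independent of any vendor's platform.
EVAL_JSONL = """\
{"id": "c1", "prompt": "Currency for invoice INV-104?", "expected": "EUR"}
{"id": "c2", "prompt": "Payment terms for INV-104?", "expected": "net 30"}
{"id": "c3", "prompt": "Vendor country for INV-104?", "expected": "Germany"}
"""


def load_cases(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_eval(cases, model_fn):
    """Score any model callable (prompt -> str) by case-insensitive substring match."""
    passed = [c["id"] for c in cases
              if c["expected"].lower() in model_fn(c["prompt"]).lower()]
    return len(passed) / len(cases), passed


cases = load_cases(EVAL_JSONL)
stub = lambda prompt: "EUR, net 30"   # stand-in for a real model call
score, passed = run_eval(cases, stub)
print(score, passed)   # 0.6666666666666666 ['c1', 'c2']
```

Because the cases and the scoring rule sit outside any vendor's platform, swapping `stub` for a different model's client reruns the same test against the new model.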

The cost of getting this wrong is concrete. One founder put the price of rebuilding an agent stack off its platform at more than $200,000 and three to four months of degraded capability. Block runs more than 60 of its own servers behind its Goose agent and reports up to 75% time saved on some tasks. JPMorgan did the enterprise version, putting roughly 250,000 employees on one governed platform it owns while buying the components underneath. The lesson is the sequence. Own the architecture, buy the parts.

The Window Closes at Scale

Your leverage is highest before your workflows depend on anyone. It peaks on the way to scale, then declines once you arrive, because every production workflow adds a dependency across data, orchestration, security, and monitoring, and each one raises the cost of leaving. Most organizations do not learn what is portable until they try to move, which is the most expensive time to find out.

So the cheapest time to protect your position is now, before AI reaches production. Negotiate the right to switch. Design so changing models is a config change, not a rebuild. Hold your artifacts in portable formats. Watch the meter daily, not at renewal. After production, portability stops being a clause and becomes a rescue project.
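Making a model change a config change can be done with a thin adapter layer. Application code asks for a task, and a config entry decides which provider and model serve it. The provider functions below are hypothetical stubs standing in for real SDK calls, and the config shape is an assumption for illustration.

```python
import json
from typing import Callable, Dict


# Hypothetical provider adapters; each would wrap a vendor SDK in practice.
def provider_a(model: str, prompt: str) -> str:
    return f"[provider_a/{model}] {prompt}"


def provider_b(model: str, prompt: str) -> str:
    return f"[provider_b/{model}] {prompt}"


ADAPTERS: Dict[str, Callable[[str, str], str]] = {
    "provider_a": provider_a,
    "provider_b": provider_b,
}


def complete(config: dict, task: str, prompt: str) -> str:
    """Look up the task's provider and model in config; application code never names a vendor."""
    route = config["tasks"][task]
    return ADAPTERS[route["provider"]](route["model"], prompt)


config = json.loads('{"tasks": {"summarize": {"provider": "provider_a", "model": "model-1"}}}')
print(complete(config, "summarize", "Q3 notes"))   # [provider_a/model-1] Q3 notes

# Switching models is a config edit, not a code change.
config["tasks"]["summarize"] = {"provider": "provider_b", "model": "model-2"}
print(complete(config, "summarize", "Q3 notes"))   # [provider_b/model-2] Q3 notes
```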

The Through-Line

The advantage goes to whoever gets both decisions right: the model, and an architecture that keeps the execution layer, the cost controls, and the exit in your hands. The winners will own their AI economics, their contracts, and their artifacts. The captives will rent all three and call it a platform.

The model is the first decision. The architecture is the one that compounds.

This is what the 2026 Corporate Buyers' Guide to Enterprise AI Platforms is built to map: 32 vendors plotted on market readiness and strategic value across chatbots, foundation models, and agent infrastructure, with the pricing models, the cost traps, and the ownership questions to work through at each stage. The decisions are yours. The landscape is what we map. For more, visit gaiinsights.com/cbg-llms.