Cloud taught enterprises that usage is easy to scale and hard to justify. Adoption grew. Bills grew. The connection between the two stayed unproven for years.
Agentic AI is repeating the pattern. Enterprises are tracking tokens, seats, API calls, and conversations because vendors put those numbers in front of them. None of them measure whether the agent makes money. A CFO looking at the AI bill in 2026 is reading a cost report, not a business case.
The fix is to measure what the agent produces, not what it consumes. What it produces is the unit. The unit is one completed workflow outcome that finance was already counting before the agent arrived. Resolved cases. Processed invoices. Reconciled trades. Qualified leads. Some are terminal, like a paid invoice. Others are intermediate, like a lead handed to a closer, and those count only when the conversion path is already measured. Tokens, sessions, and conversations fail this test. They are vendor inventions.
Once the unit is defined, the dashboard changes. Tokens become material input. Tool calls become process cost. Escalations become rework. Latency becomes cycle time. Human review becomes labor content. The agent stops being software. It becomes a production line.
That reclassification is the point. A seat license is an IT expense. An agent that completes workflows is per-transaction COGS. Once the cost sits at the transaction level, it has to clear the same bar every other margin lever clears: unit economics.
Unit economics on an agent comes down to four numbers.
The first is incremental value per completed unit. The delta the agent creates over the current operating baseline. If the work was already getting done, the agent earns credit only for what changed: lower cost, faster cycle time, higher conversion, better retention, avoided headcount. Gross value is the wrong number. Most pilot business cases use it anyway, which is why most pilot business cases overstate value by two to five times.
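The gap between gross and incremental value can be sketched in a few lines of Python. This is a minimal illustration, not a valuation method: the `UnitOutcome` type, its two fields, and every dollar figure are hypothetical, and the agent's own running cost is deliberately left out because it belongs to the next number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitOutcome:
    """Per-unit economics under one operating model (hypothetical fields)."""
    revenue: float        # revenue attributed to one completed unit
    handling_cost: float  # non-agent labor and overhead spent on that unit

    @property
    def net(self) -> float:
        return self.revenue - self.handling_cost


def incremental_value(baseline: UnitOutcome, with_agent: UnitOutcome) -> float:
    """Credit the agent only with the change against the baseline."""
    return with_agent.net - baseline.net


# Hypothetical figures, for illustration only.
baseline = UnitOutcome(revenue=10.00, handling_cost=6.00)
with_agent = UnitOutcome(revenue=10.00, handling_cost=2.50)

gross = with_agent.net                           # what a gross-value case would claim
delta = incremental_value(baseline, with_agent)  # what the agent actually changed
print(f"gross ${gross:.2f} vs incremental ${delta:.2f}")
```

With these made-up inputs the gross figure is a little over twice the incremental one, because the baseline was already capturing part of the value before the agent arrived.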
The next is variable cost per completed unit, fully loaded. Inference, tool calls, retrieval, supervision, correction, escalation. Token cost is usually the smallest line. Human-in-the-loop time is usually the largest. Enterprises that price agents off the model bill alone are pricing the cheapest input and ignoring the most expensive one.
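A fully loaded cost can be sketched as a sum over every line, machine and human. The snippet below is a hedged illustration in Python: the line names follow the list above, but the dollar amounts are hypothetical placeholders per attempt, not measured data. Converting to a per-completed-unit figure additionally requires the success rate.

```python
# Hypothetical per-attempt cost lines in dollars; replace with measured figures.
MACHINE_COSTS = {"inference": 0.20, "tool_calls": 0.35, "retrieval": 0.25}
HUMAN_COSTS = {"supervision": 0.50, "correction": 0.40, "escalation": 0.30}


def fully_loaded_cost(*groups: dict[str, float]) -> float:
    """Sum every cost line across all groups, machine and human alike."""
    return sum(sum(group.values()) for group in groups)


total = fully_loaded_cost(MACHINE_COSTS, HUMAN_COSTS)
model_bill_share = MACHINE_COSTS["inference"] / total
human_share = sum(HUMAN_COSTS.values()) / total

print(f"fully loaded: ${total:.2f} per attempt")
print(f"model bill: {model_bill_share:.0%} of total, human time: {human_share:.0%}")
```

In this illustrative split, pricing off the model bill alone would capture a tenth of the cost of an attempt.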
Then success rate. The share of attempts that produce an accepted unit. TheAgentCompany, a benchmark from Carnegie Mellon researchers and collaborators, reports that its most competitive tested agent completed 30 percent of tasks autonomously in a simulated software-company environment. Production environments will vary, but the math is the same. At 30 percent success, a $1 attempt costs $3.33 per completed unit before any rework. Failures do not disappear. They are absorbed by the successes. The formula is the one every CFO will eventually run:
True unit cost = (cost per attempt ÷ success rate) + rework cost on failures per completed unit
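The formula translates directly into a small Python function. This is a sketch under two stated assumptions: attempts are independent, so a completed unit takes 1 ÷ success rate attempts on average, and rework is expressed per completed unit so that both terms share a denominator. The function and parameter names are illustrative.

```python
def true_unit_cost(cost_per_attempt: float,
                   success_rate: float,
                   rework_per_completed_unit: float = 0.0) -> float:
    """Cost of one accepted unit once failed attempts are absorbed.

    Assumes rework is already expressed per completed unit (an assumption;
    the source formula does not fix the denominator).
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    attempts_per_completed_unit = 1.0 / success_rate
    return cost_per_attempt * attempts_per_completed_unit + rework_per_completed_unit


# The benchmark case from the text: a $1 attempt at 30 percent success.
benchmark_case = true_unit_cost(cost_per_attempt=1.00, success_rate=0.30)
print(f"${benchmark_case:.2f} per completed unit")
```

Rejecting a success rate of zero is a design choice: an agent that never produces an accepted unit has no finite unit cost, and the function says so rather than returning a misleading number.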
Put numbers on a real workflow. An invoice-processing agent costs $0.80 per attempt in inference, retrieval, orchestration, and tools. Human review adds $1.20 per attempt. At a 40 percent success rate, and before any rework on the failures, the math runs:
Cost per completed invoice = ($0.80 + $1.20) ÷ 0.40 = $5.00
If incremental value per invoice is $3.50, the agent is destroying margin at a rate of $1.50 per completed unit while automating work. Most enterprise dashboards in 2026 surface neither side of that equation.
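The same arithmetic can be turned around to ask what success rate the agent would need to break even. The Python sketch below uses the invoice figures from the text and ignores rework, as the worked example does; the function names are illustrative.

```python
def margin_per_completed_unit(value: float,
                              cost_per_attempt: float,
                              success_rate: float) -> float:
    """Incremental value minus the cost of the attempts one success absorbs."""
    return value - cost_per_attempt / success_rate


def break_even_success_rate(value: float, cost_per_attempt: float) -> float:
    """Success rate at which margin per completed unit is exactly zero."""
    return cost_per_attempt / value


COST_PER_ATTEMPT = 0.80 + 1.20  # machine cost plus human review, from the text
VALUE = 3.50                    # incremental value per invoice, from the text

margin = margin_per_completed_unit(VALUE, COST_PER_ATTEMPT, 0.40)
threshold = break_even_success_rate(VALUE, COST_PER_ATTEMPT)
print(f"margin at 40%: {margin:+.2f}; break-even success rate: {threshold:.0%}")
```

At these figures the agent needs roughly a 57 percent success rate before a completed invoice stops losing money, and any rework cost pushes that threshold higher.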
Last is the marginal cost curve. Cost per unit as volume grows. Some agents get cheaper at scale through caching, routing, and fine-tuning. Others get more expensive as edge cases multiply and human review queues back up. Pilot economics at 1,000 units a month routinely break at 100,000. The curve does not appear during the pilot because volume has not bent it yet. It appears after the contract is signed.
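One way to make the curve visible is to measure total cost at more than one volume and compare average cost with marginal cost between observations. The Python sketch below does that; the three observations are hypothetical numbers invented to show a curve that bends upward, not data from any deployment.

```python
def cost_curve(observations: list[tuple[int, float]]
               ) -> tuple[list[float], list[float]]:
    """Return (average cost per unit, marginal cost between adjacent volumes).

    Each observation is (monthly volume, total monthly cost in dollars).
    """
    points = sorted(observations)
    average = [cost / volume for volume, cost in points]
    marginal = [(c2 - c1) / (v2 - v1)
                for (v1, c1), (v2, c2) in zip(points, points[1:])]
    return average, marginal


# Hypothetical totals at three volumes, for illustration only.
observations = [(1_000, 5_000.0), (10_000, 42_000.0), (100_000, 600_000.0)]
average, marginal = cost_curve(observations)

for (volume, _), avg in zip(sorted(observations), average):
    print(f"{volume:>7,} units: average ${avg:.2f}")
print("marginal cost between volumes:", [f"${m:.2f}" for m in marginal])
```

In this invented series the average cost falls between the first two volumes and rises at the third, which is the pattern a single-volume pilot cannot reveal.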
These four numbers are the substance of the business case. Everything else is upstream or adjacent. Token spend is an input to variable cost. Resolution rate is success rate renamed. CSAT, latency, and compliance scores are guardrails that test whether the value and success numbers are honest. Most enterprise dashboards in 2026 show inputs and guardrails. They do not show the four numbers, which means finance has the cost-driver report and the operations-efficiency report, but not the P&L.
Half-instrumented unit economics produces confident decisions on incomplete information. Optimize variable cost without measuring success rate, and the program scales the volume of failures. Celebrate a high success rate without pricing the value of a success, and the program funds agents that work technically and fail economically. Price the pilot beautifully without testing marginal cost, and the contract's worst year is its third. Each of these is sold as a measurement problem. Each one is a business case that was approved before the math was finished.
The discipline is sequencing. Define the unit before selecting the model. Price the incremental value before starting the pilot. Instrument the cost of failure, not just the cost of success. Test the marginal cost curve at three volumes, not one. Enterprises that follow this sequence in 2026 will still be running agents in 2027. The ones that skip it will spend the year explaining why the cloud bill rose and throughput did not.
The technology is not creating the measurement problem. It is exposing it. Long-running agents make every retry, every tool call, every escalation visible as a line item. Pilots in 2024 could hide cost in averages and quality in survey scores. Pilots in 2026 cannot. The agent itself is the audit.
The CFO conversation stops being about model selection. It becomes about four numbers. What does one completed workflow cost? What is it worth? How does the margin behave at scale? Does the success rate justify the spend? Enterprises that can answer all four will still be running agents in 2027. The ones that cannot are not running an agent strategy. They are running an expense line waiting to be cut.