The Pillars
Inform: cost visibility per team, workload, feature. Optimize: reduce waste without impairing outcomes. Operate: continuous improvement and accountability. Classic FinOps applied to AI.
The FinOps Foundation’s 2026 framework added an AI-specific domain covering token economics, model selection, and prompt-cache efficiency. Inform requires daily granularity, not monthly invoices: Anthropic, OpenAI, and Bedrock all expose usage APIs that can feed cost dashboards in CloudHealth, Vantage, or Apptio. Optimize means routing each request to the cheapest model that meets the quality bar: Haiku for classification, Sonnet for reasoning, Opus only when eval scores justify it. Operate makes cost a first-class deployment criterion alongside latency and accuracy.
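The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the ModelOption type, the cheapest_passing helper, and every price and eval score below are placeholder assumptions to be replaced with your own price sheet and eval results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_mtok: float  # blended USD per million tokens (placeholder value)
    eval_score: float     # score on your own eval set for this task, 0..1


def cheapest_passing(options: list[ModelOption], quality_bar: float) -> ModelOption:
    """Return the lowest-cost model whose eval score meets the quality bar."""
    passing = [m for m in options if m.eval_score >= quality_bar]
    if not passing:
        raise ValueError("no model meets the quality bar; revisit the bar or the prompt")
    return min(passing, key=lambda m: m.cost_per_mtok)


# Hypothetical numbers for a classification task, for illustration only.
CLASSIFICATION = [
    ModelOption("haiku", cost_per_mtok=1.0, eval_score=0.94),
    ModelOption("sonnet", cost_per_mtok=4.0, eval_score=0.96),
    ModelOption("opus", cost_per_mtok=20.0, eval_score=0.97),
]

choice = cheapest_passing(CLASSIFICATION, quality_bar=0.93)
print(choice.name)
```

Raising the bar changes the answer: with these placeholder scores, a bar of 0.95 selects sonnet, which is the point of tying the routing decision to eval results rather than to habit.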
Attribution
Every agent call tagged with team, feature, customer (where applicable). Aggregate cost visible at each level. “The AI budget” is useless; per-team, per-feature attribution enables decisions.
Attach attribution metadata to every model call: team, feature, customer_id, request_id, prompt_version. Anthropic’s metadata.user_id and OpenAI’s user parameter each accept an opaque end-user identifier, but both are documented mainly for abuse detection, so confirm that they appear in your billing exports before relying on them, and carry the full tag set in your gateway or observability layer. For multi-tenant CRM AI, attribute to the end customer to enable showback or chargeback. Langfuse, Helicone, and Portkey can aggregate cost by tag. Without tagging, you’ll know your monthly bill but not which feature drives it.
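As a rough sketch of what per-tag aggregation looks like once every call carries its tags, the Python below rolls up cost at any level. The CallRecord type and the rollup helper are hypothetical names, and the records are invented sample data; in practice the rows would come from your gateway or observability export.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    team: str
    feature: str
    customer_id: str
    request_id: str
    prompt_version: str
    cost_usd: float


def rollup(records, *keys):
    """Sum cost_usd grouped by the given tag names, e.g. rollup(rs, "team", "feature")."""
    totals = defaultdict(float)
    for r in records:
        totals[tuple(getattr(r, k) for k in keys)] += r.cost_usd
    return dict(totals)


# Invented sample data, for illustration only.
records = [
    CallRecord("support", "case_summary", "cust-1", "req-1", "v3", 0.25),
    CallRecord("support", "auto_reply", "cust-2", "req-2", "v1", 0.50),
    CallRecord("sales", "lead_scoring", "cust-1", "req-3", "v2", 0.50),
]

print(rollup(records, "team"))
print(rollup(records, "team", "feature"))
print(rollup(records, "customer_id"))
```

The same records answer three different questions (which team, which feature, which customer), which is what a single untagged invoice total cannot do.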
Unit Economics
Cost per resolution, per lead qualified, per case summarized. Compare to human cost equivalent and to revenue generated. When the unit economics don’t work, kill the feature or fix the cost.
Track three numbers per feature: cost per successful outcome (resolution, qualification, summary); cost as a percentage of the value generated (revenue influenced or labor hours saved); and the 30-day trend. Healthy AI features show declining cost per outcome as prompt-cache hit rates rise and model choices improve. A rising cost per outcome usually points to broken evals: the system is generating more output to compensate for falling quality.
cost_per_resolution = total_token_spend / contained_resolutions
target: < 25% of human-handled cost
red line: > 50% triggers review
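The formula and thresholds above translate directly into code. The following is a minimal Python sketch with invented inputs; the function names are hypothetical, and the "watch" label for the band between 25% and 50% is an assumption, since the thresholds above leave that band unnamed.

```python
def cost_per_resolution(total_token_spend: float, contained_resolutions: int) -> float:
    """Token spend divided by the number of contained (AI-resolved) cases."""
    if contained_resolutions <= 0:
        raise ValueError("need at least one contained resolution")
    return total_token_spend / contained_resolutions


def classify(cost_per_res: float, human_cost: float) -> str:
    """Compare AI cost per resolution with the human-handled cost."""
    ratio = cost_per_res / human_cost
    if ratio < 0.25:
        return "on_target"   # below 25% of human-handled cost
    if ratio > 0.50:
        return "review"      # above the 50% red line
    return "watch"           # assumed label for the 25-50% band


# Invented example: $1,200 of token spend, 800 contained resolutions,
# and a human-handled cost of $8.00 per case.
cpr = cost_per_resolution(1200.0, 800)
print(cpr, classify(cpr, human_cost=8.0))
```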
Governance
Budget ceilings per team. Alerts approaching limits. Auto-throttling at hard limits. Monthly FinOps review. Without governance, “unlimited AI” becomes “unlimited bill.”
Send soft-limit alerts at 50%, 75%, and 90% of monthly budget to team Slack channels. A hard limit at 110% triggers automated rate-limiting at the API gateway (Kong, Portkey, or vendor-native). Run a monthly FinOps review with engineering, product, and finance to reset budgets, kill underperforming features, and allocate headroom to growth bets. Treat AI spend like cloud spend: variable, optimizable, and requiring discipline.
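The threshold logic can be sketched independently of any particular gateway or chat tool. In the Python below, budget_actions is a hypothetical helper that only decides what should happen; posting to Slack and applying the rate limit are left to whatever integration you already run.

```python
SOFT_LIMITS = (0.50, 0.75, 0.90)  # fractions of monthly budget that trigger alerts
HARD_LIMIT = 1.10                 # fraction of monthly budget that triggers throttling


def budget_actions(spend: float, budget: float, already_alerted=frozenset()):
    """Return (soft thresholds newly crossed, whether to throttle)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    new_alerts = [t for t in SOFT_LIMITS if used >= t and t not in already_alerted]
    return new_alerts, used >= HARD_LIMIT


# Invented example: a team at $8,000 of a $10,000 budget that was
# already alerted at the 50% mark.
alerts, throttle = budget_actions(8000.0, 10000.0, already_alerted={0.50})
print(alerts, throttle)
```

Tracking which thresholds have already fired keeps a team from being re-alerted on every check; where that state is stored is a design choice this sketch leaves open.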
Common Failure Modes
Five recurring patterns. Forgetting to enable prompt caching, paying 5-10x for the same context. Using Opus for tasks Sonnet handles equally well. Letting verbose system prompts grow unchecked. Loading entire vector-DB results into context instead of top-k. Failing to set max_tokens, allowing runaway outputs.
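One way to catch these patterns before they reach the bill is a pre-deployment check on each feature's request configuration. The Python below is only a sketch: the cfg dictionary and all of its keys are hypothetical, and the size thresholds are arbitrary defaults, not recommendations.

```python
def audit_request(cfg: dict, *, max_system_prompt_chars: int = 8000,
                  max_chunks: int = 10) -> list[str]:
    """Return a warning for each of the five failure modes found in cfg."""
    warnings = []
    if not cfg.get("prompt_cache_enabled", False):
        warnings.append("prompt caching is not enabled")
    if cfg.get("model_tier") == "opus" and not cfg.get("opus_justified_by_evals", False):
        warnings.append("Opus selected without eval evidence that Sonnet falls short")
    if len(cfg.get("system_prompt", "")) > max_system_prompt_chars:
        warnings.append("system prompt exceeds the size threshold")
    if len(cfg.get("retrieved_chunks", [])) > max_chunks:
        warnings.append("more retrieved chunks than the top-k limit")
    if cfg.get("max_tokens") is None:
        warnings.append("max_tokens is not set")
    return warnings


# Hypothetical config for one feature, for illustration only.
cfg = {
    "prompt_cache_enabled": True,
    "model_tier": "opus",
    "system_prompt": "You are a support assistant.",
    "retrieved_chunks": ["chunk"] * 25,
    "max_tokens": None,
}
for warning in audit_request(cfg):
    print(warning)
```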
What to Do This Week
Pull last month’s AI spend, divide by completed outcomes, and present cost per outcome to product owners alongside accuracy metrics.