Tokenmaxxing in the Anthropic Era: A New Opening for AI Startups

If AI can replace human work, corporate profit margins should improve.

This has been the big expectation around AI adoption over the past few years. Especially in software development, customer support, and back-office work, AI agents and AI coding tools were expected to reduce labor cost while raising productivity.

But recent news suggests this view may have been a bit too simple.

Labor cost may go down, but AI inference cost is starting to rise.

A symbolic example is Anthropic’s Claude Code.

According to Business Insider, Anthropic has raised its estimate of the average token cost per enterprise developer for Claude Code from $6 per day to $13 per day. The range that covers 90% of users has also been raised, from under $12 per day to under $30 per day. On a monthly basis, the estimate is about $150–250 per developer (Business Insider: Anthropic Doubles Estimate for Claude Code Token Spend).

Looking at the numbers alone, this may not seem like a big problem yet.

But when thousands of engineers across a company use it, and AI agents start to autonomously generate code, run tests, fix errors, and retry, the story changes.

AI cost does not stop at “X dollars per user per month” like traditional SaaS. The more it is used, the more cost piles up: tokens, API calls, context, retries, tool executions.
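To see how the per-developer figure scales with headcount, here is a small Python sketch. It takes the $150–250 monthly range quoted above as given. The headcounts and the function name are hypothetical, and real spend would follow actual usage rather than an average.

```python
# Monthly per-developer range quoted in the article (USD).
MONTHLY_LOW_USD = 150
MONTHLY_HIGH_USD = 250


def fleet_cost_range(developers: int, months: int = 12) -> tuple[int, int]:
    """Return (low, high) total spend in USD for a headcount over a period."""
    low = MONTHLY_LOW_USD * developers * months
    high = MONTHLY_HIGH_USD * developers * months
    return low, high


if __name__ == "__main__":
    # Hypothetical headcounts, chosen only to show the scaling.
    for headcount in (100, 1_000, 5_000):
        low, high = fleet_cost_range(headcount)
        print(f"{headcount:>5} developers: ${low:,} to ${high:,} per year")
```

At 5,000 developers the same range works out to $9M to $15M per year, before counting any autonomous agent runs on top of interactive use.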

The word that captures this is Tokenmaxxing.

I do not see this only as a cost-increase story.

I think the wider Tokenmaxxing becomes, the more new room there is for AI startups.

The more seriously companies adopt AI, the more they need something beyond “a better model”: infrastructure to measure which model was used for which task, at what cost, and with what outcome.

In other words, the spread of Tokenmaxxing could become an important market opportunity for AI infrastructure startups working on model routing, AI agent monitoring, and cost-per-task measurement.

What is Tokenmaxxing?

Tokenmaxxing is the situation where using more AI, and especially more tokens, is itself praised as a sign of getting the most out of AI.

A token is the unit an LLM uses to process text, code, conversation history, tool results, and so on. Inputs, outputs, intermediate agent steps, long context, and retry logs all become cost.
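As a rough illustration of why intermediate agent steps and long context add up, the following Python sketch prices a single call and then a multi-step agent loop. The per-million-token prices are hypothetical placeholders, not any vendor's real rates, and the loop assumes every step re-sends the full history with no prompt caching.

```python
# Hypothetical prices in USD per million tokens; real prices vary by model and vendor.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one model call in USD."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def agent_run_cost(base_context: int, output_per_step: int, steps: int) -> float:
    """Cost of an agent loop where each step re-sends all earlier output as input."""
    total = 0.0
    context = base_context
    for _ in range(steps):
        total += call_cost(context, output_per_step)
        context += output_per_step  # history grows, so the next call's input is larger
    return total


if __name__ == "__main__":
    one_step = agent_run_cost(10_000, 1_000, steps=1)
    ten_steps = agent_run_cost(10_000, 1_000, steps=10)
    print(f"1 step:   ${one_step:.3f}")
    print(f"10 steps: ${ten_steps:.3f} (more than 10x one step: {ten_steps > 10 * one_step})")
```

Under these assumptions, ten steps cost more than ten times one step, because the accumulated history is billed again as input on every call.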

What companies should really look at is not token consumption itself.

What they should look at is productivity in terms like the following.

| What to look at | Content |
| --- | --- |
| Which task it was used for | Coding, summarization, classification, research, support, etc. |
| How much it cost | Tokens, API, inference, tool execution cost |
| What outcome it produced | Adopted code, resolved tickets, time saved |
| Whether it is reasonable vs. human work | Cost, quality, speed, risk comparison |
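One way to make these four questions concrete is to record them together for each unit of AI work. The Python sketch below is a minimal, hypothetical schema: the field names, the `UsageRecord` type, and the `summarize` helper are all assumptions made for illustration, not an existing tool's format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    task_type: str          # which task: "coding", "summarization", "support", ...
    model: str              # which model handled it
    cost_usd: float         # tokens, API, inference, and tool execution cost combined
    outcome_achieved: bool  # e.g. code was adopted, ticket was resolved
    human_cost_usd: float   # estimated cost of doing the same task by hand


def summarize(records):
    """Group records by task type and total the AI spend, outcomes, and human baseline."""
    summary = defaultdict(lambda: {"ai_cost": 0.0, "outcomes": 0, "human_cost": 0.0})
    for r in records:
        row = summary[r.task_type]
        row["ai_cost"] += r.cost_usd
        row["outcomes"] += int(r.outcome_achieved)
        row["human_cost"] += r.human_cost_usd
    return dict(summary)


if __name__ == "__main__":
    # Made-up records, only to show the shape of the data.
    records = [
        UsageRecord("support", "small-model", 0.5, True, 4.0),
        UsageRecord("support", "small-model", 0.25, False, 4.0),
        UsageRecord("coding", "large-model", 2.0, True, 30.0),
    ]
    for task_type, row in summarize(records).items():
        print(task_type, row)
```

The design choice is that token counts never appear on their own: every cost is stored next to the task it served and whether it produced an outcome.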

At companies in the early stage of AI adoption, this logic often gets reversed.

“Employees using AI are more productive.” “Employees using more tokens are more advanced in AI.” “The more agents you run, the more advanced you are.”

When this atmosphere takes hold, token consumption stops being a cost to be justified by outcomes and becomes an internal game score.

This is Goodhart’s law itself: when a measure becomes a target, it ceases to be a good measure.

In the past, using lines of code or commit count as a productivity metric caused developers to optimize for line count and commit count rather than real quality.

The 2026 version of this metric may be “token consumption.”

Company cases in recent news

Tokenmaxxing is not an abstract concept. At several companies, rising AI usage and AI cost are already becoming an issue.

| Company | Reported development | What to look at |
| --- | --- | --- |
| Anthropic / Claude Code | Raised the assumed per-developer cost for Claude Code | The more AI coding spreads, the more important inference-cost management becomes for customers |
| Uber | Reported to have burned through its 2026 AI budget in a few months due to rising use of Claude Code, Cursor, and others | When AI tools are too convenient, usage outruns the budget model |
| Amazon | Reported that internal AI usage scores and leaderboards encouraged unnecessary AI use | Making AI usage too visible turns metrics into a game for employees |
| Meta | A token consumption ranking called “Claudeonomics” was created, and over 60 trillion tokens were reportedly used in 30 days | Token use starts to look like a measure of “AI engagement” |
| Spotify | After layoffs, computing cost per employee reportedly rose because remaining staff use AI tools more | Headcount cuts and AI cost increases happen at the same time |
| Shopify / Roblox | Rising use of AI assistants and developer AI features is reported to be pushing inference cost up | The more users use these features, the more LLM cost the provider bears |

On Uber, AI Magazine reports that the company burned through its 2026 AI budget in a few months, with rising use of Claude Code and Cursor cited as the background (AI Magazine: Why Uber has Already Burned Through its AI Budget).

On Amazon, it is reported that some employees used an internal AI tool called “MeshClaw” and assigned unnecessary tasks to AI agents to inflate their usage scores. The article also describes a goal that more than 80% of developers should use AI on a weekly basis, along with a token-consumption leaderboard, which unintentionally encouraged competitive behavior (Financial Times: Amazon staff use AI tool for unnecessary tasks to inflate usage scores).

At Meta, a ranking called “Claudeonomics”, reportedly created by an employee, made token usage by about 85,000 employees visible, and over 60 trillion tokens were used in 30 days (Fortune: A Meta employee created a dashboard so coworkers can compete to be the company’s No. 1 AI token user).

For Spotify, Shopify, Roblox, and others, The Information reports that while AI is reducing labor cost, AI tool and LLM inference cost are starting to put pressure on profit margins (The Information: Tech’s AI Margin Math Is Getting Messier).

The point here is not to read these as simple “AI cost failures.”

It is more useful to read them as the next infrastructure layer becoming visible, now that companies are actually using AI in production.

The core: replacing labor with compute resources

AI adoption is not just cost reduction.

What AI reduces is human work hours. What rises in return is the cost of compute resources: GPUs, APIs, tokens, storage, logs, monitoring, security, and evaluation.

In other words, the cost structure of companies is shifting like this.

| Before | After AI adoption |
| --- | --- |
| Labor cost | AI inference cost |
| SaaS monthly license | Token usage billing |
| Human work hours | Agent execution time |
| Manager progress checks | Logs, traces, evaluation, monitoring |
| Outsourcing fees | Model usage fees, API costs, cloud costs |

This is not just efficiency improvement. The cost structure of companies is moving from labor-intensive to compute-intensive.

In that sense, the metric companies should watch is not “how much AI was used.”

What they should watch is cost per task.

This is where the room for AI startups opens up.

It is not easy for a company to analyze every instance of AI usage in detail on its own, measure cost-effectiveness per model, monitor agent behavior, and route work to the optimal model.

I think this “operational management of AI usage” is the next startup area.

The next metric is cost per task

Cost per task is the idea of looking at how much AI cost it takes to complete a single task.
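Read literally, that definition can be sketched in a few lines of Python. The function name and inputs are hypothetical, and the key assumption is that failed attempts and retries count toward the numerator, since they were paid for even though they produced nothing.

```python
def cost_per_task(attempt_costs_usd: list[float], completed_tasks: int) -> float:
    """Total spend, including failed attempts and retries, divided by completed tasks."""
    if completed_tasks <= 0:
        raise ValueError("no completed tasks: cost per task is undefined")
    return sum(attempt_costs_usd) / completed_tasks


if __name__ == "__main__":
    # Made-up numbers: four attempts at $0.50 each, of which only two finished the task.
    print(cost_per_task([0.5, 0.5, 0.5, 0.5], completed_tasks=2))
```

In this made-up case each attempt costs $0.50, but the cost per completed task is $1.00. Token dashboards show the first number; the second is the one that matters.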

For example, you do not need an expensive Claude Opus-class model to reply to an email. For short summaries, classification, tagging, and template generation, a small or low-cost model is often enough.

On the other hand, for complex code edits, legal documents, M&A models, medical records, and financial risk analysis, you need higher-end models and monitoring.

Even within the same company and the same AI budget, the optimal model differs by task.

| Task | Suitable model / setup |
| --- | --- |
| Simple classification | Small model |
| Short summary | Low-cost model |
| Internal document search | RAG + mid-tier model |
| Code edits | High-end model |
| Legal / medical / finance | High-end model + monitoring |
| Routine monitoring agents | Lightweight model |
| Final decisions | High-end model + human review |

The point is not to use the highest-end model for everything. It is to use expensive models only where they are needed.
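A back-of-the-envelope comparison shows why this matters. In the Python sketch below, the per-task costs and the monthly workload are entirely hypothetical and chosen only to illustrate the mechanism. It compares sending everything to the high-end tier against using the cheapest tier that is good enough for each task.

```python
# Hypothetical per-task costs in USD for three tiers; not real vendor prices.
COST_PER_TASK = {"low": 0.001, "mid": 0.01, "high": 0.10}

# Hypothetical monthly workload: task type -> (task count, cheapest adequate tier).
WORKLOAD = {
    "simple classification": (50_000, "low"),
    "short summary": (20_000, "low"),
    "internal document search": (10_000, "mid"),
    "code edits": (5_000, "high"),
}


def monthly_cost(tiered: bool) -> float:
    """Monthly spend in USD, either tiered per task or all on the high-end tier."""
    total = 0.0
    for count, tier in WORKLOAD.values():
        total += count * COST_PER_TASK[tier if tiered else "high"]
    return total


if __name__ == "__main__":
    print(f"everything on high-end: ${monthly_cost(tiered=False):,.0f}")
    print(f"tiered by task:         ${monthly_cost(tiered=True):,.0f}")
```

With these invented numbers the all-high-end bill is about $8,500 a month and the tiered bill about $670. The real ratio depends on the workload mix and on whether cheaper models actually hold quality, which is exactly what needs measuring.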

The effect of AI adoption is going to be measured by cost-effectiveness per task, not by token consumption.

A system that measures and improves cost per task becomes new AI infrastructure for companies.

Model routing as a startup opening

This is where model routing becomes important.

Model routing is the practice of not relying on one AI model for everything, but using multiple models depending on task difficulty, required accuracy, speed, cost, and risk.

For example, decisions like the following can be automated.

“Does this task really need Claude Code?” “Is a GPT-class model enough for this process?” “Would an open-source model like Llama or Mistral be sufficient?” “Couldn’t expensive models be used only for final verification?”
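Decisions like these can be sketched as a simple rule-based router in Python. This is a static lookup built from the task table earlier in this article, with hypothetical tier names and task kinds. It is only an illustration of the idea, not a description of how Martian or any other commercial router works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                     # e.g. "classification", "summary", "code_edit"
    high_risk: bool = False       # legal, medical, or finance work
    final_decision: bool = False  # output is acted on without further checks


@dataclass(frozen=True)
class Decision:
    tier: str
    monitoring: bool
    human_review: bool


# Hypothetical routing table: task kind -> model tier.
ROUTES = {
    "classification": "small",
    "summary": "low_cost",
    "doc_search": "rag_mid_tier",
    "code_edit": "high_end",
    "routine_monitoring": "lightweight",
}


def route(task: Task) -> Decision:
    """Pick a model tier and safeguards for one task."""
    if task.high_risk or task.final_decision:
        tier = "high_end"
    else:
        tier = ROUTES.get(task.kind, "high_end")  # unknown kinds fall back to the strongest tier
    return Decision(tier, monitoring=task.high_risk, human_review=task.final_decision)


if __name__ == "__main__":
    print(route(Task("classification")))
    print(route(Task("summary", high_risk=True)))
```

Falling back to the high-end tier for unknown task kinds trades cost for safety. A production router would more likely learn these choices from measured quality and cost than hard-code them.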

One representative example is Martian.

Accenture announced an investment in Martian in 2024, describing Martian as a company that dynamically routes queries to large language models and delivers more effective AI systems to enterprises. Martian itself describes its model router as a system that dynamically selects the best AI model for each query, optimizing performance, cost, uptime, and other business requirements (Accenture Newsroom: Accenture Invests in Martian to Bring Dynamic Routing of Large Language Queries, Martian: Partners with Accenture, Launches Airlock Compliance for Enterprises).

Just as companies stopped thinking directly about physical servers in the cloud era, companies in the AI era may stop deciding “which model should we use” by hand each time and instead let a router select the optimal model for each task.

Why big AI labs find this hard to take on seriously

What makes this area interesting is that big AI labs are structurally poorly positioned to take it on seriously.

OpenAI, Anthropic, Google, and Microsoft want their own models and their own clouds to be used more. But a neutral model router will sometimes decide:

“This task does not need expensive Claude.” “For this process, a cheaper open-source model is enough, instead of GPT.” “Use Llama, not Gemini.” “High-end models are only needed for the final verification.”

This is rational for customer companies. But for big AI labs, it could cut into their own token revenue.

Of course, big AI labs will also push their own internal routing, lightweight models, caching, batching, and inference optimization.

But across companies, the work of neutrally judging “for that task, another company’s model is cheaper and good enough” is easier for a startup to build. That is where AI infrastructure startups have room to enter.

We are moving from an era where the model itself is the value, to an era where the value is in how you select, combine, monitor, and measure cost-effectiveness of models.

AI agent monitoring as another opening

Another important area is the AI agent monitoring layer.

AI agents do not just produce a final answer. They plan, search, call tools, retry on failure, evaluate intermediate results, and finally produce output.

The Tokenmaxxing problem happens along the way.

| Problem | Content |
| --- | --- |
| Unnecessary model calls | Expensive models used even for small decisions |
| Excessive retries | Failed processes repeated many times |
| Context that is too long | Unnecessary conversation history and logs kept around |
| Wasted tool execution | Repeated unnecessary searches and API calls |
| Disconnected from outcomes | Tokens used, but no outcome produced |

A company worth watching here is Judgment Labs.

In May 2026, Judgment Labs announced that it had raised a combined $32M in seed and Series A, led by Lightspeed Venture Partners. The company is described as building the continuous improvement layer for AI agents (Business Wire: Judgment Labs Closes $32M in Seed and Series A Funding).

This is not just log management.

It is a layer that ties AI agent action logs to each company’s outcome metrics.

For customer support, the outcome metrics may be ticket resolution rate, repeat-contact rate, and customer satisfaction. For code generation, they may be PR acceptance rate, test pass rate, and post-fix bug rate. For sales support, they may be reply rate, opportunity conversion rate, and close rate.

Outcome definitions differ by company.

A monitoring layer like Judgment Labs ties these outcome metrics to AI agent action logs and tries to answer questions like:

At which step did it fail? Where were tokens wasted? Which model calls were unnecessary? Which processes did not lead to outcomes?
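To give a feel for what answering these questions involves, here is a minimal Python sketch that analyzes one agent run. The `Step` record, its fields, and the retry threshold are hypothetical assumptions. The sketch does not reflect Judgment Labs' product or any real tracing format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # e.g. "plan", "search", "run_tests"
    model: str       # model used for this step
    tokens: int      # tokens consumed by this step
    succeeded: bool  # whether the step achieved its goal


def analyze(trace: list[Step], max_attempts: int = 3) -> dict:
    """Summarize where one agent run spent tokens and where it failed."""
    tokens_by_step = defaultdict(int)
    attempts = Counter()
    failed_tokens = 0
    first_failure = None
    for step in trace:
        tokens_by_step[step.name] += step.tokens
        attempts[step.name] += 1
        if not step.succeeded:
            failed_tokens += step.tokens
            if first_failure is None:
                first_failure = step.name
    return {
        "first_failure": first_failure,
        "tokens_by_step": dict(tokens_by_step),
        "tokens_on_failed_steps": failed_tokens,
        "excessive_retries": sorted(n for n, c in attempts.items() if c > max_attempts),
    }


if __name__ == "__main__":
    # Made-up trace: one planning step, then four failed test runs before a pass.
    trace = [Step("plan", "large-model", 1_000, True)]
    trace += [Step("run_tests", "large-model", 2_000, False)] * 4
    trace += [Step("run_tests", "large-model", 2_000, True)]
    print(analyze(trace))
```

Even this toy version shows the shape of the problem: in the made-up trace, 8,000 of 11,000 tokens went to failed test runs, which a raw token total would never reveal.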

Once this becomes visible, companies can measure cost per task for the first time.

In other words, the Tokenmaxxing problem is creating not only model routing, but also AI agent monitoring as a new startup area.

What investors should look at

From an investor perspective, this theme is not just an AI tool cost issue.

It shows that the revenue structure of the AI era is starting to split into three layers.

| Layer | Examples | Revenue opportunity |
| --- | --- | --- |
| Model providers | Anthropic, OpenAI, Google | Rising token consumption leads to rising revenue |
| AI-adopting companies | Uber, Spotify, Shopify, Roblox, etc. | Balancing labor cost reduction against rising inference cost is the challenge |
| AI infrastructure optimization | Martian, Judgment Labs, etc. | Cost management, model selection, monitoring, and evaluation become value |

For Anthropic, the spread of Claude Code is a clear tailwind.

The more Claude Code grows, the more Anthropic’s revenue grows. But for customer companies, AI cost management becomes more important at the same pace.

In other words, Anthropic’s growth may also create demand for AI infrastructure companies like Martian and Judgment Labs.

This is similar to how the growth of AWS, Azure, and Google Cloud in the cloud market led to growth in surrounding layers like Datadog, Snowflake, Cloudflare, and FinOps tools.

The more model companies grow, the more the companies using them need cost management, monitoring, security, evaluation, and routing.

This is where AI startups have room to enter, in a way that big AI labs cannot easily match.

Another view

There is also another way to look at this.

Rising AI cost is not necessarily bad news. It can be read as evidence that AI tools are actually being used.

If, like at Uber, most engineers use AI tools and a large share of code is generated by AI, then it is natural for cost to rise in the short term.

New infrastructure always carries some waste in the early phase.

Cloud was the same at first. You could scale up quickly, but idle instances, excess storage, unnecessary logs, and wasted data transfer cost piled up.

Later, the cloud got FinOps, observability, security, and cost-optimization tools.

The same thing may simply be happening with AI.

In other words, Tokenmaxxing is not an AI adoption failure. It is an operational issue that only became visible because AI moved into production use.

And the fact that the operational issue has become visible means that a market is starting to form for the startups that will solve it.

Summary

Tokenmaxxing is a side effect of the early stage of AI adoption.

I do not see this only as a corporate cost management issue. I see it as a new entry opportunity for AI startups.

Big AI labs like Anthropic and OpenAI are basically built so that their revenue grows as their own model usage grows. For that reason, it is structurally hard for them to build a system that neutrally decides “this task does not need expensive Claude” or “for this process, a cheaper open-source model is enough instead of GPT.”

For companies adopting AI, on the other hand, throwing every task at the most expensive model is not rational.

Small models for simple classification. Low-cost models for short summaries. High-end models for complex code edits. High-end models with monitoring for high-risk areas like legal, medical, and finance.

Companies will need systems that select the optimal model per task, monitor agent behavior, and measure AI cost per outcome.

This is where model routing companies like Martian and AI agent monitoring companies like Judgment Labs have room to enter.

Just as the spread of cloud created Datadog, Cloudflare, FinOps, and security companies, the spread of AI usage should also create neutral infrastructure companies that optimize model usage, not only companies that build models.

At its core, Tokenmaxxing is both a corporate AI cost problem and a new market entry point for AI infrastructure startups.
