OpenRouter Annual Report: What Are People Doing With 100 Trillion Tokens?

Preface: The Turning Point for LLMs and the Overlooked Truth

2024 was a turning point for Large Language Models (LLMs). On December 5th of that year, the release of o1, the first widely adopted reasoning model, marked a decisive shift in the AI field from simple single-pass pattern generation to multi-step deliberative inference.

This revolution accelerated the deployment and adoption of large models, but it also exposed a gap: amid the frenetic pace of technological advancement, we lack sufficient empirical understanding of how these models are actually used in the real world.

OpenRouter, as an AI inference service provider connecting hundreds of LLMs, analyzed over 100 trillion tokens of real-world LLM interactions. This analysis not only invalidates some common misconceptions but also points to six core trends that future model builders, developers, and infrastructure providers need to watch.

Surprise #1: Who is the Real "Traffic King"? Roleplay Beats Productivity

If you had to guess what users do most with AI, you might say writing code, composing emails, or summarizing text. But this 100 trillion token dataset gives a surprisingly different answer: the demand for Creative Roleplay far exceeds the "productivity tasks" many anticipated.

1. The Unexpected Leader: The Wild Growth of Roleplay

Across the token usage of all open-source models, "Roleplay" has consistently been the largest category, holding steady at around 50% of tokens. In other words, users primarily use open models for creative interactive dialogue, storytelling, roleplaying, and gaming scenarios.

This phenomenon highlights a unique advantage of open-source models: they can be used for creative applications and are typically less constrained by strict commercial safety or content moderation policies. Users view LLMs as structured roleplaying partners or "persona engines," rather than just casual chatbots. This finding heralds huge opportunities for AI in consumer applications, especially in interactive narrative, gaming, and virtual character fields.

Notably, while roleplay is the largest use case for open-source models, it is not their exclusive domain. By the end of 2025, roleplay traffic was split almost evenly between non-Chinese open-source models (43%) and proprietary models (42%), indicating that users now have viable options whether they choose open or closed models for creative chat and storytelling.

2. The Silent Infrastructure Builder: Programming

Following closely behind roleplay, Programming Assistance is the second largest usage category for open-source models, accounting for about 15% to 20%. Many developers use open-source models for code generation and debugging.

Widening the view to all LLMs (both closed and open source), programming has become the fastest-growing and now dominant category. Programming-related queries accounted for about 11% of total tokens in early 2025, but in recent weeks this proportion has exceeded 50%. This trend indicates LLMs are shifting from exploratory or conversational uses to application-oriented tasks like code generation, debugging, and data scripting.

In the programming domain, Anthropic's Claude series has consistently dominated, contributing over 60% of the spend in this category for most of the period. As LLMs are embedded into developer workflows, programming tasks have also become the main driver of the surge in context length: requests involving code understanding and debugging often exceed 20K input tokens.

Surprise #2: The Open vs. Closed "Divide" and the "Mid-sized Rising Stars"

The LLM ecosystem is not a winner-take-all market but presents a "dual structure": Open Source (OSS) and Proprietary models coexist.

1. The 30% Golden Line and the Power of China

Although proprietary models (especially from major North American providers) still account for the lion's share of token usage, the share of open-source models is growing steadily, reaching about 30% (nearly one-third) of total token usage by the end of 2025.

Of particular note is the significant growth contributed by Chinese-developed open-source models. Their market share was negligible in late 2024 (only 1.2% weekly) but grew strongly in the second half of 2025, even reaching nearly 30% of total usage across all models in some weeks. Models like Qwen and DeepSeek have substantially reshaped the open-source market landscape through rapid iteration and dense release cycles, driving global competition.

2. DeepSeek's Decline and Market Fragmentation

OpenRouter data shows the LLM market is moving from consolidation to diversification. In late 2024, the DeepSeek family (V3 and R1) consistently held over half of OSS token usage, forming a near-monopoly.

However, after the "Summer Inflection Point" of 2025, this pattern was broken. Newcomers like Qwen, MiniMax's M2, MoonshotAI's Kimi K2, and OpenAI's GPT-OSS series rose rapidly, seizing significant market share. By late 2025, no single model could sustain more than 25% of OSS token share.
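The fragmentation claim comes down to a simple share calculation. The following is a minimal sketch, assuming weekly token counts per model are available as a dictionary; the model names and numbers are made up for illustration and are not figures from the report.

```python
def token_shares(tokens_by_model):
    """Return each model's fraction of the total token volume."""
    total = sum(tokens_by_model.values())
    return {model: tokens / total for model, tokens in tokens_by_model.items()}


# Made-up weekly token counts (in billions); not figures from the report.
weekly_tokens = {
    "model_a": 240,
    "model_b": 220,
    "model_c": 200,
    "model_d": 180,
    "other": 160,
}

shares = token_shares(weekly_tokens)
leader = max(shares, key=shares.get)

# "Fragmented" here means no single model holds more than 25% of tokens.
is_fragmented = shares[leader] <= 0.25
print(leader, round(shares[leader], 2), is_fragmented)
```

Running the same check week by week would show when a near-monopoly gives way to a fragmented market.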

This shift indicates users no longer default to a "best" choice but seek value across a wider range of model options. For model builders, this means releasing a leading open model can garner immediate attention, but maintaining market share requires continuous development investment.

3. Mid-sized Models: Finding "Model-Market Fit"

Historically, the open-source market was polarized between "Small & Fast" and "Large & Powerful." Now, however, a new, growing category has emerged: Mid-sized Models (15B to 70B parameters).

Data shows that the overall usage share of small models (<15B parameters) is declining. The mid-sized market, conversely, clearly demonstrates a story of "market creation." This segment wasn't truly established until the release of Qwen2.5 Coder 32B in November 2024. The rise of mid-sized models (like Mistral Small 3 and GPT-OSS 20B) suggests users are finding a balance between capability and efficiency.

Surprise #3: From "One-Shot Answer" to "Agentic Action"

The usage of LLMs is undergoing a fundamental shift: from single-pass text generation to multi-step, tool-integrated, and reasoning-intensive workflows. This shift is termed the rise of Agentic Inference.

1. Reasoning Models Become the New Default

In 2025, the volume of tokens flowing through Reasoning-Optimized Models rose sharply and now exceeds half of total usage. This reflects not only the release of higher-capability systems like GPT-5, Claude 4.5, and Gemini 3 but also increased user demand for models capable of managing task state, following multi-step logic, and supporting agentic workflows.

xAI's Grok Code Fast 1 currently holds the largest share of reasoning traffic, leading Google's Gemini 2.5 Pro and Flash. This trend suggests that reasoning-oriented models are becoming the default choice for practical workloads.

2. Prompts Explode 4x, Driven by Programming

Over the past year, both input (prompt) and output (completion) token volumes have increased significantly. The average input prompt tokens per request increased by about 4x, growing from ~1.5K to over 6K. Output tokens have also nearly doubled.

This growth indicates users are moving from open-ended generation to more complex, context-rich workloads. Models are increasingly acting as analysis engines rather than simple creative generators. The primary driver of this trend is programming workloads. Programming-related prompts are on average 3-4 times longer than general prompts. Longer sequences are not just users being wordy; they are a hallmark of embedded, more complex agent workflows.

3. Tool Calling: Models Learn to "Make Phone Calls"

Users are increasingly adopting tool-calling (function calling) capabilities. The proportion of tokens attributable to successful tool calls is stable at around 15%. Early on, the tool-calling market was dominated by models explicitly optimized for agentic reasoning, such as Anthropic's Claude series and OpenAI's gpt-4o-mini.

The rising trend of tool calling compels model providers to improve tool handling capabilities, context support, and robustness for non-standard toolchains.
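For readers unfamiliar with the mechanics, the sketch below shows what the application side of a tool call can look like. The model emits a structured request, and the application parses it, runs the matching function, and reports either a result or an error. The call format (a JSON object with "name" and "arguments"), the get_weather tool, and the dispatch helper are all hypothetical and not any provider's actual API. The error branches illustrate the kind of robustness that malformed or non-standard calls require.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a stand-in for a real external API.
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def dispatch(raw_call: str) -> dict:
    """Parse a model-emitted tool call and run it, returning a result or an error."""
    try:
        call = json.loads(raw_call)
        name = call["name"]
        args = call["arguments"]
        # Arguments may arrive as a JSON-encoded string rather than an object.
        if isinstance(args, str):
            args = json.loads(args)
        result = TOOLS[name](**args)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    except KeyError as exc:
        return {"ok": False, "error": f"missing field or unknown tool: {exc}"}
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments: {exc}"}
    return {"ok": True, "result": result}


print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
print(dispatch("not json"))
```

Returning a structured error instead of raising lets the calling loop feed the failure back to the model so that it can retry with a corrected call.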

Surprise #4: Geopolitics and the Globalization of AI

LLM usage is not concentrated in North America but is becoming increasingly globalized and decentralized.

North America's share of spending has dropped to less than half of the total in most observation periods. Asia is expanding rapidly, not only as a producer of frontier models but also as a consumer. Asia's share of global spending has more than doubled, reaching about 31% in recent periods. The rise of Chinese LLM companies (like DeepSeek, Qwen, MoonshotAI) confirms that LLMs have become a truly global computational resource.

In terms of language distribution, English remains dominant (>80% token share). However, Simplified Chinese accounts for nearly 5% of global token volume, reflecting sustained engagement in bilingual or Chinese-first environments, especially against the backdrop of growing Chinese open-source models.

Surprise #5: The "Cinderella Effect" of Retention

In the rapidly evolving ecosystem of large models, the true measure of a model's moat is not short-term growth, but user retention.

The research introduces the Cinderella "Glass Slipper" Effect to describe a phenomenon of enduring retention. The hypothesis posits that in the fast-iterating AI market, there is a set of unsolved high-value workloads. When a new frontier model is released, it is effectively "tried on" against these pending problems. Once a new model perfectly matches previously unmet technical and economic constraints, it finds its precise fit—the "Glass Slipper."

For developers or organizations where the workload fits "just right," this match creates a powerful lock-in effect. Their systems, data pipelines, and user experiences become anchored to that model. Even if newer models appear subsequently, the incentive to re-platform diminishes drastically.

Retention Reveals Capability Inflection Points: For example, the May 2025 cohort for Claude 4 Sonnet and the June 2025 cohort for Gemini 2.5 Pro retained about 40% of users in their 5th month, far higher than later cohorts. This suggests these early cohorts corresponded to technical breakthroughs in "reasoning fidelity" or "tool use stability" that solved previously impossible workloads.

The Boomerang Effect: Additionally, DeepSeek's model charts show a rare "resurrection" jump. Some DeepSeek cohorts saw retention rise after initial churn. This suggests that some users who churned to try alternatives returned to DeepSeek, confirming it provided the best fit for their specific workloads due to its unique specialized performance or cost-efficiency.
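Cohort retention curves like these can be computed from plain activity logs. The sketch below assumes one common definition: a user's cohort is the month of their first activity with a model, and retention in month k is the fraction of that cohort active k months later. The report's exact methodology may differ, and the events are made up. The sample data includes one user who leaves and then returns, which produces the boomerang shape.

```python
from collections import defaultdict


def cohort_retention(events):
    """events: iterable of (user_id, month_index) activity records for one model.

    Returns {cohort_month: {months_since_first_use: retained_fraction}}.
    """
    first_seen = {}
    active = defaultdict(set)  # month -> users active in that month
    for user, month in events:
        first_seen[user] = min(month, first_seen.get(user, month))
        active[month].add(user)

    cohorts = defaultdict(set)  # first-use month -> users in that cohort
    for user, month in first_seen.items():
        cohorts[month].add(user)

    return {
        start: {
            month - start: len(users & active_users) / len(users)
            for month, active_users in sorted(active.items())
            if month >= start
        }
        for start, users in cohorts.items()
    }


# Made-up activity log: user "c" churns in month 1 and returns in month 2.
events = [
    ("a", 0), ("b", 0), ("c", 0), ("d", 0), ("e", 0),
    ("a", 1), ("b", 1), ("f", 1),
    ("a", 2), ("b", 2), ("c", 2),
]
curves = cohort_retention(events)
print(curves[0])  # month-0 cohort: retention dips, then rises when "c" returns
```

A curve that rises after an initial drop, as in the month-0 cohort here, is the signature described as the Boomerang Effect.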

Thus, retention is no longer just a result; it becomes a "fingerprint" for understanding breakthroughs in model capability.

Surprise #6: High Prices Don't Deter, But Cheapness Brings Scale

The LLM market is not fully commoditized: there is only a weak correlation between price and usage. Demand is relatively price-inelastic; a 10% price drop increases usage by only about 0.5% to 0.7%.
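To make the elasticity figure concrete, here is a small worked calculation. It treats elasticity as the simple ratio of percentage changes and projects usage with a linear approximation. Both are simplifying assumptions for illustration, not the report's stated model, and the function names are hypothetical.

```python
def implied_elasticity(pct_price_change: float, pct_usage_change: float) -> float:
    """Price elasticity of demand as the ratio of percentage changes."""
    return pct_usage_change / pct_price_change


def projected_usage(base_usage: float, pct_price_change: float, elasticity: float) -> float:
    """Linear approximation: % usage change = elasticity * % price change."""
    return base_usage * (1 + elasticity * pct_price_change / 100)


# A 10% price drop that lifts usage by 0.5% to 0.7%.
low = implied_elasticity(-10, 0.5)
high = implied_elasticity(-10, 0.7)
print(round(low, 3), round(high, 3))

# Illustrative projection: 1,000 units of usage after a 10% price cut.
print(round(projected_usage(1000.0, -10, high), 2))
```

An elasticity between roughly -0.05 and -0.07 is far from -1, the point at which a price cut would be fully offset by extra volume. That is why price alone explains so little of the usage pattern.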

By plotting use cases by unit cost and total volume on a logarithmic scale, the market divides into four quadrants:

Quadrant	Characteristics	Key Categories	Insights
Mass-Market Volume Drivers	Low Cost, High Volume	Roleplay, Programming	Professional productivity (coding) and conversational entertainment (roleplay) are the two core drivers of AI volume. OSS models find significant advantage here.
Specialized Experts	High Cost, Low Volume	Finance, Academic, Health, Marketing	Users are willing to pay a premium for high accuracy in these high-stakes, niche fields.
Niche Utilities	Low Cost, Low Volume	Translation, Legal, Trivia	These functions are highly optimized or commoditized; "good enough" alternatives are cheap.
Premium Workloads	High Cost, High Volume	Tech, Science	Users are willing to pay for high performance and specialized capabilities. "Technology" as a use case costs much more but maintains high volume.
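The quadrant assignment in the table can be sketched as a two-threshold classification on log-scaled axes. The cut-off values and sample data points below are hypothetical, chosen only to illustrate the mechanics; the report's actual boundaries are not given here.

```python
import math

QUADRANTS = {
    # (high_cost, high_volume) -> quadrant name from the table
    (False, True): "Mass-Market Volume Drivers",
    (True, False): "Specialized Experts",
    (False, False): "Niche Utilities",
    (True, True): "Premium Workloads",
}


def classify(cost_per_million, total_tokens, cost_cut=2.0, volume_cut=1e11):
    """Place a use case in a quadrant using log10 coordinates.

    cost_cut (USD per 1M tokens) and volume_cut (tokens) are assumed thresholds.
    """
    x = math.log10(cost_per_million)  # horizontal axis: unit cost
    y = math.log10(total_tokens)      # vertical axis: total volume
    high_cost = x > math.log10(cost_cut)
    high_volume = y > math.log10(volume_cut)
    return QUADRANTS[(high_cost, high_volume)]


# Made-up (cost per 1M tokens, total tokens) pairs; not data from the report.
sample = {
    "roleplay": (0.5, 1e12),
    "finance": (5.0, 1e9),
    "translation": (0.3, 5e9),
    "technology": (8.0, 8e11),
}
labels = {name: classify(cost, volume) for name, (cost, volume) in sample.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Because the logarithm is monotonic, comparing log values gives the same split as comparing raw values. The log scale matters mainly for plotting, where costs and volumes span several orders of magnitude.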

Closed models (like Anthropic's Claude 3.7 Sonnet) occupy the high-cost, high-usage "Premium Leaders" zone, while open-source models (like DeepSeek V3) dominate the low-cost, high-usage "Efficient Giants" zone.

This indicates that quality and capability often trump cost. If a model is significantly superior or possesses a trust advantage (like the Claude Sonnet series), users will bear higher costs because, in their workflows, API costs are negligible compared to the value of saved developer time. Meanwhile, falling costs bring a "Jevons Paradox" effect: extremely cheap models (like the Efficient Giants) are integrated into more places, ultimately consuming more total tokens.

Conclusion: A New AI Era from "Intuition" to "Data"

This empirical study based on OpenRouter data challenges much of the conventional wisdom about LLM usage. LLMs are becoming a structurally diverse ecosystem, where future competition will be model-agnostic and heterogeneous.

The emergence of o1-class models shifts evaluation from single-pass benchmarks to process metrics and task success rates. The center of gravity for LLMs has shifted to "Systems Thinking" rather than "Single Bets," and to "Data Analysis" rather than "Intuition." We no longer just care about what a model can generate, but how it completes complex tasks through continuous reasoning, tool calling, and iterative refinement.

The next phase of AI competition will no longer be just about model scale, but a comprehensive contest of operational excellence, cultural adaptability, and multilingual capabilities. For all players, finding and "wearing" the "Glass Slipper" that solves high-value workloads early on is the key to long-term success.
