What is reasoning?


Reasoning in AI refers to models that "think" through problems step by step before producing a final answer. Instead of immediately generating a response, reasoning models spend extra time working through the logic, considering different approaches, and checking their work. The result is significantly better performance on complex tasks.

Chain-of-Thought: The Foundation

────────────────────────────────────────

The idea behind reasoning models starts with chain-of-thought prompting. Researchers discovered that when you ask a model to "think step by step," it performs much better on math, logic, and analysis problems. Instead of jumping to an answer, the model writes out intermediate steps, which helps it arrive at correct conclusions.

For example, if you ask "What is 23 times 47?" a standard model might get it wrong by guessing. A model using chain-of-thought will write out the multiplication steps and arrive at 1,081 correctly. This same principle scales to much harder problems.
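The difference is mostly in how you frame the request. A minimal sketch of the two prompting styles (the wording is illustrative, not any provider's API):

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer immediately, with no intermediate work."""
    return f"{question}\nAnswer with just the result."

def chain_of_thought_prompt(question: str) -> str:
    """Ask the model to write out intermediate steps before answering."""
    return (
        f"{question}\n"
        "Let's think step by step. Show each intermediate calculation, "
        "then state the final answer on its own line."
    )

print(chain_of_thought_prompt("What is 23 times 47?"))
```

With the second prompt, the model tends to emit the partial products (920 and 161) before the final 1,081, and those written-out steps are what keep it on track.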

How Reasoning Models Work

────────────────────────────────────────

Modern reasoning models take chain-of-thought to another level. They are specifically trained to generate [reasoning tokens], which are internal thinking steps that the model produces before its final answer.

Here is how the process works:

  1. [You send a prompt] with your question or task
  2. [The model generates reasoning tokens] where it thinks through the problem, considers approaches, checks for errors, and refines its thinking
  3. [The model produces the final answer] based on its reasoning

Depending on the provider and model, the reasoning tokens may or may not be visible to you: some providers show them, while others hide them but still bill for them. Either way, they contribute to the model's ability to solve harder problems.

Reasoning Models Across Providers

────────────────────────────────────────

[OpenAI's o-series models] (o1, o3, o3-mini, o4-mini) were among the first dedicated reasoning models. They generate internal chains of thought and are particularly strong at math, science, and coding. The reasoning tokens are hidden from the user but reflected in usage costs. OpenAI also offers "reasoning effort" controls that let you adjust how much thinking the model does.

[Anthropic's extended thinking] allows Claude to think through problems in a visible thinking block before responding. You can set a thinking budget that controls how many tokens Claude spends on reasoning. The thinking is shown to the user, providing transparency into how the model arrived at its answer.

[Google's Gemini thinking mode] adds similar step-by-step reasoning to Gemini models. It provides a thinking process that is visible in the response, and Google has integrated it across their model lineup.

[Open-source reasoning models] have made significant progress. [DeepSeek-R1] demonstrated that open models can achieve reasoning performance competitive with proprietary options. [QwQ] from Alibaba and other community efforts continue to push the boundaries of what open reasoning models can do.

When Reasoning Models Outperform Standard Models

────────────────────────────────────────

Reasoning models are not always better. They excel in specific situations:

  • [Math and quantitative problems]: Multi-step calculations, proofs, and numerical analysis
  • [Coding challenges]: Complex algorithms, debugging, and system design
  • [Scientific reasoning]: Analyzing data, forming hypotheses, evaluating evidence
  • [Logic puzzles]: Constraint satisfaction, deductive reasoning
  • [Complex analysis]: Tasks requiring weighing multiple factors and making nuanced judgments
  • [Planning]: Breaking down complex goals into actionable steps

For simpler tasks like summarization, translation, or casual conversation, standard models are usually just as good and faster.

Cost and Latency Tradeoffs

────────────────────────────────────────

Reasoning comes at a cost. Reasoning tokens take time to generate and count toward your token usage, so reasoning models are both slower and more expensive than standard models.

A standard model might respond in one to two seconds. A reasoning model working on a hard problem might take ten to thirty seconds or more. The token cost can be several times higher because you are paying for all those reasoning tokens in addition to the output.
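The cost gap is easy to work through with back-of-the-envelope numbers. The per-token prices below are placeholders, not any provider's actual rates; reasoning tokens are typically billed at the output-token rate:

```python
def request_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars; reasoning tokens are billed as output tokens."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * price_in_per_m
            + billed_output * price_out_per_m) / 1_000_000

# Same question, with and without reasoning
# (illustrative prices: $1/M input, $4/M output).
standard = request_cost(500, 300, 0, 1.0, 4.0)
reasoning = request_cost(500, 300, 8_000, 1.0, 4.0)

print(f"standard:  ${standard:.4f}")
print(f"reasoning: ${reasoning:.4f}  ({reasoning / standard:.0f}x)")
```

Even with identical visible output, a few thousand hidden reasoning tokens can multiply the cost of a request many times over.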

This means you need to be strategic about when to use reasoning. For a quick Q&A chatbot, reasoning models are overkill. For a system that needs to solve complex coding problems or analyze financial data, the extra cost and latency are worth the improved accuracy.

Reasoning Effort and Budget Controls

────────────────────────────────────────

Most providers now offer ways to control how much reasoning a model does:

  • [OpenAI] offers a "reasoning effort" parameter (low, medium, high) that controls how many reasoning tokens the model generates
  • [Anthropic] lets you set a thinking budget in tokens, giving you fine-grained control over how much thinking Claude does
  • [Google] provides similar controls for Gemini's thinking mode

These controls let you balance accuracy against speed and cost. For easier questions, you can dial reasoning down. For critical or complex questions, you can give the model more room to think.
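As a sketch, the two styles of control look roughly like this in a request body. The shapes approximate the providers' documented parameters at the time of writing, and the model names are placeholders; check the current API docs before relying on them:

```python
def openai_style_request(prompt: str, effort: str) -> dict:
    """Request-body sketch with a categorical reasoning-effort level."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "o4-mini",  # placeholder model name
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

def anthropic_style_request(prompt: str, budget_tokens: int) -> dict:
    """Request-body sketch with an explicit thinking budget in tokens."""
    return {
        "model": "claude-sonnet",  # placeholder model name
        "max_tokens": budget_tokens + 1_000,  # leave room for the final answer
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }
```

The categorical knob is simpler to use; the token budget gives finer control when you know roughly how hard a class of questions is.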

Reasoning and Accuracy

────────────────────────────────────────

There is a strong relationship between reasoning effort and accuracy on hard problems. Research consistently shows that:

  • More reasoning tokens lead to better performance on difficult tasks
  • The gains are most pronounced on problems that require multiple logical steps
  • There are diminishing returns: doubling reasoning tokens does not double accuracy
  • For easy problems, extra reasoning adds cost without improving results

This creates a practical optimization problem. The best approach is often to use a routing strategy: send simple requests to fast, standard models and route complex requests to reasoning models. Some systems use a smaller model to estimate difficulty and then choose the appropriate model automatically.
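A routing strategy can start as something very simple. Real routers often use a small classifier model rather than keywords; the heuristics and model names here are purely illustrative:

```python
import re

# Crude surface signals that a request likely needs multi-step reasoning.
REASONING_TRIGGERS = re.compile(
    r"prove|debug|optimi[sz]e|step[- ]by[- ]step|analy[sz]e|plan|refactor",
    re.IGNORECASE,
)

def route(prompt: str) -> str:
    """Pick a model tier from surface features of the request.

    Long prompts and reasoning-flavored verbs go to the expensive
    reasoning model; everything else goes to the fast tier.
    """
    hard_looking = bool(REASONING_TRIGGERS.search(prompt)) or len(prompt) > 2_000
    return "reasoning-model" if hard_looking else "fast-standard-model"

print(route("Translate 'good morning' to French"))
print(route("Debug this race condition in my scheduler"))
```

Upgrading from keyword heuristics to a small difficulty-estimation model is the natural next step once you have traffic to evaluate against.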

Use Cases in Practice

────────────────────────────────────────

[Software development]: Reasoning models can plan complex refactors, debug subtle issues, and architect systems by thinking through requirements and tradeoffs before writing code.

[Research and analysis]: When you need to evaluate competing evidence, reason about cause and effect, or synthesize findings from multiple sources, reasoning models produce more thoughtful and accurate results.

[Education]: Reasoning models can show their work, making them effective tutors. Students can see not just the answer but the thinking process that led to it.

[Decision support]: For business decisions involving multiple variables and tradeoffs, reasoning models can systematically evaluate options rather than jumping to conclusions.

The rise of reasoning models represents a significant step forward in AI capability. By giving models the ability to think before they respond, we get dramatically better results on the problems that matter most.
