I Think, Therefore I Am: No, LLMs Cannot Reason
Why “reasoning” models are not actually reasoning.
I recently spoke at an event in San Francisco, and during a fireside chat, the moderator asked me a series of rapid-fire questions expecting brief, single-sentence responses. One of these questions was, “Can LLMs reason?” My immediate answer was a concise “No, they perform pseudo-reasoning.” Later at the event, I heard another speaker, a startup founder with a PhD in anthropology, respond to the same question with, “Yes, insofar as humans can reason.” While her perspective didn’t surprise me, I fundamentally disagreed. Rather than directly challenging her response in the moment, I chose to articulate my position clearly in this blog post, detailing why “reasoning” models do not genuinely reason, though this doesn’t diminish their practical value.
Through experimentation with LLMs, researchers have developed various prompting techniques that significantly enhance model performance, such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT). These methods led to the test-time (inference-time) scaling strategies used by models like OpenAI’s o1 and DeepSeek’s R1, and they create the impression that models engage in reasoning-like steps, often anthropomorphically described as “thinking,” to systematically decompose problems and generate more accurate results. However, this impression of reasoning is fundamentally an illusion.
The transformer architecture, which underpins these “reasoning” models, does not enable genuine reasoning. Reasoning is a complex cognitive process that involves causal understanding, mental modeling, intentionality, and abstraction — elements I will explore further in this post. Instead, transformer-based models statistically predict textual continuations based on patterns learned from extensive training data containing examples of human-written logical processes and problem decompositions.
This statistical pattern-matching should not be mistaken for authentic thinking or reasoning. Humans use text as a medium to externalize and clarify reasoning processes, but text alone does not capture the internal cognitive activities essential to true reasoning. A useful analogy here is comparing LLMs to parrots mimicking human speech. A parrot can replicate words and even produce contextually relevant phrases, yet it lacks any genuine understanding of language or the concepts behind those words. Similarly, LLMs produce outputs that superficially resemble human reasoning, but they are merely reproducing observed patterns rather than engaging in genuine cognitive reasoning.
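To make this concrete, below is a minimal sketch of what “statistically predicting textual continuations” looks like at decode time. It assumes the Hugging Face transformers library and uses GPT-2 as a small, freely available stand-in for much larger models; the prompt, model choice, and greedy decoding loop are illustrative, not a description of any particular proprietary system.

```python
# Minimal sketch: autoregressive next-token prediction with a small causal LM.
# At each step the model produces a probability distribution over tokens, and
# the text is extended with a high-probability continuation. That is the whole
# mechanism; there is no separate "reasoning" module.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "If a dozen apples cost $6, then each apple costs"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(10):  # greedily append ten tokens
    logits = model(input_ids).logits[:, -1, :]            # scores for the next token
    probs = torch.softmax(logits, dim=-1)                 # distribution over the vocabulary
    next_id = torch.argmax(probs, dim=-1, keepdim=True)   # pick the most probable token
    input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```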
When we examine what reasoning truly entails, several key elements are missing from LLMs:
- Causal understanding: True reasoning requires grasping causal relationships between events and concepts. LLMs recognize statistical correlations but don’t understand causation.
- Mental models: Humans reason by constructing and manipulating mental models of the world. While LLMs can simulate this textually, they don’t maintain persistent internal models that can be manipulated and tested.
- Intentionality: Reasoning is goal-directed and intentional. LLMs don’t have goals or intentions; they simply predict the most likely next token.
- Abstraction capacity: Human reasoning involves creating novel abstractions to solve unfamiliar problems. LLMs can work with abstractions present in their training data but struggle to generate truly new ones.
The confusion around whether LLMs can reason stems largely from anthropomorphizing their outputs. When we see text that resembles our own reasoning processes, we assume similar cognitive mechanisms must be at work. But this is a fundamental attribution error — attributing human-like cognition to what is essentially sophisticated pattern matching.
This doesn’t diminish the utility of LLMs. In fact, their ability to simulate reasoning-like processes makes them incredibly valuable tools. But conflating their performance with actual reasoning sets unrealistic expectations and obscures important limitations.
To explore this topic further, we should understand what human reasoning is and how it compares to what I would call the “pseudo-reasoning” performed by so-called “thinking” models.
1. Nature and Origin of Reasoning
René Descartes famously articulated the fundamental relationship between thinking and existence through the statement, “Cogito, ergo sum” — “I think, therefore I am.” According to Descartes, genuine reasoning is deeply tied to conscious thought and self-awareness, distinguishing authentic human cognition from mechanical responses or mere sensory impressions. True reasoning involves intentional thought, reflective introspection, and meaningful cognitive engagement, laying the foundation for human identity and our understanding of reality. This is in stark contrast to the autoregressive LLMs that operate purely on statistical prediction, generating text based solely on learned patterns from vast datasets. Unlike human reasoning, these models lack self-awareness, intentionality, and any meaningful comprehension of the concepts they output. Their impressive ability to mimic coherent and logical text does not imply genuine reasoning but rather represents sophisticated pattern-matching devoid of conscious thought or cognitive insight.
Human Reasoning:
- Causal and Intentional: Humans reason based on causal relationships, intentions, beliefs, and desires. Reasoning involves interpreting contexts, understanding motives, applying prior knowledge and experiences, and making value judgments.
- Cognitive and Conscious: Human reasoning often includes conscious reflection, introspection, self-awareness, and metacognition (thinking about one’s own thought processes).
- Semantic Understanding: Humans deeply comprehend meaning, context, and real-world implications of concepts.
LLM Reasoning:
- Associative and Predictive: LLMs perform “reasoning” by predicting statistically plausible continuations of text, based on patterns learned from vast amounts of training data. They do not possess intentions or causal understanding.
- Algorithmic and Automated: Reasoning in LLMs is purely computational, without consciousness or introspection. They do not understand the semantic implications of the words they generate.
- Pattern Matching and Generalization: LLMs excel at identifying patterns and generating coherent text, but their “reasoning” is effectively a sophisticated form of pattern completion, not true comprehension.
2. Mechanisms of Reasoning
Human Reasoning:
- Symbolic, Logical, and Abstract: Humans employ abstract thinking, logic, mathematics, symbolic representation, analogy, and conceptual reasoning.
- Heuristics and Intuition: Humans use intuitive judgments and heuristic shortcuts when making quick decisions or judgments under uncertainty.
- Flexible and Adaptive: Humans dynamically adapt reasoning strategies based on changing contexts, goals, or constraints.
LLM Reasoning:
- Statistical and Distributional: LLMs leverage learned probability distributions to generate plausible continuations based on textual contexts.
- Absence of Genuine Logic: They mimic logical patterns effectively but do not inherently employ formal logic or symbolic reasoning methods.
- Limited Adaptability: While highly flexible in pattern recognition, LLM reasoning is limited to the distributions and structures present in their training data.
3. Types and Examples of Reasoning
Human Reasoning (Examples):
- Deductive Reasoning: Using general premises to reach specific conclusions logically (e.g., “All birds fly; penguins are birds; thus penguins should fly — wait, I know penguins don’t fly, so the initial premise needs revision!”).
- Inductive Reasoning: Drawing general conclusions from specific observations (e.g., noticing many sunrises and concluding that the sun rises every morning).
- Causal Reasoning: Understanding that action A causes result B (e.g., “If I water this plant regularly, it will grow.”).
LLM Reasoning (Examples):
- Pattern Completion: Given “Paris is the capital of ___,” an LLM accurately predicts “France,” not through understanding geopolitics but through learned textual patterns (a short code sketch after this list makes the underlying next-token prediction explicit).
- Analogy and Metaphor: LLMs can create analogies and metaphors by identifying statistical correlations between textual patterns, not from genuine conceptual understanding.
- Logical-seeming Responses: LLMs can simulate logical reasoning by correctly answering logical puzzles and math problems through learned associations rather than explicit comprehension of logic itself, mimicking patterns in their training data that contain examples of the reasoning steps used to decompose problems.
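To see what “learned textual patterns” means in practice for the pattern-completion example above, the sketch below inspects the next-token probability distribution directly. As before, it assumes the Hugging Face transformers library and uses GPT-2 purely as a small illustrative model.

```python
# Sketch: "Paris is the capital of ___" answered by next-token statistics.
# The "answer" is whichever token the model assigns the highest probability,
# not the product of any geopolitical understanding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Paris is the capital of", return_tensors="pt").input_ids
next_token_probs = torch.softmax(model(input_ids).logits[0, -1], dim=-1)

top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  {prob.item():.3f}")
# Typically " France" dominates the distribution by a wide margin.
```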
4. Limitations and Shortcomings
Human Reasoning:
- Subjective and Biased: Influenced by cognitive biases, emotional states, limited memory, fatigue, or errors in judgment.
- Limited Processing Speed and Capacity: Humans are slower and less efficient in processing massive amounts of data or performing highly repetitive tasks compared to machines.
LLM Reasoning:
- Lack of True Understanding: Despite generating coherent text, LLMs lack genuine comprehension of meaning, context, emotions, or implications.
- Susceptibility to Hallucinations: LLMs may confidently produce plausible-sounding but factually incorrect or nonsensical responses (“hallucinations”).
- Inability to Reflect or Self-Correct Meaningfully: They cannot intentionally correct or validate their reasoning beyond statistical self-consistency.
5. Cognitive vs. Computational
Even though LLMs can appear to perform reasoning, especially when employing methods like chain-of-thought (CoT), they are not engaging in true reasoning in the human sense. Instead, they are generating plausible sequences of text based on learned statistical associations from massive datasets. Since CoT was one of the first such paradigms to emerge and informed new training methods for “reasoning” models, we will use it as an example of why current prompting methods do not amount to genuine reasoning.
6. What is Chain-of-Thought (CoT) in LLMs?
Chain-of-thought is a prompting technique where LLMs are explicitly asked to articulate intermediate steps in their responses. By prompting an LLM to “think step-by-step,” the model generates sequences of intermediate reasoning steps, significantly improving performance on tasks like mathematical problem-solving or logical puzzles.
Example of CoT prompting:
- Prompt: “Q: If a dozen apples cost $6, how much does each apple cost? Let’s think step-by-step.”
- LLM Response (CoT):
- “A dozen means 12 apples.”
- “$6 ÷ 12 apples = $0.50 per apple.”
- “Therefore, each apple costs $0.50.”
Although this seems like logical reasoning, what the LLM is actually doing is generating, step by step, the most statistically probable continuation, drawing on patterns learned from vast training data.
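For concreteness, here is a minimal sketch of how such a step-by-step prompt might be issued programmatically. It assumes the OpenAI Python client with an illustrative model name; any chat-style LLM API would look essentially the same.

```python
# Sketch: eliciting a chain-of-thought style response from a chat LLM.
# Assumes the OpenAI Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice; any capable chat model works
    messages=[
        {
            "role": "user",
            "content": "Q: If a dozen apples cost $6, how much does each apple "
                       "cost? Let's think step-by-step.",
        }
    ],
)
print(response.choices[0].message.content)
# The printed "steps" are generated token by token like any other text:
# a dozen means 12, $6 / 12 = $0.50, so each apple costs $0.50.
```

Nothing in this call changes how the model works; the instruction “Let’s think step-by-step” simply biases it toward continuations that resemble worked solutions in its training data.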
7. Why Chain-of-Thought is Not Actual Reasoning
Lack of True Understanding and Semantics
- Human reasoning involves semantic understanding of the meaning and implications of concepts. Humans genuinely understand the relationships between ideas, causal mechanisms, intentions, and context.
- LLMs, in contrast, rely solely on learned correlations in text. They do not understand meanings or implications. Their output is an artifact of statistical probabilities, not of conceptual understanding.
Absence of Intentionality or Goals
- Humans reason with explicit intentions, goals, and motivations. Reasoning is purposeful and directed towards problem-solving or decision-making outcomes.
- LLMs have no explicit goals, intentions, or purposes. They simply predict statistically likely continuations based on patterns in their training data.
Purely Statistical and Associative Process
- Human reasoning can involve abstract logical structures, conscious reflection, analogy, and introspection.
- LLMs, however, do not use logic in a strict sense — they mimic logical-sounding structures learned from text examples. They cannot self-reflect or verify the validity of their reasoning internally beyond statistical consistency.
8. Why LLMs Appear to Reason: A Statistical Illusion
CoT makes LLMs appear intelligent or logical because their training data includes numerous examples of human-written logical steps. Models become highly skilled at pattern completion, replicating the “form” of reasoning from billions of examples. This creates the illusion that the model understands, even though it is simply generating statistically likely sequences.
Example: Logical Illusion
- An LLM correctly answers: “If A > B and B > C, is A > C?” with “Yes,” because it statistically learned that this sequence (“transitive reasoning”) usually appears as “true” in training data — not because it explicitly understands transitivity.
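For contrast, here is what an explicit, verifiable treatment of the same rule looks like: a deterministic check rather than a recalled textual association. The snippet below exhaustively verifies the implication over a small integer domain; it is only an illustration of rule-based verification, not a claim about how such checks should be implemented.

```python
# Sketch: explicitly verifying "if A > B and B > C then A > C" over a small
# domain. The point is not the result (transitivity of > is trivially true)
# but the contrast: this is a rule applied and checked deterministically,
# not a high-probability continuation recalled from training text.
from itertools import product

def transitivity_holds(domain) -> bool:
    return all(
        (not (a > b and b > c)) or (a > c)   # implication: (A>B and B>C) -> A>C
        for a, b, c in product(domain, repeat=3)
    )

print(transitivity_holds(range(-10, 11)))  # True
```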
9. Evidence of Limitations (Hallucinations and Failures)
A clear indication that LLMs are not genuinely reasoning is their susceptibility to “hallucinations,” confidently generating plausible-sounding yet entirely incorrect statements:
- Mathematical hallucination:
Prompt: “What’s 12345 × 6789?”
LLM might respond: “83,823,105” (incorrect but confidently produced).
These errors occur because LLMs rely on pattern recognition rather than semantic understanding, logical consistency, or mathematical accuracy.
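The contrast is easy to demonstrate: the product is a deterministic calculation, and a one-line check exposes the pattern-matched guess.

```python
# Sketch: deterministic arithmetic versus pattern-matched recall.
print(f"{12345 * 6789:,}")  # 83,810,205 -- the hallucinated 83,823,105 is off by 12,900
```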
10. The Fundamental Difference Summarized
Chain-of-thought prompting can dramatically improve an LLM’s performance by guiding it through steps that mimic human logical reasoning. However, it remains a statistical pattern completion process without genuine understanding, consciousness, or intentionality. The illusion of reasoning emerges from sophisticated pattern recognition — but it is fundamentally different from the semantic, intentional, and reflective reasoning capabilities of humans.
Understanding this distinction is essential for interpreting the outputs of LLMs accurately and effectively integrating their capabilities into broader decision-making workflows and cognitive tasks.
Anthropomorphizing LLMs is misleading and irresponsible because it fosters misconceptions about their capabilities and limitations, potentially leading users to overestimate their understanding, reliability, and trustworthiness. Assigning human-like attributes such as intentions, emotions, or genuine reasoning to these statistical models obscures their true nature as sophisticated text-generation systems trained on vast amounts of data.
Instead, it is essential to approach LLMs with clarity about what they truly are: powerful, pattern-matching systems capable of producing remarkable and useful outputs. Their strengths lie in rapidly synthesizing and generating text from immense datasets, enabling applications such as information summarization, drafting communications, or even supporting creative workflows. However, their inherent limitations — such as a lack of true understanding, intentionality, and semantic awareness — mean that they should serve as complementary tools rather than standalone decision-makers.
Maintaining an accurate perspective about what LLMs can and cannot do ensures responsible, informed use, aligning expectations with reality. This clarity helps prevent harmful reliance on LLM outputs in situations demanding genuine comprehension, ethical judgment, or nuanced reasoning. By thoughtfully integrating LLMs into our workflows, recognizing them as sophisticated tools rather than reasoning beings, we maximize their practical utility while safeguarding against risks stemming from misunderstanding their capabilities.