Why ChatGPT Struggles to Count 'R's in 'Strawberry' and What It Reveals About AI's Confident Mistakes


Introduction

In the world of artificial intelligence, a seemingly simple task—counting letters—has become a classic benchmark for the quirks of large language models (LLMs). A well-known example involves ChatGPT frequently miscounting the number of times the letter “R” appears in the word “strawberry.” While this might seem like a trivial oversight, it highlights a deeper issue: the tendency of AI to produce confident, yet incorrect, answers. When OpenAI attempted to showcase improvements in this area, users were quick to point out other persistent errors. This article explores why such mistakes occur, what they reveal about the technology, and ongoing efforts to address them.

(Image source: 9to5google.com)

The Strawberry Test and Its Significance

Why Counting Letters Is Hard for AI

At first glance, counting the three ‘R’s in “strawberry” should be trivial for a human, but for a language model like ChatGPT, it’s a different story. LLMs process text not as individual characters but as tokens—chunks of words or subwords. For instance, “strawberry” might be split into tokens like “straw” and “berry,” losing the granularity needed for precise letter counts. Additionally, these models are trained to predict the next token based on probability, not to perform explicit logical or counting operations. When asked to count letters, the model often guesses based on patterns in its training data, leading to confident but wrong answers—sometimes stating two, three, or even four ‘R’s without hesitation.
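
To make the mismatch concrete, the short Python sketch below uses OpenAI's open-source tokenizer library, tiktoken (assumed to be installed via "pip install tiktoken"), to print the token pieces a model actually receives for "strawberry" and to contrast them with a plain character count. The exact split varies by encoding, so treat the printed pieces as illustrative rather than definitive.

    # Illustrating the token/character mismatch with OpenAI's open-source
    # tokenizer library, tiktoken (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4-era models

    word = "strawberry"
    token_ids = enc.encode(word)
    pieces = [enc.decode([tid]) for tid in token_ids]

    print(pieces)           # the chunks the model actually "sees"; the split depends on the encoding
    print(word.count("r"))  # an ordinary character-level count trivially returns 3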

OpenAI's Attempted Fix

In November 2023, OpenAI claimed that an updated version of ChatGPT had finally mastered the strawberry test. The company took a victory lap on social media, highlighting the improvement. However, the celebration was short-lived. Users who tested the new version quickly discovered that while the letter-counting issue might have been patched, other equally confident mistakes persisted—from incorrectly solving math problems to fabricating historical facts. This episode underscores a crucial point: fixing one narrow error doesn’t solve the broader problem of hallucination in AI.

Confident Mistakes: A Deeper Problem

Examples from the Community

In response to OpenAI’s announcement, users flooded social media with examples of other confident mistakes. One user asked ChatGPT to list the planets in the solar system, and it incorrectly inserted a fictional planet. Another requested a summary of a book that doesn’t exist, and ChatGPT provided a detailed yet entirely fabricated plot. These are not random errors—they are systematic failures tied to how the model generates responses. Because LLMs are designed to appear authoritative, they often produce plausible-sounding but incorrect information, especially in areas requiring precise recall or calculation.

  • Mathematical errors: ChatGPT sometimes struggles with basic arithmetic, such as adding two large numbers or solving simple equations, while confidently stating wrong answers.
  • Factual inaccuracies: The model may invent dates, names, or events, especially when asked about obscure topics.
  • Logic puzzles: Tasks requiring step-by-step reasoning, like the classic “river crossing” puzzle, can lead to illogical but convincingly explained solutions.

Why AI Hallucinates

Hallucinations occur because LLMs lack true understanding. They are pattern-matching machines that generate text based on probabilities learned from vast datasets. When a query doesn’t match familiar patterns, the model falls back on its most likely completion, which can be incorrect. Additionally, the models are optimized for smooth, natural-sounding responses, not accuracy. The “confidence” is a byproduct of the generation process: every word is chosen to minimize surprise, even when the resulting statement is false.
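
A toy sketch of that generation step makes the point. The scores (logits) below are entirely made up and the selection is greedy for simplicity, but the softmax-then-pick loop it mimics has no notion of truth, only of relative likelihood.

    import math

    # Made-up scores (logits) for candidate next tokens answering "how many r's?".
    # Nothing in this step checks whether the chosen token is correct; the model
    # simply emits the least surprising continuation, which is why even wrong
    # answers arrive in a confident tone.
    logits = {"three": 2.1, "two": 1.9, "four": 0.4}

    total = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / total for tok, v in logits.items()}

    next_token = max(probs, key=probs.get)
    print(probs)        # probabilities over the candidates (greedy decoding shown here)
    print(next_token)   # the top candidate is emitted regardless of factual grounding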

(Image source: 9to5google.com)

OpenAI's Response and Improvements

OpenAI acknowledges the challenge of confident mistakes. In follow-up statements, they explained that improvements like the strawberry fix are part of a broader effort to enhance reliability. Techniques include:

  1. Chain-of-thought prompting: Encouraging the model to explicitly show its reasoning steps, which can reduce errors in logic and math.
  2. Fine-tuning on correction data: Training on more examples where errors are labeled, helping the model recognize when it might be wrong.
  3. External tools: Integrating calculators or search engines to fact-check outputs in real time (a minimal sketch of this idea follows the list).
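
As a concrete illustration of the third technique, the Python sketch below recomputes a letter count with ordinary string operations and compares it against whatever the model claimed. The helper name and its interface are hypothetical, not part of any OpenAI API, but the pattern mirrors how a tool-augmented pipeline can catch this class of confident mistake before it reaches the user.

    # Hypothetical verification helper: recompute the count with ordinary code
    # instead of trusting the model's answer. The function name and arguments
    # are illustrative only, not part of any real API.
    def verify_letter_count(word: str, letter: str, claimed_count: int) -> bool:
        actual = word.lower().count(letter.lower())
        return actual == claimed_count

    # Suppose the model confidently answered "two" for 'r' in 'strawberry':
    print(verify_letter_count("strawberry", "r", 2))  # False -> flag the answer for correction
    print(verify_letter_count("strawberry", "r", 3))  # True  -> the claim checks out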

Despite these efforts, the problem persists. As OpenAI CEO Sam Altman noted, “We are still in the early days of understanding how to make these models reliable.” The strawberry incident is a reminder that no single fix is a panacea. Users should always verify critical information obtained from AI.

Conclusion

The humble strawberry—with its three ‘R’s—has become an unlikely symbol of AI’s limitations. ChatGPT’s struggle to count them, and its confident delivery of incorrect answers, reveals a fundamental tension in large language models: they are built to sound smart, not to be right. While progress is being made, the episode shows that overcoming hallucination is a long-term endeavor. For now, we must treat AI’s confidently spoken words with a healthy dose of skepticism—even when it seems to have finally learned its ‘R’s.