
Teaching AI to Spell: The Surprising Limits of LLMs

5 min read
By Zian (Andy) Wang
Published Oct 18, 2023 · Updated Jun 27, 2024


In the dazzling world of artificial intelligence (AI), the spotlight often lands on the latest marvels: large language models (LLMs) like Google's Bard and OpenAI's GPT-3. These text-generating titans have spurred a whirlwind of both fascination and skepticism, given their ability to weave human-like text and, at times, their rather comical missteps. One such incident recently put Google's Bard in the limelight, revealing an amusing and rather bizarre confusion over the number of 'e's in the word 'ketchup'.

🎓 An 'E' for Effort

Imagine you're quizzing a friend on the spelling of 'ketchup'. "How many e's are there?" you ask. The answer seems straightforward, right? Now imagine your friend confidently getting it wrong, with a straight face. That's precisely what happened with Bard, Google's AI model. When asked about the number of 'e's in 'ketchup', Bard gave an erroneous response that had the internet in stitches.

Bard's confidence was amusing yet somewhat disconcerting: an LLM reputed to pass the Turing test couldn't answer a simple spelling question that most kindergartners could. Even more confounding was Bard's struggle to recognize its error, from stating that 'ketchup' has six letters to the bizarre claim that it lacks a second letter and thus has no 'e's. It took several attempts before Bard finally acknowledged the correct count.
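For contrast, this is about the easiest question you can put to ordinary software; a single line of Python settles it deterministically, every time:

```python
# Deterministic letter counting: the same answer on every run.
print("ketchup".count("e"))  # -> 1
```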

Intrigued, I decided to test Bard's spelling skills on other words, wondering if a training mishap or a misleading webpage had led to the 'ketchup' confusion. Bard mostly performed well, although it occasionally struggled with longer words like "engineering". Curious whether the failure was unique to Bard, I turned to ChatGPT (GPT-3.5) and posed the same question. To my surprise, despite regenerating the response seven times, ChatGPT failed to count the 'e's in 'ketchup' every single time!
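A commonly cited culprit for this family of failures is tokenization: LLMs read text as subword tokens rather than individual characters. Here is a minimal sketch of the idea, assuming the open-source tiktoken library; the exact split varies by tokenizer:

```python
import tiktoken

# Inspect how a GPT-3.5-era tokenizer chops up "ketchup".
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("ketchup")
print(tokens)  # a short list of token IDs
print([enc.decode_single_token_bytes(t) for t in tokens])  # the subword chunks
# The model receives a few opaque chunks, not seven separate letters,
# which makes "count the e's" a surprisingly unnatural task for it.
```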

While these incidents are amusing and serve as a reality check (perhaps we are getting ahead of ourselves, and the AI revolution is further off than it seems), they also highlight the inherent limitations of advanced AI models. Bard and its ilk are essentially sophisticated autocomplete systems (I recommend looking at the Transformer Circuits thread; it dives deep into how exactly models like LLMs produce their responses). Their responses aren't retrieved from a database of verified facts but are generated based on patterns observed in the text data they were trained on. This methodology can lead to factual blunders, as illustrated by the 'ketchup' incident and another instance in which Bard erroneously claimed that the James Webb Space Telescope had taken the first images of a planet outside our solar system.

Francois Chollet, a renowned software engineer and machine learning researcher, defines the intelligence of a system as "a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty." While LLMs can excel at tasks resembling those seen during training, their ability to generalize beyond familiar territory, such as writing articles or doing simple math, remains limited.

A simple prompt to Bard for an ASCII drawing of a circle, or a request for ChatGPT to play a game of chess, will quickly reveal the "stupidity" and stubbornness of LLMs.
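To appreciate just how low the bar is, here is a short, hypothetical Python sketch that prints an ASCII circle deterministically; the radius and line thickness are arbitrary choices:

```python
# Draw an ASCII circle outline by testing each grid cell's distance
# from the center against the target radius.
radius = 8
for y in range(-radius, radius + 1):
    row = ""
    for x in range(-2 * radius, 2 * radius + 1):
        # Halve x because terminal characters are roughly twice as tall as wide.
        dist = ((x / 2) ** 2 + y ** 2) ** 0.5
        row += "*" if abs(dist - radius) < 0.6 else " "
    print(row)
```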

⚡ Beyond the 'E' Incident

The bizarre yet confidently incorrect answers from LLMs can be partially attributed to their operational mechanism. They generate responses based on patterns in vast text data rather than querying a database of verified facts. Thus, their responses are probabilistic, not deterministic, which introduces an element of unpredictability—a trait we certainly don't want in real-world AI applications.
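To make "probabilistic, not deterministic" concrete, here is a minimal sketch of next-token sampling; the tiny vocabulary and logit values are invented for illustration, standing in for the scores a real model would produce:

```python
import numpy as np

# Hypothetical next-token scores after a prompt like "ketchup has ___ e's".
vocab = ["one", "two", "three", "seven"]
logits = np.array([2.0, 1.2, 0.4, 0.1])

def sample_next_token(logits, temperature=1.0):
    # Softmax turns scores into a probability distribution, then we sample.
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# The same "prompt" can yield different answers on different runs.
for _ in range(5):
    print(vocab[sample_next_token(logits)])
```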

Moreover, despite efforts to broaden the scope of LLMs beyond pure text, they remain inherently language-based. Though plugins like Wolfram have been employed to expand their capabilities, allowing them to graph mathematical functions, LLMs still have inherent limitations. They lack an understanding of spatial concepts, such as the movement of three-dimensional bodies through space. This limitation becomes glaringly apparent when you ask ChatGPT for origami instructions or to translate a description of a 3D animation into code.

🔭 What’s In Store for the Future

The power of language is undeniable. We use it to share stories, communicate ideas, and shape our understanding of the world. Yet, language is only one facet of our multi-dimensional human experience. The way we perceive and interact with our surroundings extends beyond words. It involves sight, sound, touch, taste, and smell—all of which shape how our language is developed and used. 

Large language models, as advanced as they currently are, operate within the confines of textual data. Their learning system is fundamentally different from ours. While humans learn language through a rich sensory experience—listening to voices, observing expressions, and imitating others—these models learn from static text, devoid of the multi-sensory context that humans inherently have.

This distinction underscores why AI models, despite their impressive progress, remain a far cry from the sentient machines often depicted in science fiction. The AI of our imagination, capable of understanding and interacting with the world in a way that mimics human experience, remains a distant reality. For AI to reach such heights, it must draw on more than just text; it must engage with the world through multiple senses, just as we do.

Imagine an AI that can learn a language not just from written text but also from hearing it spoken, seeing the facial expressions that accompany specific phrases, or even understanding the physical context in which words are used. Developing such multi-sensory AI could drastically improve models' ability to understand and generate language, bringing us one step closer to the AI we envision in our sci-fi dreams.

Looking ahead, the future of AI may lie not in LLMs themselves but in general-purpose models trained through reinforcement learning: models that learn on their own rather than from provided, curated data.

In conclusion, the evolution of AI is an ongoing journey—one that involves more than just teaching a machine how to spell 'ketchup'. As we move forward, we must remember that our goal is not just to build models that mimic human language but to strive towards creating AI that can understand and interact with the world holistically. Until then, amusing incidents like the 'ketchup' conundrum will remind us how far we've come and how much further we have yet to go.

