Limitations of Large Language Models
What current limitations of LLMs are imposed by the limitations of our language?
A brief introduction to LLMs
Omnipresent in current media coverage is the rise of so-called intelligent AI systems, specifically Large Language Models (LLMs) like OpenAI’s GPT-4, Meta’s LLaMA, or Google’s LaMDA. Large Language Models are advanced artificial intelligence models designed to understand and generate human-like text. They are trained on an extensive and diverse range of internet text, which allows them to predict the next word (token) in a sentence and thereby to work with human language.
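To make the idea of next-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library with GPT-2 as a small, openly available stand-in for the larger models named above (the choice of library and checkpoint is mine, purely for illustration):

```python
# Minimal sketch of next-token prediction with GPT-2
# (assumes `pip install torch transformers`; model choice is illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large Language Models are trained to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (batch, sequence_length, vocabulary_size)

next_token_scores = logits[0, -1]        # scores for the token that would follow the prompt
top5 = torch.topk(next_token_scores, k=5)
print([tokenizer.decode(int(i)) for i in top5.indices])  # five most likely next tokens
```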
The technological breakthrough of modern LLMs is due not only to an increase in computational power but also to the implementation of the transformer architecture. To understand what a transformer is and how it works, we first need to discuss underlying principles like the encoder-decoder framework and attention mechanisms.
Consider the task of translating a sentence like “Transformers are great!” into German. Traditionally, this problem has been tackled with an encoder-decoder system. The encoder transforms the input sequence into a sequence of vector representations, often referred to as hidden states. These states are then passed to the decoder, which generates the output sequence. In the transformer architecture, both the encoder and the decoder are composed of stacked transformer layers, allowing the model to capture complex patterns in sequences. Thanks to its self-attention mechanism, the transformer processes all words in the input at once, giving it the ability to analyze the context of each word.
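As a small illustration of the encoder-decoder setup, the following sketch translates the example sentence with a pretrained encoder-decoder transformer via the Hugging Face pipeline API (the t5-small checkpoint is an arbitrary choice for demonstration purposes):

```python
# Sketch of encoder-decoder translation with a pretrained transformer
# (assumes `pip install torch transformers sentencepiece`; checkpoint choice is illustrative).
from transformers import pipeline

# t5-small is an encoder-decoder transformer; the pipeline wires both halves together.
translator = pipeline("translation_en_to_de", model="t5-small")

result = translator("Transformers are great!")
print(result[0]["translation_text"])  # the model's German translation of the sentence
```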
Earlier encoder-decoder models had a significant weakness before transformers came into play. The final hidden state of the encoder had to encapsulate the entire meaning of the input sequence because that was the only information the decoder had access to when generating the output. For longer sequences, this was especially challenging as it could lead to information loss in the process of compressing all details into a single representation.
The advent of the attention mechanism addressed this issue. The attention mechanism not only provides the decoder access to all of the encoder's hidden states but also assigns different weights to different parts of the input data. In this way, the model doesn't just "pay attention" to relevant parts—it assigns a degree of relevance or importance to each part, giving the decoder an enhanced ability to focus on the most important parts of the input when generating each part of the output.
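For readers who want to see the weighting step itself, below is a toy NumPy sketch of scaled dot-product attention, the core computation behind the mechanism just described (the dimensions and random inputs are made up for illustration):

```python
# Toy sketch of scaled dot-product attention: each query is compared to all keys,
# and the resulting softmax weights decide how much each value contributes to the output.
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)               # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax: each row sums to 1
    return weights @ values, weights                       # weighted sum of the values

rng = np.random.default_rng(seed=0)
q = rng.normal(size=(3, 4))  # 3 toy "tokens" with hidden size 4 (made-up numbers)
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))

output, attention_weights = scaled_dot_product_attention(q, k, v)
print(attention_weights.round(2))  # each row shows how one token distributes its attention
```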
It is important to mention that transformers can also operate in an encoder-only mode (like BERT) or decoder-only mode (like GPT), not just in the traditional encoder-decoder pairing. This versatility contributes to the wide applicability of transformers in various natural language processing tasks.
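To hint at what these modes look like in practice, the sketch below loads one encoder-only and one decoder-only checkpoint through the Hugging Face transformers library (again, the specific checkpoints are my own illustrative choices):

```python
# Sketch contrasting an encoder-only and a decoder-only transformer
# (assumes `pip install torch transformers`; checkpoint choices are illustrative).
from transformers import AutoModel, AutoModelForCausalLM

# BERT: encoder-only, typically used to turn a whole sentence into contextual embeddings.
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# GPT-2: decoder-only, typically used to generate text one token at a time, left to right.
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

print(type(encoder_only).__name__)  # BertModel
print(type(decoder_only).__name__)  # GPT2LMHeadModel
```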
For those interested in a more detailed understanding of these concepts, further reading is recommended at the end of this post.
What are the current limitations?
LLMs are capable of a vast variety of language-related tasks, from rephrasing sentences to answering complex scientific questions or acting as a dictionary or encyclopedia1. But LLMs are certainly not omnipotent; they have limits. Setting aside computational limits like the size of their training set, their inference speed, or their maximum context length in tokens, we want to explore which other limits exist. So in the following, we take a look at some hypotheses:
1. The limits of LLMs are dictated by the underlying limited nature of language
A common theme in 19th- and 20th-century philosophy and science was the conviction that knowledge simply is linguistic. In other words, understanding something was thought to mean thinking about the right words and seeing how these words connect to other true facts we know. Following this reasoning, the ultimate expression of language would be a wholly formal one, akin to a logical-mathematical model, consisting of arbitrary symbols bound by rigorous rules of deduction. However, everyday language could also be effective, provided extra effort was made to eliminate any uncertainties or inaccuracies. A great advocate of this theory was Ludwig Wittgenstein, who wrote in his Tractatus logico-philosophicus: “The totality of true propositions is the whole of natural science”2.
Still, some contemporary intellectuals are convinced that all knowledge could be encompassed in an encyclopedia-like format. This mindset influenced early Symbolic AI work, where intelligence was viewed as the ability to manipulate symbols according to logical rules. In this context, an AI's knowledge was perceived as a vast database of true sentences, logically interlinked. An AI was considered intelligent if it produced the correct sentence at the appropriate time, reflecting proper symbol manipulation. This concept underpins the Turing test, asserting that a machine demonstrates understanding if it articulates correct responses at the right time, suggesting that knowledge is entirely encapsulated in knowing the correct phrases and when to use them.
This belief has been the subject of extensive critique by thinkers like Gottfried Leibniz, Ned Block, and John Searle. Searle came up with the famous Chinese room argument. In a nutshell, his argument posits that even if a machine can convincingly simulate an understanding of a language (like Chinese), it does not necessarily possess genuine understanding or consciousness. He argued that a person executing instructions to respond in Chinese doesn't truly understand the language, similar to how a computer doesn't comprehend the information it processes.
All forms of language, including programming languages, symbolic logic, and spoken language, are based on a specific form of representation. This linguistic structure excels at describing discrete elements, their attributes, and their interrelationships through profound abstraction. Yet, there's a substantial difference between the cognitive effort of deciphering Chinese characters and the experience of understanding spoken Chinese, with an even larger gap when it comes to having the skill to speak it fluently.
Language, as a representational system, struggles to capture the nuances of intricate realities such as delineating irregular shapes, expressing the dynamism of object movement, or articulating the immediate emotional impact of a painting. Acknowledging these limitations, humanity has developed nonlinguistic representational schemes, like images, audiovisual recordings, graphs, and maps, that allow us to convey and comprehend such complex realities more effectively and intuitively.
As the father of modern linguistics, Noam Chomsky, has pointed out, language is simply not a clear and unambiguous vehicle for communication. We humans can compensate for this weakness by sharing a nonlinguistic understanding of the world: we grasp the meaning of a sentence through its underlying context and can infer what it is trying to say.
LLMs are constrained to convey information that lies within the bounds of linguistic representation and are ultimately limited by their shallow understanding of context and their lack of nonlinguistic understanding. We conclude that this hypothesis holds.
2. LLMs’ greatest weakness is their lack of nonlinguistic understanding
The fact that LLMs lack nonlinguistic understanding significantly limits their capabilities. Humans don't solely rely on language for communication and understanding; we utilize an array of other channels such as facial expressions, body language, and tone of voice. Moreover, our comprehension of the world is inherently tied to our embodied experiences - we understand what it feels like to be hot or cold, what the texture of an object feels like, or what a particular sound evokes emotionally. LLMs, however, don't have this experiential knowledge. They can't perceive or experience the world in the same way humans do. They can generate text about these concepts based on patterns in their training data, but they don't have any direct, nonlinguistic understanding of them.
This limitation becomes apparent when LLMs are tasked with understanding or generating content about embodied experiences or sensory perceptions. They might be able to generate plausible-sounding text based on patterns in their training data, but there's no experiential understanding or first-hand knowledge backing it up. So while LLMs can generate text on a wide range of topics, their lack of nonlinguistic understanding imposes a significant limit on the depth and accuracy of their output.
3. LLMs are engaged in a kind of mimicry
Mimicry, in this context, refers to the ability of an entity to imitate or replicate a particular behavior, skill, or function. In the case of LLMs, they mimic the ability to comprehend and generate human language based on patterns learned from training data.
However, this mimicry does not equate to understanding in the human sense. The intelligence in an LLM is formed based on patterns in the data it was trained on, but it lacks the contextual, experiential, and embodied understanding that humans possess. The model does not have beliefs, desires, or experiences, and it doesn't understand the world in the way a human does. It merely generates text that mimics patterns of human language it has learned.
4. Even with continual training from this point until the universe's eventual thermal extinction, a system that is only taught language will never reach the complexity of human intelligence.
The "scaling hypothesis" in AI suggests that with more data, more computing, and minor model tweaks, we can continue to significantly improve the performance of models. However, this does not imply that language models will ever reach the complexity of human intelligence. As we have seen human knowledge is inherently embodied and contextual, shaped by our experiences in the world, our physical interactions with it, and our shared cultural and societal norms. It involves not only linguistic understanding but also nonlinguistic forms of perception, conceptual understanding, and reasoning. Even if an LLM could perfectly mimic human language, it would still lack these other aspects of human intelligence.
LLMs have no stable body or abiding world to be sentient of—their knowledge begins and ends with words and their common sense is always skin-deep. Without the ability to perceive, experience, or interact with the world, their understanding will always be limited to the textual data they've been trained on.
Conclusions
In conclusion, while LLMs exhibit an impressive ability to generate human-like text, they are fundamentally different from human intelligence. They operate based on patterns learned from textual data, lacking the deeper, nonlinguistic, and embodied understanding that humans inherently possess. Their knowledge is encapsulated within the linguistic sphere, and their understanding of context and common sense is ultimately shallow.
The exploration of LLMs' limitations provides valuable insights into our continuing quest for artificial general intelligence. It underscores the importance of nonlinguistic understanding, embodiment, and the ability to interact with the world. LLMs represent a powerful tool for numerous applications, but their scope and capabilities should be understood and employed with an awareness of these intrinsic limitations.
Thank you for reading.
Sources:
https://www.noemamag.com/ai-and-the-limits-of-language by Jacob Browning and Yann LeCun
https://www.noemamag.com/the-model-is-the-message/
https://en.wikipedia.org/wiki/Chinese_room
https://academic.oup.com/book/32631/chapter-abstract/270515613?redirectedFrom=fulltext
https://arxiv.org/pdf/2305.08246.pdf
Further reading on transformers:
https://huggingface.co/learn/nlp-course/chapter1/4
https://machinelearningmastery.com/the-transformer-model/
https://www.oreilly.com/library/view/natural-language-processing/9781098136789/
Footnotes:
1. Limited in some form.
2. Tractatus Logico-Philosophicus by Ludwig Wittgenstein, 4.11.