Like humans, ChatGPT favours examples and ‘memories’ – not rules – to generate language

A new study led by researchers at the University of Oxford and the Allen Institute for AI (Ai2) has found that large language models (LLMs) – the AI systems behind chatbots like ChatGPT – generalize language patterns in a surprisingly human-like way: through analogy, rather than strict grammatical rules.

The research challenges a widespread assumption about LLMs: that they learn how to generate language primarily by inferring rules from their training data. Instead, the models rely heavily on stored examples and draw analogies when dealing with unfamiliar words, much as people do.

To explore how LLMs generate language, the study compared judgments made by humans with those made by GPT-J (an open-source large language model developed by EleutherAI in 2021) on a very common word formation pattern in English, which turns adjectives into nouns by adding the suffix “-ness” or “-ity”. For instance, happy becomes happiness, and available becomes availability.

The research team generated 200 made-up English adjectives that the LLM had never encountered before – words such as cormasive and friquish. GPT-J was asked to turn each one into a noun by choosing between -ness and -ity (for example, deciding between cormasivity and cormasiveness). The LLM’s responses were compared to the choices made by people, and to predictions made by two well-established cognitive models. One model generalises using rules, and the other uses analogical reasoning based on similarity to stored examples.

The results revealed that the LLM’s behaviour resembled human analogical reasoning. Rather than using rules, it based its answers on similarities to real words it had “seen” during training – much as people do when thinking about new words. For instance, friquish is turned into friquishness on the basis of its similarity to words like selfish, whereas the outcome for cormasive is influenced by word pairs such as sensitive, sensitivity.
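
As a rough illustration of what analogical reasoning over stored examples can look like, the following Python sketch picks a suffix for a made-up adjective by letting its most similar known adjectives vote. It is not the cognitive model used in the study: the tiny lexicon, the similarity measure (number of shared word-final letters) and the names `LEXICON`, `shared_ending_len` and `choose_suffix` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, not the study's data):
# each known adjective is stored with the suffix its noun form takes.
LEXICON = {
    "selfish": "ness",
    "foolish": "ness",
    "childish": "ness",
    "happy": "ness",
    "sensitive": "ity",
    "active": "ity",
    "productive": "ity",
    "available": "ity",
}


def shared_ending_len(a: str, b: str) -> int:
    """Count how many word-final characters two words have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def choose_suffix(novel: str, lexicon: dict = LEXICON, k: int = 3) -> str:
    """Let the k stored adjectives most similar to `novel` vote on the suffix."""
    ranked = sorted(
        lexicon,
        key=lambda word: shared_ending_len(novel, word),
        reverse=True,
    )
    votes = Counter(lexicon[word] for word in ranked[:k])
    return votes.most_common(1)[0][0]


print(choose_suffix("friquish"))   # ness (nearest: selfish, foolish, childish)
print(choose_suffix("cormasive"))  # ity  (nearest: sensitive, active, productive)
```

No rule such as “adjectives ending in -ish take -ness” is written anywhere in this sketch; the preference emerges only from which stored words the new word resembles.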

The study also found pervasive and subtle influences of how often word forms had appeared in the training data. The LLM’s responses on nearly 50,000 real English adjectives were probed, and its predictions matched the statistical patterns in its training data with striking precision. The LLM behaved as if it had formed a memory trace from every individual example of every word it had encountered during training. Drawing on these stored ‘memories’ to make linguistic decisions, it appeared to handle anything new by asking itself: “What does this remind me of?”

The study also revealed a key difference between how human beings and LLMs form analogies over examples. Humans acquire a mental dictionary – a mental store of all the word forms that they consider to be meaningful words in their language, regardless of how often they occur. They easily recognize that forms like friquish and cormasive are not words of English at this time. To deal with these potential neologisms, they make analogical generalizations based on the variety of known words in their mental dictionaries.

The LLMs, in contrast, generalize directly over all the specific instances of words in the training set, without unifying instances of the same word into a single dictionary entry. Senior author Janet Pierrehumbert, Professor of Language Modelling, Oxford e-Research Centre, said: “Although LLMs can generate language in a very impressive manner, it turns out that they do not think as abstractly as humans do. This probably contributes to the fact that their training requires so much more language data than humans need to learn a language.”
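
One way to picture this contrast is to compare two voting schemes over the same set of similar words: one in which each dictionary entry counts once, and one in which every individual occurrence counts. The Python sketch below is only an illustration under stated assumptions; the frequency figures are invented, and the names `NEIGHBOURS`, `vote_by_word` and `vote_by_occurrence` are hypothetical rather than taken from the study.

```python
from collections import Counter

# Hypothetical neighbours of a made-up adjective ending in -sive:
# (known adjective, suffix of its noun form, INVENTED corpus frequency).
NEIGHBOURS = [
    ("sensitive", "ity", 900),
    ("decisive", "ness", 40),
    ("evasive", "ness", 30),
    ("abrasive", "ness", 20),
]


def vote_by_word(neighbours) -> str:
    """Dictionary-style vote: each distinct word counts once, however common."""
    votes = Counter()
    for _word, suffix, _freq in neighbours:
        votes[suffix] += 1
    return votes.most_common(1)[0][0]


def vote_by_occurrence(neighbours) -> str:
    """Instance-style vote: each occurrence of a word counts separately."""
    votes = Counter()
    for _word, suffix, freq in neighbours:
        votes[suffix] += freq
    return votes.most_common(1)[0][0]


print(vote_by_word(NEIGHBOURS))        # ness (3 words against 1)
print(vote_by_occurrence(NEIGHBOURS))  # ity  (900 occurrences against 90)
```

With these invented numbers, a single very frequent word outweighs three rarer ones once occurrences rather than dictionary entries are counted, so the two schemes disagree.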

Co-lead author Dr. Valentin Hofmann (Ai2 and University of Washington) said: “This study is a great example of synergy between Linguistics and AI as research areas. The findings give us a clearer picture of what’s going on inside LLMs when they generate language, and will support future advances in robust, efficient, and explainable AI.”

The study also involved researchers from LMU Munich and Carnegie Mellon University. The findings were published on 9 May in the journal PNAS.