Creativity and Cooperation in the Dynamics of the Lexicon

Language Modelling

Imagine you’re in a foreign country, trying to pick up a few words of the local language. You learn that a bolu is a vegetable, a leki is a fox, but a small fox is a lekiki. So what would be the word for a small vegetable?

Depending on the patterns you notice in your limited vocabulary, you might guess that the right word is boluki or bolulu.
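
The two guesses correspond to two different rules that both fit the only evidence available, leki becoming lekiki. A minimal Python sketch of that ambiguity follows; the function names are hypothetical, and treating the final syllable as the last two letters is an assumption made purely for illustration.

```python
def add_ki(word: str) -> str:
    """Rule A: a small X is X plus the fixed ending -ki."""
    return word + "ki"


def reduplicate_final_syllable(word: str, syllable_len: int = 2) -> str:
    """Rule B: a small X repeats X's final syllable.

    Assumption: the final syllable is simply the last `syllable_len` letters.
    """
    return word + word[-syllable_len:]


for rule in (add_ki, reduplicate_final_syllable):
    # Both rules agree on the one example the learner has seen.
    assert rule("leki") == "lekiki"
    # They disagree on the new word.
    print(f"{rule.__name__}: {rule('bolu')}")
# prints:
# add_ki: boluki
# reduplicate_final_syllable: bolulu
```

Nothing in a single example can separate the two rules; only further vocabulary, or a learner's own biases, can tip the balance.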

Janet Pierrehumbert, Professor of Language Modelling at Oxford, is fascinated by the kind of patterns we notice and apply when learning and forming new words — whether in an artificial language like in the example above, or in any of the more than 6,500 known human languages.

“People have made a big deal of the human ability to make complex sentences,” Pierrehumbert says, “but really just as impressive are the enormous vocabularies of human languages. People can make and understand new words all the time. Even five-year-olds with relatively modest amounts of language experience acquire this kind of fantastic ability to make new complex words.”

To study how we process or even create new words, Pierrehumbert and a team of grad students, postdocs, and other colleagues at Northwestern University, the New Zealand Institute of Language, Brain and Behaviour, and the University of Oxford undertook a multi-year project to develop a series of online games that could be used to investigate how people learn unfamiliar languages, and how those languages can evolve.

Do you speak mechanical Turkish?

A few of these games are available online to play for free, but for the bulk of their research, Pierrehumbert’s team used Amazon’s Mechanical Turk, an online marketplace for human-completed tasks, to recruit subjects to play various versions of the games. The project worked to ensure that the players were compensated fairly for their time and effort, at a rate that at least equalled the minimum wage in Illinois, the home state of the lead institution. “We had a very good rating among ‘Turkers’ for having interesting work and paying people fairly,” Pierrehumbert says. This satisfaction was a boon for the project’s researchers, who could get quality results quickly. “People were thrilled they would get their study working and the next day have data from 400 people,” Pierrehumbert says.

One of the Mechanical Turk-based games featured an evolving artificial language. In the first round, the expressions in the language were created as the random outputs of an algorithm, but in subsequent iterations of the game, the artificial language adapted in response to the incorrect guesses made by users in previous rounds. “You do that ten times and when you get to the end now it turns out that the language is much more structured,” Pierrehumbert says. “Basically the error patterns are in a direction that creates a regular structure in the language.”
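
The loop described here, a random starting language passed from one round of players to the next with their wrong guesses folded back in, can be sketched as a toy simulation. This is not the team's software, and in the real study the guesses came from human players. The simulated learner below, its rule of reusing the word of the most similar meaning it has seen, and all names and syllables other than shen and to are assumptions made for illustration.

```python
import itertools
import random

SHAPES = ["berry", "leaf", "stone"]
COLORS = ["red", "blue"]
NUMBERS = [1, 2]
MEANINGS = list(itertools.product(SHAPES, COLORS, NUMBERS))  # 12 meanings
SYLLABLES = ["shen", "to", "ki", "la", "mu", "ne"]  # mostly hypothetical


def random_language(rng):
    """Round 0: every meaning gets an unrelated two-syllable random word."""
    return {m: "".join(rng.choice(SYLLABLES) for _ in range(2)) for m in MEANINGS}


def learn(language, rng, n_seen=6):
    """Simulated learner: memorises a random subset and guesses the rest."""
    seen = rng.sample(MEANINGS, n_seen)
    output = {}
    for meaning in MEANINGS:
        if meaning in seen:
            output[meaning] = language[meaning]
        else:
            # Assumed guessing rule: reuse the word of the seen meaning
            # that shares the most features (shape, color, number).
            closest = max(seen, key=lambda s: sum(a == b for a, b in zip(s, meaning)))
            output[meaning] = language[closest]
    return output


def iterate(generations=10, seed=0):
    """Pass the language down a chain of learners, keeping every version."""
    rng = random.Random(seed)
    language = random_language(rng)
    history = [language]
    for _ in range(generations):
        language = learn(language, rng)  # guesses become the next input
        history.append(language)
    return history


if __name__ == "__main__":
    for generation, language in enumerate(iterate()):
        print(generation, len(set(language.values())), "distinct words")
```

Because this toy learner can only reuse words it has already seen, the number of distinct words can never grow from one generation to the next, and similar meanings drift toward sharing a word. That is a much cruder kind of regularity than human players produced, but it shows how guessing errors alone can push a random language toward structure.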

The evolving-language-game model was based on previous work conducted at the University of Edinburgh, but Pierrehumbert’s team was able to carry it out at a larger scale and to focus on which semantic dimensions (things like the color, shape, or number of an object) people seemed to prioritize as they intuited the structure of the new words they encountered. Those words contained elements that changed based on the object’s color, shape, or number. The study found that people had a higher expectation that the part of the word related to shape should be consistent. As the language evolved, this preference was reinforced, so that by the ninth generation the language was very consistent in the parts of the words that pertained to shape (for instance, shen- always meant only “berry”), but the parts of the words that denoted color and number remained much more variable (-to might mean one or two objects, which might be either red or blue).

The Wordovators project has the goal of discovering the fundamental mechanisms that support the complexity of the lexicon in human languages

The social lives of words

Pierrehumbert’s team was also curious about the social dimensions of words. “Studies have shown that the number of words that are actually shared by the whole population is rather small,” she says. “Most words are only used by some people. Is that just a matter of expertise or is it partly a matter of social identity?”

Pierrehumbert wondered whether subtle social cues would affect the way people learned new words, so the team set up a game designed to see how players learned to use diminutive forms in the language. To refer to a small example of an object, players had to use a different ending that depended on some factor related to the image of the speaker the game used to teach them the original word. “If that factor was the gender, it was really quite learnable,” Pierrehumbert says. But if the key factor was the direction the speaker was facing, people never picked up on it. “Basically, it was impossible: people just don’t dream that that would be relevant,” Pierrehumbert says.

As learners of a new language parse syntax and filter social cues, they seem to use an unconscious form of statistical analysis to sort out the “rules” of their new vocabulary and to make guesses at how to form unknown combinations of meaning. Alex Schumacher, one of Pierrehumbert’s doctoral students at Northwestern, carried out a series of experiments on how people learn and generalize verb forms, which produced one of the project’s most surprising results. In general, if people encounter a somewhat variable linguistic pattern, they are most apt to extend it to examples that are very similar to ones they already know. Extensions become less likely as the similarity to the known examples decreases. For example, on the model of “semi-solid,” we might coin “semi-stretchable,” but coinages like “semi-treaty” or “semi-both” are unlikely to impossible.

To learn more about how people form abstract generalizations about verbal meanings, Schumacher created an animated game with made-up verbs of motion in two slightly different versions of an artificial language. One used the simple form of the verb when the verb was intransitive (as in “Sam walked”). It added a suffix when the scene showed an object being caused to move with the subject (as in “Sam walked the dog”); this construction would have a suffix on the verb in many of the world’s languages. In the other, this pattern was reversed. In the reversed condition, people extended the suffix to new dissimilar examples even more regularly than they used it on examples similar to the ones seen during training. In fact, many participants used the suffix more regularly on these dissimilar new examples than they had seen it overall during training. These results are a strong challenge to purely statistical models of language learning. The team is continuing to work on models of how cognitive biases interact with patterns of experience to shape the lexical system.

Maori for minimalists

Pierrehumbert’s project officially wound down at the end of 2017, but the game toolkit she developed along with designer-programmer Chun-Liang Chan and co-principal-investigator Jennifer Hay is still being adapted and used. Hay successfully applied for follow-up funding from the Marsden Foundation for a project at her home institution, the New Zealand Institute of Language, Brain and Behaviour, to study an interesting problem in language contact: how much have English-speaking New Zealanders learned about the lexicon of Maori from their very low-level exposure to the language? Maori, the language of New Zealand’s original Polynesian settlers and their descendants, is one of the country’s official languages but is not widely spoken among non-Maori New Zealanders. Hay set up a game to have people guess whether a given word form was Maori or from a similar language, and has been able to show that even people who know only 100 words of Maori are nonetheless quite skilled at picking out the Maori words in the game. “That was a big shock: just how small a vocabulary you can have and still make really substantial progress on forming a lexical system,” Pierrehumbert says.

Still curious?

Learn more about Janet Pierrehumbert’s current work and recent publications.

Play three sample games created for the project.

Read the published results of the group’s experiments in language evolution and social cues.

Case study reproduced courtesy of the John Templeton Foundation
