Biography
Professor Janet Pierrehumbert has an interdisciplinary background from Harvard and MIT in linguistics, mathematics, and electrical engineering and computer science. Her PhD dissertation developed a model of English intonation that was applied to generate pitch contours in synthetic speech.
From 1982 to 1989, she was a Member of Technical Staff at AT&T Bell Laboratories in Linguistics and Artificial Intelligence Research. From there, Janet moved to Northwestern University, where she headed a research group that used experimental and computational methods to understand lexical systems in English and many other languages.
Janet joined the University of Oxford faculty in 2015 as Professor of Language Modelling in the Oxford e-Research Centre. She has held visiting appointments at Stanford, the Royal Institute of Technology, the École Normale Supérieure, and the University of Canterbury.
She is a Member of the National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, a Fellow of the Cognitive Science Society and a Fellow of the Linguistic Society of America. She won the Medal for Scientific Achievement of the International Speech Communication Association (ISCA) in 2020.
Most Recent Publications
Traditional Methods Outperform Generative LLMs at Forecasting Credit Ratings
Time machine GPT
Probing large language models for scalar adjective lexical semantics and scalar diversity pragmatics
STEntConv: predicting disagreement between Reddit users with stance detection and a signed graph convolutional network
Graph-enhanced large language models in asynchronous plan reasoning
Research Interests
Machine learning has made remarkable progress in processing and generating human language.
Janet's research uses machine learning methods to understand language systems both in individuals and in communities. She is especially interested in how systems of word formation can be learned from the statistical properties of the input, deployed to analyse novel words, and adapted to different contexts. She has worked on a diverse range of languages, including English, Arabic, Hindi, Turkish, Tagalog, and Zulu. Her current focus is on English, Finnish, German, and French.
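As a toy illustration of that first idea, here is a minimal sketch, assuming nothing beyond the Python standard library and a made-up miniature lexicon (not data from Janet's research): a character-bigram model estimated from attested words can be used to score novel words for wordlikeness.

```python
import math
from collections import Counter

# Hypothetical miniature lexicon standing in for real training data.
LEXICON = ["blick", "dwell", "string", "spring", "clasp", "drift",
           "plank", "shrink", "twist", "grasp", "flint", "crisp"]

def bigram_model(words):
    """Estimate character-bigram log-probabilities from a word list,
    with '#' marking word boundaries and add-one smoothing."""
    bigrams = Counter()
    unigrams = Counter()
    for w in words:
        padded = f"#{w}#"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    vocab_size = len(set(unigrams) | {b for _, b in bigrams})

    def logprob(word):
        # Sum smoothed bigram log-probabilities over the padded word.
        padded = f"#{word}#"
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(padded, padded[1:])
        )
    return logprob

score = bigram_model(LEXICON)
# A phonotactically plausible novel word should outscore an implausible one:
print(score("blist"))   # English-like nonce word
print(score("bnist"))   # violates English onset phonotactics
```

Real models of word formation are far richer than this, but the sketch shows the core move: statistics of the existing lexicon are reused to evaluate forms that were never seen before.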
People continue to be much better than computers at learning language and using it in novel ways. Janet is particularly interested in the research threads in natural language processing that aim to learn from human performance and build computer systems that incorporate key characteristics of human cognition:
- People do not learn from training data that was hand-labelled by adult experts; they make use of incomplete or indirect information about the structure and meaning of words and sentences. Human learning is, at most, semi-supervised.
- People can form powerful generalisations from much less data than computers now require. For low-resource scenarios, like building systems for minority languages, engineers need the same ability.
- People adapt their language processing depending on the social situation and the topic of discussion. Contextualized methods aim to do the same, as sketched after this list.
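The third point can be illustrated with a small sketch, assuming the Hugging Face transformers library and the off-the-shelf bert-base-uncased checkpoint (an illustrative example, not a description of any specific system of Janet's): a contextualized model assigns the same word different vectors depending on its sentence context.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Return the contextual embedding of `word`'s first occurrence
    in `sentence` (assumes `word` is a single token in the vocabulary)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
    position = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word))
    return hidden[position]

v_river  = word_vector("She sat on the bank of the river.", "bank")
v_money  = word_vector("He deposited the cheque at the bank.", "bank")
v_river2 = word_vector("Reeds grew along the muddy bank.", "bank")

cos = torch.nn.functional.cosine_similarity
# The two river senses should be closer to each other than to the money sense:
print(cos(v_river, v_river2, dim=0).item())
print(cos(v_river, v_money, dim=0).item())
```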
In addition to computational modelling of language, Janet also runs experiments on real and artificial languages. These resemble computer games and are hosted online to collect data from large numbers of people. The results reveal the assumptions and biases that people bring to language learning.
Current Projects
The Wordovators project, in collaboration with the University of Canterbury, uses one- and two-person computer games to investigate the interaction of social and cognitive factors in language learning.
DPhil Opportunities
I am seeking DPhil students in natural language processing and the dynamics of communication.