Professor Janet Pierrehumbert, Director of Equality and Diversity

Professor

Janet B. Pierrehumbert BA PhD

Professor of Language Modelling

Senior Research Fellow and member of Governing Body, Trinity College

Faculty of Linguistics, Philology and Phonetics

TEL: 01865 610617

Biography

Professor Janet Pierrehumbert has an interdisciplinary background from Harvard and MIT in linguistics, mathematics, and electrical engineering and computer science. Her PhD dissertation developed a model of English intonation that was applied to generate pitch contours in synthetic speech.

From 1982 to 1989, she was a Member of Technical Staff at AT&T Bell Laboratories in Linguistics and Artificial Intelligence Research. From there, Janet moved to Northwestern University, where she headed a research group that used experimental and computational methods to understand lexical systems in English and many other languages.

Janet joined the University of Oxford faculty in 2015 as Professor of Language Modelling in the Oxford e-Research Centre. She has held visiting appointments at Stanford, the Royal Institute of Technology, the École Normale Supérieure, and the University of Canterbury.

She is a Member of the National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, a Fellow of the Cognitive Science Society and a Fellow of the Linguistic Society of America. She won the Medal for Scientific Achievement of the International Speech Communication Association (ISCA) in 2020.

Personal website

Most Recent Publications

Not wacky vs. definitely wacky: a study of scalar adverbs in pretrained language models

Unsupervised detection of contextualized embedding bias with application to ideology

Modeling ideological salience and framing in polarized online groups with graph neural networks and structured sparsity

Two contrasting data annotation paradigms for subjective NLP tasks

Forecasting COVID-19 caseloads using unsupervised embedding clusters of social media posts

View all

Research Interests

Machine learning has made remarkable progress in processing and generating human language.

Janet's research uses machine learning methods to understand language systems both in individuals and in communities. She is especially interested in how systems of word formation can be learned from statistical properties of the input, deployed to analyse novel words, and adapted to different contexts. She has worked on a diverse range of languages, including English, Arabic, Hindi, Turkish, Tagalog, and Zulu. Her current focus is on English, Finnish, German, and French.

People continue to be much better than computers at learning language and using it in novel ways. Janet is particularly interested in the research threads in natural language processing that aim to learn from human performance and build computer systems that incorporate key characteristics of human cognition:

  • People do not use training data that was hand-labelled by adult experts. They can use incomplete or indirect information about the structures and meanings of words and sentences. Human learning is at most semi-supervised.
  • People can form powerful generalisations from much less data than computers now require. For low-resource scenarios, like building systems for minority languages, engineers need the same ability.
  • People adapt their language processing depending on the social situation and the topic of discussion. Contextualized methods aim to do the same.

In addition to working on computer modelling of language, Janet undertakes experiments on real and artificial languages. These resemble computer games and are hosted online to collect data from large numbers of people. The results reveal the assumptions and biases that people bring to language learning.

Current Projects

The Wordovators project, in collaboration with the University of Canterbury, uses one- and two-person computer games to investigate the interaction of social and cognitive factors in language learning.


DPhil Opportunities

I am seeking DPhil students in natural language processing and the dynamics of communication.
