OxNLP talk abstracts
1 December 2023
PhD Student, ETH Zurich
Title: Understanding Language Models with Formal Language Theory: Recurrent Neural Language Models as Recognizers of (Probabilistic) Formal Languages
Language models (LMs) are currently at the forefront of NLP research due to their remarkable versatility across diverse tasks. Technologists have begun to speculate about such capabilities; among these speculations are claims that large LMs are general-purpose reasoners or even that they could be a basis for general AI. However, a large gap exists between their observed capabilities and our theoretical understanding of them. Luckily, in the context of computer science, the notion of reasoning is concretely defined: it refers to the ability to execute algorithms. Hence, if we study LMs in terms of well-understood formalisms that characterize the complexity of the problems they can solve, we can more precisely describe LMs’ abilities and limitations.
With this motivation in mind, a large body of work has investigated the representational capacity of recurrent neural network (RNN) LMs in terms of their capacity to recognize formal languages, a well-established notion of the algorithmic capability of a formal model. In this talk, I will outline some classical results describing the representational capacity of RNNs. I will then connect this to the concrete task of language modeling, since LMs do not only describe unweighted formal languages. Rather, they define probability distributions over strings. We therefore pay special attention to the recognition of weighted formal languages and discuss what classes of probability distributions RNN LMs can represent. I will describe how simple RNN LMs with the Heaviside activation are equivalent to deterministic probabilistic finite-state automata, while ReLU-activated simple RNN LMs can model non-deterministic probabilistic finite-state automata and can, given unbounded computation time, even express any computable probability distribution.
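To make the term "deterministic probabilistic finite-state automaton" concrete, here is a minimal Python sketch. It is not the construction from the talk: the two states, the alphabet {a, b}, and all probabilities are hypothetical values chosen for illustration. Each (state, symbol) pair leads to exactly one next state, so a string follows a single path and its probability is the product of the weights along that path times the stopping weight of the final state.

```python
# Hypothetical automaton: state -> symbol -> (next_state, probability).
# Determinism: at most one next state per (state, symbol) pair.
TRANSITIONS = {
    0: {"a": (1, 0.6), "b": (0, 0.3)},
    1: {"a": (1, 0.2), "b": (0, 0.3)},
}
# Probability of ending the string in each state. For every state, the
# outgoing probabilities plus the stopping probability sum to 1.
FINAL = {0: 0.1, 1: 0.5}


def string_probability(string, start=0):
    """Return the probability the automaton assigns to `string`."""
    state, prob = start, 1.0
    for symbol in string:
        if symbol not in TRANSITIONS[state]:
            return 0.0  # no transition defined: the string has probability 0
        state, step_prob = TRANSITIONS[state][symbol]
        prob *= step_prob
    return prob * FINAL[state]


if __name__ == "__main__":
    for s in ["", "a", "ab"]:
        print(repr(s), string_probability(s))
```

Because the current state is always unique, it can be stored as a single one-hot vector, which gives an intuition for why a recurrent model with a thresholded hidden state is a natural match for this class.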
24 November 2023
Title: Whose LLM? Representation, bias, and applications to misinformation
Large language models (LLMs) and generative AI could revolutionize computational social science, but their use also raises fundamental questions of representation and bias. Downstream users of LLMs need stronger evaluations in order to understand which languages, domains, and tasks are covered by an LLM’s training and which fall outside it. This talk will present an overview of multiple studies. The first applies LLMs to identify misinformation narratives. The second demonstrates how current approaches to Reinforcement Learning from Human Feedback (RLHF) fail to address harmful, stereotypical outputs in non-Western contexts. Finally, the talk will present an in-progress experiment that aims to better understand what different people want from LLMs and how they perceive generative AI output.
17 November 2023
Title: Guidance — constrained sampling with formal grammars for controlling language models
Getting Language Models (LMs) to behave exactly the way we want them to can be challenging. Guidance is a popular open source library (14k+ stars) that combines the best of natural language prompting and code to help users structure prompts and express constraints. Guidance interfaces with many popular LM providers — both local, like HuggingFace and llama.cpp, and remote, like OpenAI — and provides a rich developer experience for constraining the output of LMs.
This talk will be a sneak preview of a major update to the guidance library. We’ll go over the basics of the guidance language, and — on the research side — discuss how guidance efficiently translates user specified constraints into formal grammars, which then make low level alterations to an LM sampling process. We’ll also discuss token healing — how subtle but important issues arise at prompt boundaries when translating from text to token space, and how guidance automatically heals these issues for users. We’ll end with a forward looking discussion on the future of constrained sampling.
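The idea of a constraint making low-level alterations to the sampling process can be illustrated with a minimal Python sketch. This is not the guidance library's API or implementation: the toy grammar (one or more digits followed by ";"), the single-character vocabulary, and the stand-in model `toy_logits` are all hypothetical. At each step, tokens the grammar forbids are dropped and the remaining probabilities are renormalized before a token is chosen.

```python
import math

DIGITS = set("0123456789")


def allowed_tokens(generated):
    """Toy grammar (hypothetical): one or more digits, then ';'."""
    if generated.endswith(";"):
        return set()          # the string is complete
    if generated == "":
        return set(DIGITS)    # must start with a digit
    return DIGITS | {";"}


def constrained_distribution(logits, allowed):
    """Softmax restricted to the tokens the grammar allows."""
    kept = {tok: lg for tok, lg in logits.items() if tok in allowed}
    if not kept:
        raise ValueError("the constraint allows no token in the vocabulary")
    top = max(kept.values())
    weights = {tok: math.exp(lg - top) for tok, lg in kept.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def toy_logits(generated):
    """Stand-in for a language model; it prefers 'a', which is forbidden."""
    return {"a": 3.0, "7": 1.0, "2": 0.5,
            ";": 4.0 if len(generated) >= 2 else 0.0}


def greedy_decode(max_steps=10):
    """Pick the most probable allowed token until the grammar is satisfied."""
    out = ""
    for _ in range(max_steps):
        allowed = allowed_tokens(out)
        if not allowed:
            break
        probs = constrained_distribution(toy_logits(out), allowed)
        out += max(probs, key=probs.get)
    return out


if __name__ == "__main__":
    print(greedy_decode())
```

Unconstrained, the stand-in model would emit "a" at every step; with the mask applied, every output is guaranteed to match the grammar.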
10 November 2023
DPhil student, Oxford e-Research Centre, University of Oxford
Title: Sparks of Artificial General Intelligence: Early experiments with GPT-4
Abstract: While GPT-4 has awed both academia and the world beyond with its seemingly human-like capabilities, this report by Microsoft takes early steps towards grasping its true capabilities. The operative word here is 'sparks'. While GPT-4 may seem intelligent, its intelligence only appears in fleeting sparks. Existing benchmarks are not designed to measure creativity, and there is always a chance that GPT-4 has ingested existing benchmarks and become good at them. The authors try to assess its power with semi-systematic qualitative judgements, while asking in the abstract whether we need a better definition of 'general intelligence' anyway. Since many in the reading group are interested in benchmarking the power of GPT-4, this report promises to lay out the different aspects of 'intelligence' we should be testing when designing benchmarks for new-age large language models.
3 November 2023
Senior Lecturer of Machine Learning, Imperial College, London
Title: Interpretable architectures and guided attention for neural language models
Neural models of natural language have achieved remarkable results, but their interpretability remains an open issue. While they are able to output accurate predictions, it is unclear how they reached their decision and whether it was for the right reasons.
In this work, we investigate neural architectures for representing language that are inherently interpretable: they are able to point to relevant evidence in the input by themselves. This is achieved by careful use of attention, having the model dynamically make decisions about which input areas are important to consider. Furthermore, we can directly supervise this attention, teaching the model to make decisions based on the same evidence as humans.
27 October 2023
DPhil student, Autonomous Intelligent Machines and Systems CDT, University of Oxford
Title: When do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
Context-based fine-tuning methods like prompting, in-context learning, soft prompting (prompt tuning) and prefix-tuning have gained popularity as they often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being much more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. While this means that context-based fine-tuning techniques such as prompting, in-context learning, soft prompting and prefix-tuning can successfully elicit or combine skills already present in the pretrained model, they cannot learn tasks requiring new attention patterns.
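The claim about relative attention patterns can be checked numerically for a single attention head. The Python sketch below is an illustration under simplifying assumptions, not the paper's proof: values are scalars rather than vectors, and all scores and values are hypothetical numbers. Adding a prefix position to the softmax rescales every content weight by the same factor, so the ratios between content weights are unchanged, and the output becomes a mix of the original output and the prefix value.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, values):
    """Return the attention weights and the weighted sum of the values."""
    weights = softmax(scores)
    return weights, sum(w * v for w, v in zip(weights, values))


# Hypothetical scores of one query against three content tokens.
content_scores = [0.2, 1.5, -0.3]
content_values = [1.0, -2.0, 0.5]
# Hypothetical score and value contributed by a single learned prefix position.
prefix_score, prefix_value = 2.0, 4.0

base_weights, base_out = attend(content_scores, content_values)
all_weights, prefixed_out = attend([prefix_score] + content_scores,
                                   [prefix_value] + content_values)

prefix_weight = all_weights[0]
# Content weights with the prefix present, renormalized to sum to 1.
renormalized = [w / (1.0 - prefix_weight) for w in all_weights[1:]]
# Prefixed output as a mix of the prefix value and the original output.
mixed = prefix_weight * prefix_value + (1.0 - prefix_weight) * base_out

if __name__ == "__main__":
    print("relative pattern unchanged:",
          all(math.isclose(a, b) for a, b in zip(renormalized, base_weights)))
    print("output is a mix:", math.isclose(mixed, prefixed_out))
```

In this toy setting the prefix can only decide how much of its own fixed value is blended into the output; it cannot reorder which content tokens the query attends to most.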
20 October 2023
Professor of Language Modelling, Department of Engineering Science, University of Oxford
Title: Mismatches between human language processing and NLP
NLP algorithms have achieved high performance on many tasks. However, this is often at the expense of exorbitant training. Even with such training, they fall short on certain tasks that humans perform easily and reliably. This talk will give an overview of some well-established core properties of human language processing. It will summarize the extent to which SOTA transformer models and generative models incorporate, or fail to incorporate, these properties.
16 June 2023
Jingwei Ni, ETH & UZH
Title: When Does Aggregating Multiple Skills with Multi-Task Learning Work? A Case Study in Financial NLP
Abstract: Multi-task learning (MTL) aims at achieving a better model by leveraging data and knowledge from multiple tasks. However, MTL does not always work – sometimes negative transfer occurs between tasks, especially when aggregating loosely related skills, leaving it an open question when MTL works. Previous studies show that MTL performance can be improved by algorithmic tricks. However, what tasks and skills should be included is less well explored. In this work, we conduct a case study in Financial NLP where multiple datasets exist for skills relevant to the domain, such as numeric reasoning and sentiment analysis. Due to the task difficulty and data scarcity in the Financial NLP domain, we explore when aggregating such diverse skills from multiple datasets with MTL can work. Our findings suggest that the key to MTL success lies in skill diversity, relatedness between tasks, and choice of aggregation size and shared capacity. Specifically, MTL works well when tasks are diverse but related, and when the size of the task aggregation and the shared capacity of the model are balanced to avoid overwhelming certain tasks.
26 May 2023
Bhagat Singh Rekhi Chair Professor of Computer Science and Engineering at IIT Bombay
Title: Natural Language Processing and Mental Health
Abstract: According to the WHO, the number of mental health patients all over the world is about 1000 million, and about 14% of deaths in the world are due to mental disorders. The ratio of doctors to patients in the case of mental health support is about 1:10000. This situation underlines the need for automation in mental health monitoring. In this talk, we present our ongoing work on using natural language processing (NLP) and machine learning (ML) for detecting mental conditions and for generating positive and reassuring statements that can pull a person back from the brink of taking extreme steps. The data sets, classification, and text generation framework will be described, pointing to rich possibilities of future work.
Bio: Prof Pushpak Bhattacharyya is Bhagat Singh Rekhi Chair Professor of Computer Science and Engineering at IIT Bombay. He has done extensive research in Natural Language Processing and Machine Learning. Some of his noteworthy contributions are in Indian Language NLP (such as IndoWordnet), Cognitive NLP, Low Resource MT, and Knowledge Graph-Deep Learning Synergy in Information Extraction and Question Answering. He served as President of the ACL (Association for Computational Linguistics) in 2016.
19 May 2023
Associate Professor of Computer Science
University of California, Santa Barbara
Title: On bias, trustworthiness, and safety of language models
Bio: William Wang is the Co-Director of UC Santa Barbara's Natural Language Processing group and Center for Responsible Machine Learning. He is the Duncan and Suzanne Mellichamp Chair in Artificial Intelligence and Designs, and an Associate Professor in the Department of Computer Science at the University of California, Santa Barbara. He has published more than 100 papers at leading NLP/AI/ML conferences and journals, and received best paper awards (or nominations) at ASRU 2013, CIKM 2013, EMNLP 2015, and CVPR 2019, a DARPA Young Faculty Award (Class of 2018), an IEEE AI's 10 to Watch Award (Class of 2020), and many more. He frequently serves as an Area Chair or Senior Area Chair for NAACL, ACL, EMNLP, and AAAI. He is an elected member of IEEE Speech and Language Processing Technical Committee (2021-2023) and a member of ACM Future of Computing Academy.
12 May 2023
Lecturer in Natural Language Processing
University of Edinburgh
Title: Modular Deep Learning
Abstract: Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference and that generalise systematically to non-identically distributed tasks. Modular deep learning has emerged as a promising solution to these challenges. In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated.
These properties enable positive transfer and systematic generalisation by separating computation from routing and updating modules locally. In this talk, I will introduce a general framework for modular neural architectures, providing a unified view over several threads of research that evolved independently in the scientific literature. In addition, I will provide concrete examples of their applications: 1) cross-lingual transfer by recombining task-specific and language-specific task sub-networks; 2) knowledge-grounded text generation by Fisher-weighted addition of modules promoting positive behaviours (e.g. abstractiveness) or negation of modules promoting negative behaviours (e.g. hallucinations); 3) generalisation to new NLP and RL tasks by jointly learning to route information to a subset of modules and to specialise them towards specific skills (common sub-problems reoccurring across different tasks).