Dr Fazl Barez BA MSc PhD

Senior Researcher

Biography

Fazl Barez is a Senior Researcher at the University of Oxford, where he works on AI safety, interpretability, and technical AI governance. His research focuses on understanding the internal mechanisms of neural networks and developing methods to make advanced AI systems more transparent, reliable, and controllable.

His work examines how to identify the causes of harmful or unexpected model behaviour, verify whether a model’s reasoning faithfully reflects its underlying computations, and translate interpretability research into actionable oversight and governance mechanisms. He also studies the broader societal impacts of advanced AI systems, including questions of human agency, institutional control, and accountability. 

At Oxford, he teaches the AI Safety and Alignment course and supervises students working across AI safety and machine learning systems. He collaborates with leading industry research labs and is affiliated with the Centre for the Study of Existential Risk at the University of Cambridge, ELLIS, and other international research centres.

Research Interests

Technical AI safety and alignment

Mechanistic interpretability and neural network analysis

Internal representations and reasoning in large language models

Detection and mitigation of harmful or deceptive model behaviour

Verification and evaluation of model reasoning processes

Robust removal of dangerous capabilities from AI systems

Automated interpretability and scalable oversight

Technical AI governance and AI auditing methods

AI evaluation, assurance, and accountability frameworks

Human agency and societal impacts of advanced AI systems

Reliability and controllability of foundation models

Intersections between neuroscience, cognition, and machine learning

Current Research Projects

Mechanistic Interpretability for Large Language Models — Developing methods to identify and analyse the internal computational mechanisms underlying model behaviour and reasoning.

Scalable Oversight and Automated Verification — Building systems that can automatically trace, verify, and evaluate model reasoning processes at scale.

Capability Removal and Safety Interventions — Investigating methods for reliably removing dangerous or undesirable capabilities from neural networks without degrading general performance.

Technical Governance for Advanced AI Systems — Translating interpretability and safety research into practical frameworks for auditing, verification, and regulatory oversight.

Deception, Misalignment, and Hidden Objectives in AI Systems — Studying how deceptive or strategically misaligned behaviour emerges in advanced models and how it can be detected.

Human Agency and Societal Impacts of AI — Examining how increasingly capable AI systems affect human decision-making, institutional power, and societal autonomy.

Neuroscience-Inspired Approaches to Understanding Intelligence — Exploring connections between biological cognition and machine learning to better understand the foundations of intelligence.

Related Academics

Collaborators across the University of Oxford; the Centre for the Study of Existential Risk, University of Cambridge; the School of Informatics, University of Edinburgh; the NTU Digital Trust Centre; ELLIS; and leading industry AI research labs and universities.