
Oxford researchers awarded funding to investigate hidden attacks on AI systems

The grant will support research into vulnerabilities in vision-language AI agents and how malicious instructions can spread across systems


Researchers in the department have received a grant from Coefficient Giving for a project exploring security risks in next-generation AI systems that interact with digital environments through vision-language models (VLMs).

The project, Improving VLM Attack Transferability, is led by Dr Adel Bibi and Prof Phil Torr, with former postdoctoral researcher Dr Alasdair Paren also involved in the work. The award will support research until June 2027.

Dr Paren said: “For AI agents to be trusted with high-value tasks, they must be robust against attack. Significant effort is currently being directed toward defending against prompt injection, but adversarial image attacks on multimodal systems present a far harder detection and mitigation challenge. Fortunately, poor transferability renders these attacks largely infeasible against closed frontier systems for now. We aim to determine whether this is a limitation of current attack techniques or something more fundamental.”

Many modern AI agents rely on VLMs to interpret screenshots and other visual information before deciding how to act. The research will investigate whether apparently harmless public images, such as those found on websites, in advertisements or in social media posts, could contain hidden malicious instructions capable of manipulating AI systems into unsafe behaviour.

The team will study how these hidden attacks may transfer across different AI agents and multimodal systems, helping researchers better understand vulnerabilities in autonomous AI technologies.

A significant portion of the funding will pay for access to frontier AI models through API credits, enabling large-scale evaluation and benchmarking of state-of-the-art systems, as well as the development of new ways to break into them.

Prof Phil Torr said: “Imagine a photo on a website that looks completely ordinary to you, but contains a hidden message that hijacks an AI. As AI assistants start browsing the web and using computers on our behalf, that is a real risk. This award lets us find these weaknesses and understand how they spread, so they can be fixed before bad actors take advantage of them.”

The researchers hope the work will contribute to the development of safer and more robust AI systems, including improved safeguards and evaluation methods for AI agents operating in real-world environments.

Dr Adel Bibi said: “We are also exploring other forms of hidden AI injections beyond images, including audio attacks. Imagine an adversary nearby playing a sound you cannot hear that silently triggers an AI assistant on your phone to send messages, or manipulates a voice-controlled home system into unlocking a front door. As AI agents become more integrated into daily life, understanding these risks is essential to building systems we can trust.”