New Study Explains Why Neural Networks Prefer “Flatter” Solutions

A new research paper published in the journal Neural Networks sheds light on a long-standing mystery in artificial intelligence: why deep learning systems trained with gradient descent often settle on stable, high-performing solutions.

[Figure: graph showing how training instabilities favour flatter solutions in gradient descent]

The paper adds to ongoing work in the field aimed at demystifying deep learning, suggesting that what once appeared to be unstable behaviour is a key ingredient in AI’s effectiveness.

Neural networks, core tools in modern AI for tasks from image recognition to language processing, are known for their ability to learn complex patterns, but their training dynamics remain poorly understood. This research contributes to a growing effort to uncover the mathematical principles behind their success. The study investigates the training instabilities that occur as neural networks learn, and finds that these instabilities play a constructive role. Rather than being problematic, they guide models toward “flatter” regions of the loss landscape, which are associated with better performance on new data.
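The intuition behind “flatter is better” can be sketched with a toy one-dimensional example. The landscape, minima, and perturbation scale below are illustrative assumptions, not the paper’s actual experiments: a flat minimum has low curvature, so small parameter perturbations (a stand-in for the shift between training and new data) barely change the loss, while a sharp minimum is far more sensitive.

```python
import numpy as np

# Toy 1-D "loss landscape" with a sharp minimum near x = -2
# and a flat minimum near x = +2 (illustrative only).
def loss(x):
    sharp = 50.0 * (x + 2.0) ** 2   # high curvature
    flat = 0.5 * (x - 2.0) ** 2     # low curvature
    return np.minimum(sharp, flat)

def curvature(f, x, h=1e-3):
    # Central finite-difference estimate of the second derivative.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def perturbed_loss(f, x, sigma=0.1, n=1000, seed=0):
    # Average loss under small random parameter perturbations --
    # a crude proxy for sensitivity to distribution shift.
    rng = np.random.default_rng(seed)
    return f(x + sigma * rng.normal(size=n)).mean()

sharp_min, flat_min = -2.0, 2.0
print("curvature at sharp min:", curvature(loss, sharp_min))
print("curvature at flat min: ", curvature(loss, flat_min))
print("perturbed loss, sharp: ", perturbed_loss(loss, sharp_min))
print("perturbed loss, flat:  ", perturbed_loss(loss, flat_min))
```

Under the same perturbation, the flat minimum incurs a much smaller average loss than the sharp one, which is the sense in which flatness is associated with robustness.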

“What’s exciting here is that it turns an abstract observation about the orientation of dominant curvature into a concrete mechanism that helps explain the strong generalisation performance of neural networks.”

Lead author, Dr. Lawrence Wang


Central to the paper is a newly identified mechanism, the Rotational Polarity of Eigenvectors. This concept describes how the dominant directions of curvature in the loss landscape rotate during training. During training instabilities, this rotational mechanism gives rise to a coupled dynamical system that captures the intricate dynamics of learning and helps explain how gradient descent navigates the complex, very high-dimensional loss landscapes of modern deep learning models. This connection also helps account for the strong generalisation performance observed in modern deep neural networks despite their vast numbers of parameters.
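The quantity being tracked here can be illustrated on a tiny model. The sketch below is a minimal assumption-laden stand-in for the paper’s analysis (the two-parameter model, data, and learning rate are all invented for illustration): it follows the top eigenvector of the loss Hessian across gradient-descent steps and measures how far it rotates between consecutive steps.

```python
import numpy as np

# Minimal sketch: track rotation of the dominant curvature direction
# (top Hessian eigenvector) during gradient descent on a tiny
# two-parameter model y_hat = w1 * tanh(w0 * x).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 1))
y = np.sin(2.0 * X[:, 0])

def loss(w):
    pred = w[1] * np.tanh(w[0] * X[:, 0])
    return np.mean((pred - y) ** 2)

def grad(w, h=1e-5):
    # Finite-difference gradient (2 parameters).
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (loss(w + e) - loss(w - e)) / (2 * h)
    return g

def hessian(w, h=1e-4):
    # Finite-difference 2x2 Hessian, symmetrised.
    H = np.zeros((2, 2))
    for i in range(2):
        e = np.zeros(2); e[i] = h
        H[:, i] = (grad(w + e) - grad(w - e)) / (2 * h)
    return 0.5 * (H + H.T)

def top_eigvec(H):
    vals, vecs = np.linalg.eigh(H)
    return vecs[:, np.argmax(vals)]

w = np.array([0.5, 0.5])
lr = 0.5  # illustrative choice of step size
v_prev = top_eigvec(hessian(w))
for step in range(50):
    w = w - lr * grad(w)
    v = top_eigvec(hessian(w))
    v = v if v @ v_prev >= 0 else -v  # resolve sign ambiguity
    angle = np.degrees(np.arccos(np.clip(v @ v_prev, -1.0, 1.0)))
    v_prev = v
print(f"rotation of top eigenvector on final step: {angle:.2f} degrees")
```

In a realistic network the Hessian is far too large to form explicitly; practitioners instead estimate its top eigenpair with iterative methods such as power iteration on Hessian-vector products.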

The findings could have practical implications for improving training stability and designing more efficient algorithms. By better understanding how instabilities shape learning, researchers may be able to build models that are both more reliable and easier to train.

“By better understanding how large AI models learn, we pave the way to models that are more reliable, easier to train, less power-consuming and safer.”

Co-author, Professor Stephen Roberts


Lawrence Wang and Stephen J. Roberts (2026). “Training instabilities favor flatter solutions in gradient descent.” Neural Networks, Volume 201, 108874. ISSN 0893-6080. https://doi.org/10.1016/j.neunet.2026.108874 (https://www.sciencedirect.com/science/article/pii/S0893608026003357)