
Environment-sensing legged robot navigates by learning from its mistakes


A team of researchers at the Oxford Robotics Institute are working on the control of legged robots. They are developing ways for these robots to perceive their environment and make intelligent decisions about how to move from one point to another.

Lead author of a recent paper on this work and DPhil student Siddhant (Sid) Gangapurwala explains, “For example, it is extremely natural for us as humans to be able to move in a location with uneven terrain and significant obstacles. However, for a robot in the same environment, it needs algorithms that can help sense the environment from sensors and then use this information to decide where to step, and how to step without falling over. This has been a difficult problem to solve.”

“In case of traditional algorithms, much of the planning behaviour is hand-tuned. This implies that the robot cannot always fully exploit its mechanical capabilities.”

The ANYmal B robot

To combat this, the researchers used an AI-based approach, called Reinforcement Learning (RL), to get the robot to plan where to place its feet while walking on uneven ground. The advantage of using RL is that the robot learns from its mistakes and gets better by continuously learning, failing, and learning.

As Sid adds, “We train a neural network using Reinforcement Learning (RL) to plan feet placements over rough terrain such that the robot is able to follow the velocity commands generated by the user, using a joypad – similar to controlling a remote-control toy car, or by a high-level control algorithm allowing it to function autonomously even in complex environments such as nuclear power plants or oil rigs.”

“To the best of our knowledge, we are the first to demonstrate the use of RL for terrain-aware legged locomotion on a real robot.”

Learning to walk over obstacles by trial and error can involve a lot of falling over: “In order to avoid permanent damage to the robot, we use a physics simulator which enables us to simulate the behaviour of the robot on a computer platform. This simulator also runs a lot faster than real-time. The training is therefore performed in a simulator and then the obtained behaviour is transferred to the real robot, referred to as sim-to-real transfer.”

A video summary of the research, and the robot in action.

But the robot cannot rely exclusively on learning from the simulator. Sid tells us, “We cannot perfectly model the real-world physics in a simulator. Doing so is computationally expensive. Moreover, certain features of the robot, such as the actuation dynamics and sensor noise, cannot be accurately modelled. This makes sim-to-real transfer difficult. To address this, we use techniques such as dynamics randomization where we change the physical properties of the robot in the simulator during training such that the learnt behaviour is robust to small changes in the dynamic properties of the robot, allowing for transfer from a simulator to the real world.”
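
As a rough illustration of dynamics randomization, the sketch below perturbs a few physical parameters before each simulated training episode. The parameter names, the nominal values and the ±10% range are assumptions made for this example; they are not figures from the team's work, and the real training setup is likely to randomize more properties than this.

```python
import random

# Hypothetical nominal parameters, chosen only for illustration.
NOMINAL_DYNAMICS = {
    "base_mass_kg": 30.0,
    "joint_friction": 0.05,
    "motor_torque_limit_nm": 40.0,
}


def sample_randomized_dynamics(nominal, max_fraction=0.1, rng=None):
    """Return a copy of `nominal` with each value scaled by a random factor.

    Each factor is drawn uniformly from [1 - max_fraction, 1 + max_fraction],
    so a policy trained across many samples never sees exactly the same robot.
    """
    rng = rng or random.Random()
    return {
        name: value * (1.0 + rng.uniform(-max_fraction, max_fraction))
        for name, value in nominal.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same samples
    for episode in range(3):
        params = sample_randomized_dynamics(NOMINAL_DYNAMICS, 0.1, rng)
        # In a real pipeline these values would be written into the
        # simulator before the episode starts.
        print(episode, params)
```

A policy that performs well across all of these slightly different simulated robots is less likely to depend on one exact set of physical properties, which is the robustness the quote describes.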

However, a robot doesn’t just move its feet; the motion of the entire robot body needs to be taken into consideration. Mathieu Geisert, co-author, adds, “We use a technique called optimal control (OC) to generate whole-body motions so as to track these footstep plans. This generation of motion plans is done online (during locomotion) and is used to generate low-level commands for the actuators present on the robot, enabling it to move its legs and perform locomotion.”

To apply the algorithm, the legged robot needs components that let it see its environment and measure its location and how fast its base and joints are moving. For perception, the team uses depth cameras and LIDARs (Light Detection and Ranging), and for tracking the velocity, position and acceleration of the robot, they use motor encoders and Inertial Measurement Units (IMUs).

LIDARs emit a pulsed laser and measure how long it takes for the laser to be reflected back by the environment, which allows the distance between the robot and an object or obstacle to be calculated. “These sensors scan and generate a map of the environment they are deployed in. We then use such a map of the terrain-elevation to obtain footstep plans using the Neural Network policy trained using RL.”
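
The time-of-flight calculation can be sketched in a few lines: the pulse travels to the obstacle and back, so the distance is the speed of light multiplied by the round-trip time, divided by two. The second function shows one simple way to turn measured 3D points into a terrain-elevation grid by keeping the highest point in each cell. The function names, the grid representation and the cell size are assumptions for this example, not the team's mapping pipeline.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_s):
    """Distance to a reflecting surface from a LIDAR pulse's round-trip time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse covers the distance twice (out and back), hence the division.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def elevation_map(points, cell_size_m=0.1):
    """Bin (x, y, z) points into a grid, keeping the highest z per cell.

    Returns a dict mapping (column, row) cell indices to a height in metres.
    """
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        if cell not in grid or z > grid[cell]:
            grid[cell] = z
    return grid


if __name__ == "__main__":
    # A 100-nanosecond round trip corresponds to roughly 15 metres.
    print(distance_from_round_trip(100e-9))
    sample_points = [(0.05, 0.05, 0.1), (0.07, 0.02, 0.3), (0.15, 0.05, 0.2)]
    print(elevation_map(sample_points))
```

A grid like this is the kind of terrain-elevation input that a footstep-planning policy could take, although a real system would also have to handle sensor noise, occlusions and the robot's own motion.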

The team have also included other features: “We have included a recovery control policy trained using RL to stabilize the robot in cases of external perturbations. We can also transfer the RLOC (Reinforcement Learning Optimal Control) framework to another robot of similar shape but different dynamic properties (for example, mass and actuation torque limits).”

There are many situations in which this research can be applied. Dr. Ioannis Havoutis, Principal Investigator, Oxford Robotics Institute, tells us, “This framework is especially useful in environments that are either difficult to access, such as off-shore energy platforms, or potentially dangerous for humans to operate in, such as in nuclear plant decommissioning tasks. Furthermore, autonomous systems deployed for such tasks enable continuous, repeatable and precise inspection without breaks. Additionally, software and hardware modules can be easily developed and integrated with the autonomous system to allow use for specialized tasks, such as detecting radiation in nuclear waste storage facilities or in monitoring renewable energy facilities for creating and updating digital twins and better planning maintenance tasks for improving efficiency.”

Sid concludes, “As part of future work, we aim to integrate an autonomous planning strategy that can make long-horizon plans to get the robot to navigate from one point to the other without human intervention. This will also allow the robot to replan its trajectory in case it comes across obstacles it thinks cannot be traversed due to the physical limitations of the robot further adding to the autonomy of the system.”
