University of California, Berkeley
We present a sim-to-real learning-based approach for real-world humanoid locomotion. Our controller is a causal Transformer trained by autoregressive prediction of future actions from the history of observations and actions. We hypothesize that the observation-action history contains useful information about the world that a powerful Transformer model can use to adapt its behavior in-context, without updating its weights. We do not use state estimation, dynamics models, trajectory optimization, reference trajectories, or pre-computed gait libraries. Our controller is trained with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed to the real world zero-shot. We evaluate our approach in high-fidelity simulation and successfully deploy it to the real robot as well. To the best of our knowledge, this is the first demonstration of a fully learning-based method for real-world full-sized humanoid locomotion.
We explore a learning-based approach for humanoid locomotion. We present a Transformer-based controller that predicts future actions autoregressively from the history of past observations and actions. We hypothesize that this history implicitly encodes information about the world that a powerful Transformer model can use to adapt its behavior dynamically at test time. For example, the model can use the history of desired versus actual states to figure out how to adjust its actions to better achieve future states. This can be seen as a form of in-context learning: changing the model's behavior without updating the model parameters.
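To make this concrete, the following is a minimal sketch of a causal Transformer policy of this form in PyTorch. All dimensions and hyperparameters here (obs_dim, act_dim, model width, context length) are illustrative placeholders rather than the configuration of our actual controller: observations and actions are embedded as interleaved tokens, a causal mask restricts attention to the past, and the action at each step is predicted from the observation token at that step.

import torch
import torch.nn as nn

class CausalTransformerPolicy(nn.Module):
    """Predicts the next action from the history of observations and actions.

    All sizes below are hypothetical, for illustration only.
    """
    def __init__(self, obs_dim=47, act_dim=10, d_model=192, n_layers=4,
                 n_heads=4, context_len=16):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        # Learned positional embeddings over the interleaved (obs, act) tokens.
        self.pos_embed = nn.Parameter(torch.zeros(2 * context_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (B, T, obs_dim); act_hist: (B, T, act_dim). The last
        # action slot may be zero-padded, since it is what we predict.
        B, T, _ = obs_hist.shape
        # Interleave observation and action embeddings along the time axis.
        tok = torch.stack(
            [self.obs_embed(obs_hist), self.act_embed(act_hist)], dim=2
        ).reshape(B, 2 * T, -1)
        tok = tok + self.pos_embed[: 2 * T]
        # Causal mask so each token attends only to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(2 * T)
        h = self.encoder(tok, mask=mask)
        # Predict the action at time t from the observation token at time t.
        return self.act_head(h[:, 0::2, :])

At deployment, the model is queried once per control step with the most recent window of observations and actions, and only the final predicted action is executed.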
Our model is trained with large-scale model-free reinforcement learning (RL) on an ensemble of randomized environments in simulation. We leverage fast GPU simulation powered by IsaacGym and parallelize training across multiple GPUs and thousands of environments. This enables us to collect a large number of training samples (on the order of 10 billion in about a day).
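As a sketch of what such environment randomization can look like (the parameter names and ranges below are hypothetical placeholders, not the values used in training), each of the thousands of parallel simulated environments draws its own physics parameters, so the policy must adapt in-context rather than memorize a single dynamics model:

import numpy as np

# Illustrative domain-randomization ranges; the actual parameters and
# ranges used in training may differ.
RANDOMIZATION = {
    "friction":        (0.4, 1.2),   # ground friction coefficient
    "base_mass_delta": (-2.0, 2.0),  # added/removed torso mass, kg
    "motor_strength":  (0.9, 1.1),   # scale on commanded torques
    "obs_latency_s":   (0.0, 0.04),  # sensor delay, seconds
}

def sample_env_params(num_envs, rng=None):
    """Draw one independent parameter set per parallel environment."""
    rng = rng or np.random.default_rng()
    return {
        name: rng.uniform(lo, hi, size=num_envs)
        for name, (lo, hi) in RANDOMIZATION.items()
    }

params = sample_env_params(num_envs=4096)  # thousands of parallel envs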
We evaluate our learned policies on a real humanoid robot. We find that policies trained entirely in simulation transfer to the real world zero-shot.
A humanoid robot should be able to walk over different terrains. To assess the capabilities of our Transformer-based controller in this regard, we conduct a series of experiments on different terrains in the laboratory. In each experiment, we command the robot to walk forward and vary the terrain type. We first consider flat terrain with variations in friction (slippery plastic floor and carpet) and roughness (wrapping bags and cables spread over the floor). We find that our controller is able to handle these terrains, with the slippery plastic floor being the most challenging.
Next, we consider walking over slopes and small steps. We note that our robot was not trained on steps in simulation. We observe that our controller initially makes a mistake when ascending the step but quickly corrects, raising its leg higher and faster on the second attempt.
We also evaluate the walking behavior when carrying loads of varying mass and shape. Notably, our walking behaviors exhibit an emergent arm swing for balancing. Placing a load on an arm interferes with this swing and requires the robot to adapt accordingly.
Finally, we test the robustness of policies to sudden external forces. We push the robot with a wooden stick or throw light cardboard boxes at it.
While our policies show promising signal in the real world, they are certainly not without limitations. For example, we perform an experiment to test the limits of our controller's robustness. We tie a cable to the back of the robot, command the robot to walk forward, and pull it backward. We see that the robot is fairly robust but eventually falls when pulled very hard.
@article{HumanoidTransformer2023,
  title={Learning Humanoid Locomotion with Transformers},
  author={Ilija Radosavovic and Tete Xiao and Bike Zhang and Trevor Darrell and Jitendra Malik and Koushil Sreenath},
  journal={arXiv:2303.03381},
  year={2023}
}