University of California, Berkeley
We present a sim-to-real learning-based approach for real-world humanoid locomotion. Our controller is a causal Transformer trained by autoregressive prediction of future actions from the history of observations and actions. We hypothesize that the observation-action history contains useful information about the world that a powerful Transformer model can use to adapt its behavior in-context, without updating its weights. We do not use state estimation, dynamics models, trajectory optimization, reference trajectories, or pre-computed gait libraries. Our controller is trained with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed to the real world zero-shot. We evaluate our approach in high-fidelity simulation and successfully deploy it to the real robot as well. To the best of our knowledge, this is the first demonstration of a fully learning-based method for real-world full-sized humanoid locomotion.
We explore a learning-based approach for humanoid locomotion. We present a Transformer-based controller that predicts future actions autoregressively from the history of past observations and actions. We hypothesize that this history implicitly encodes information about the world that a powerful Transformer model can use to adapt its behavior dynamically at test time. For example, the model can use the history of desired versus actual states to figure out how to adjust its actions to better achieve future states. This can be seen as a form of in-context learning: changing the model's behavior without updating the model parameters.
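To make this concrete, the following is a minimal sketch of a causal Transformer policy of this form in PyTorch. All dimensions and hyperparameters here (obs_dim, act_dim, model width, context length) are illustrative placeholders rather than the configuration of our actual controller: observations and actions are embedded as interleaved tokens, a causal mask restricts attention to the past, and the action at each step is predicted from the observation token at that step.

import torch
import torch.nn as nn

class CausalTransformerPolicy(nn.Module):
    """Predicts the next action from the history of observations and actions.

    All sizes below are hypothetical, for illustration only.
    """
    def __init__(self, obs_dim=47, act_dim=10, d_model=192, n_layers=4,
                 n_heads=4, context_len=16):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, d_model)
        self.act_embed = nn.Linear(act_dim, d_model)
        # Learned positional embeddings over the interleaved (obs, act) tokens.
        self.pos_embed = nn.Parameter(torch.zeros(2 * context_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (B, T, obs_dim); act_hist: (B, T, act_dim). The last
        # action slot may be zero-padded, since it is what we predict.
        B, T, _ = obs_hist.shape
        # Interleave observation and action embeddings along the time axis.
        tok = torch.stack(
            [self.obs_embed(obs_hist), self.act_embed(act_hist)], dim=2
        ).reshape(B, 2 * T, -1)
        tok = tok + self.pos_embed[: 2 * T]
        # Causal mask so each token attends only to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(2 * T)
        h = self.encoder(tok, mask=mask)
        # Predict the action at time t from the observation token at time t.
        return self.act_head(h[:, 0::2, :])

At deployment, the model is queried once per control step with the most recent window of observations and actions, and only the final predicted action is executed.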
Our model is trained with large-scale model-free reinforcement learning (RL) on an ensemble of randomized environments in simulation. We leverage fast GPU simulation powered by IsaacGym and parallelize training across multiple GPUs and thousands of environments. This enables us to collect a large number of training samples (on the order of 10 billion in about a day).
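As a sketch of what such environment randomization can look like (the parameter names and ranges below are hypothetical placeholders, not the values used in training), each of the thousands of parallel simulated environments draws its own physics parameters, so the policy must adapt in-context rather than memorize a single dynamics model:

import numpy as np

# Illustrative domain-randomization ranges; the actual parameters and
# ranges used in training may differ.
RANDOMIZATION = {
    "friction":        (0.4, 1.2),   # ground friction coefficient
    "base_mass_delta": (-2.0, 2.0),  # added/removed torso mass, kg
    "motor_strength":  (0.9, 1.1),   # scale on commanded torques
    "obs_latency_s":   (0.0, 0.04),  # sensor delay, seconds
}

def sample_env_params(num_envs, rng=None):
    """Draw one independent parameter set per parallel environment."""
    rng = rng or np.random.default_rng()
    return {
        name: rng.uniform(lo, hi, size=num_envs)
        for name, (lo, hi) in RANDOMIZATION.items()
    }

params = sample_env_params(num_envs=4096)  # thousands of parallel envs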
We evaluate our learned policies on a real humanoid robot. We find that policies trained entirely in simulation transfer to the real world zero-shot.
A humanoid robot should be able to walk over different terrains. To assess the capabilities of our Transformer-based controller in this regard, we conduct a series of experiments on different terrains in the laboratory. In each experiment, we command the robot to walk forward and vary the terrain type. We first consider flat terrain with variations in friction (slippery plastic floor and carpet) and roughness (wrapping bags and cables spread over the floor). We find that our controller is able to handle these terrains, with the slippery plastic floor being the most challenging.
Next, we consider walking over slopes and small steps. We note that our robot was not trained on steps in simulation. We observe that our controller initially makes a mistake when ascending the step but quickly corrects, raising its leg higher and faster on the second attempt.
We also evaluate the walking behavior when carrying loads of varying mass and shape. Notably, our walking behaviors exhibit an emergent arm swing for balancing. Placing a load on an arm interferes with this swing and requires the robot to adapt accordingly.
Finally, we test the robustness of policies to sudden external forces. We push the robot with a wooden stick or throw light cardboard boxes at it.
While our policies show promising signal in the real world, they are certainly not without limitations. For example, we perform an experiment to test the limits of our controller's robustness. We tie a cable to the back of the robot, command the robot to walk forward, and pull it backward. We see that the robot is fairly robust but eventually falls when pulled very hard.
@article{HumanoidTransformer2023,
  title={Learning Humanoid Locomotion with Transformers},
  author={Ilija Radosavovic and Tete Xiao and Bike Zhang and Trevor Darrell and Jitendra Malik and Koushil Sreenath},
  journal={arXiv:2303.03381},
  year={2023}
}