Reinforcement Learning & Robotics

Teaching a Humanoid Robot to Play Table Tennis

Robomotion trains a Unitree G1 humanoid to mimic table tennis motion-capture trajectories using GPU-accelerated reinforcement learning — then exports the policy to run on real hardware.

See It in Action

Three runs of the G1 humanoid tracking reference trajectories in simulation, trained entirely on GPU.

Unitree G1 humanoid robot in simulation

Motion Imitation from Capture Data

The system starts with raw motion-capture recordings of human table tennis strokes. A quality filter removes noisy or physically implausible clips before they ever touch training, keeping only smooth, in-limit trajectories.

The robot then learns to reproduce these motions in a physics simulator, guided by a reward signal that penalizes deviation from the reference pose at every timestep.
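One common form of such a reward is an exponentiated negative pose error: 1.0 when the robot matches the reference exactly, decaying smoothly as it drifts. A minimal sketch (the function name and the weight `alpha` are illustrative assumptions, not the project's actual values):

```python
import numpy as np

def tracking_reward(qpos, qpos_ref, alpha=5.0):
    """Exponentiated negative squared pose error: 1.0 at a perfect
    match, decaying toward 0 as the robot drifts from the reference."""
    err = np.sum((qpos - qpos_ref) ** 2)
    return float(np.exp(-alpha * err))

# Perfect tracking scores 1.0; any deviation is penalized smoothly.
r_perfect = tracking_reward(np.zeros(3), np.zeros(3))
r_off = tracking_reward(np.array([0.1, 0.0, 0.0]), np.zeros(3))
```

The exponential form keeps the reward bounded and dense, which tends to stabilize PPO compared with an unbounded negative-error penalty.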

  1. Filter: Score mocap clips on jerk, joint limits, and root height; discard outliers.
  2. Simulate: Run thousands of parallel physics environments on GPU with MuJoCo MJX + JAX.
  3. Train: Optimize a neural-network policy with PPO to minimize tracking error.
  4. Export: Trace the learned policy into a portable ONNX file for hardware deployment.
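The first step, clip filtering, can be sketched as follows. Thresholds, argument names, and the timestep are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

def clip_passes(qpos_traj, joint_low, joint_high, root_z,
                max_jerk=500.0, min_root_z=0.5, dt=0.02):
    """Score one mocap clip; return True if it is usable for training.

    qpos_traj: (T, n_joints) joint positions over time
    root_z:    (T,) root (pelvis) height over time
    """
    # Jerk estimated as the third finite difference of position / dt^3.
    jerk = np.diff(qpos_traj, n=3, axis=0) / dt**3
    if np.abs(jerk).max() > max_jerk:
        return False            # too noisy / not smooth
    if np.any(qpos_traj < joint_low) or np.any(qpos_traj > joint_high):
        return False            # violates joint limits
    if root_z.min() < min_root_z:
        return False            # implausible root height
    return True
```

A smooth, in-limit trajectory passes; adding high-frequency noise (typical of mocap glitches) blows up the jerk estimate and the clip is discarded.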
Multiple angles of the G1 robot performing table tennis motions

Generalization Across Environments

Training with domain randomization exposes the robot to a range of physics conditions — varied joint friction, link masses, center-of-mass offsets, and motor armature — so the policy generalizes beyond the exact simulator it was trained in.

The result is a controller that handles the inherent gap between simulation and the real world without hand-tuned adaptation.
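Per-episode randomization of this kind can be sketched as follows. Parameter names and ranges here are illustrative, not the project's actual configuration:

```python
import numpy as np

def randomize_physics(nominal, rng):
    """Sample a perturbed copy of the nominal physics parameters
    for one training episode."""
    return {
        "link_mass":      nominal["link_mass"] * rng.uniform(0.8, 1.2),
        "joint_friction": nominal["joint_friction"] * rng.uniform(0.5, 2.0),
        "com_offset":     nominal["com_offset"] + rng.uniform(-0.01, 0.01, size=3),
        "armature":       nominal["armature"] * rng.uniform(0.8, 1.2),
    }

nominal = {
    "link_mass": 1.5,           # kg
    "joint_friction": 0.1,
    "com_offset": np.zeros(3),  # metres
    "armature": 0.01,
}
rng = np.random.default_rng(0)
episode_params = randomize_physics(nominal, rng)  # fresh draw each episode
```

Because the policy never sees the exact same physics twice, it cannot overfit to one simulator instance, which is what makes the transfer to real hardware robust.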

Read the code →

Technical Features

A full end-to-end stack from raw motion data to a deployable robot policy.

GPU-Parallelized Simulation

MuJoCo MJX runs thousands of environments simultaneously on GPU. JAX's vmap vectorizes the physics step across the entire batch with no Python loops.
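The batching pattern looks like this. The dynamics below are a toy stand-in for illustration (the project steps real MJX physics via `mjx.step`), and the environment/joint counts are illustrative:

```python
import jax
import jax.numpy as jnp

def step(state, action):
    """Toy stand-in for one physics step: damped integrator dynamics."""
    pos, vel = state
    vel = 0.99 * vel + 0.01 * action
    pos = pos + 0.02 * vel
    return pos, vel

# vmap maps step over a leading batch axis: one call advances every
# environment in parallel, with no Python loop over environments.
batched_step = jax.vmap(step)

n_envs, n_dof = 4096, 29
states = (jnp.zeros((n_envs, n_dof)), jnp.zeros((n_envs, n_dof)))
actions = jnp.ones((n_envs, n_dof))
pos, vel = batched_step(states, actions)
```

Wrapping the batched step in `jax.jit` then compiles the whole rollout into a single fused GPU kernel launch per step.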

🧠 Motion Imitation via PPO

Proximal Policy Optimization trains a neural network to track reference joint trajectories frame-by-frame, learning coordinated whole-body control from mocap data.
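The heart of PPO is its clipped surrogate objective, which keeps each policy update close to the policy that collected the data. A minimal NumPy sketch of the loss (not the project's training code):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize).

    The probability ratio pi_new/pi_old is clipped to [1-eps, 1+eps] so
    a single batch cannot push the policy arbitrarily far.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

Here the advantages would come from the tracking reward, so minimizing this loss increases the probability of actions that reduced pose error.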

🎲 Domain Randomization

Physics parameters — link masses, joint friction, center-of-mass offsets, motor armature — are randomized each episode to bridge the sim-to-real gap.

🔍 Trajectory Quality Filtering

A pre-training scorer evaluates every mocap clip on smoothness (jerk), joint limit compliance, and root height plausibility. Low-quality clips are excluded before training begins.

📦 ONNX Policy Export

The trained JAX policy is traced and converted to a portable .onnx file. Inference runs via ONNX Runtime — no JAX or GPU required on the target device.

🛑 Early Stopping

A configurable monitor tracks reward improvement and halts training when progress stalls — saving GPU hours on runs that have already converged or diverged.
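A minimal monitor in this spirit (class and parameter names are illustrative, not the project's config keys):

```python
class EarlyStopping:
    """Halt training when mean episode reward stops improving.

    patience:  evaluations to wait without improvement before stopping
    min_delta: smallest reward change that counts as an improvement
    """
    def __init__(self, patience=10, min_delta=1e-3):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0

    def update(self, reward):
        """Record one evaluation; return True if training should stop."""
        if reward > self.best + self.min_delta:
            self.best = reward
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience
```

Calling `update` after each evaluation interval either resets the counter on a new best reward or, once `patience` stale evaluations accumulate, signals the training loop to exit.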