Robomotion trains a Unitree G1 humanoid to mimic table tennis motion-capture trajectories using GPU-accelerated reinforcement learning — then exports the policy to run on real hardware.
Trained Policy
Three runs of the G1 humanoid tracking reference trajectories in simulation, trained entirely on GPU.
How It Works
The system starts with raw motion-capture recordings of human table tennis strokes. A quality filter removes noisy or physically implausible clips before they ever touch training, keeping only smooth, in-limit trajectories.
The robot then learns to reproduce these motions in a physics simulator, guided by a reward signal that penalizes deviation from the reference pose at every timestep.
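A common shape for this kind of pose-tracking reward (used in mocap-imitation work such as DeepMimic) is an exponential of the negative squared joint-angle error: 1.0 on a perfect match, decaying smoothly as the policy drifts from the reference. This is a minimal sketch of that idea; the function name `tracking_reward` and the scale `sigma` are illustrative, not taken from this project.

```python
import math

def tracking_reward(q, q_ref, sigma=0.5):
    """Exponential pose-tracking reward for one timestep.

    q, q_ref: lists of joint angles (radians) for the policy and the reference.
    sigma:    hypothetical error scale; a tighter sigma penalizes deviation harder.
    """
    # Squared joint-angle error summed over all joints.
    err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    # Perfect tracking gives 1.0; the reward decays smoothly with error.
    return math.exp(-err / (2 * sigma ** 2))

print(tracking_reward([0.1, -0.3, 0.7], [0.1, -0.3, 0.7]))  # → 1.0
```

Because the reward is bounded and smooth, it gives a useful gradient signal even when the policy is far from the reference, unlike a raw negative squared error.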
Robustness
Training with domain randomization exposes the robot to a range of physics conditions — varied joint friction, link masses, center-of-mass offsets, and motor armature — so the policy generalizes beyond the exact simulator it was trained in.
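Concretely, domain randomization just means resampling those physics parameters at every episode reset. The sketch below shows the pattern; the nominal values and ranges are made up for illustration and are not the ones used in this project.

```python
import random

# Hypothetical nominal physics parameters and randomization ranges.
NOMINAL = {"joint_friction": 0.05, "link_mass_scale": 1.0,
           "com_offset_m": 0.0, "motor_armature": 0.01}
RANGES = {"joint_friction": (0.5, 2.0),    # multiplicative range
          "link_mass_scale": (0.8, 1.2),   # multiplicative range
          "com_offset_m": (-0.02, 0.02),   # additive range, meters
          "motor_armature": (0.5, 1.5)}    # multiplicative range

def sample_physics(rng=random):
    """Draw one randomized physics configuration for a new training episode."""
    params = {}
    for name, nominal in NOMINAL.items():
        lo, hi = RANGES[name]
        if name == "com_offset_m":
            # Offsets are perturbed additively around the nominal value.
            params[name] = nominal + rng.uniform(lo, hi)
        else:
            # Scales and frictions are perturbed multiplicatively.
            params[name] = nominal * rng.uniform(lo, hi)
    return params
```

Because the policy never sees the same physics twice, it cannot overfit to one simulator instance and must learn behavior that works across the whole sampled family.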
The result is a controller that handles the inherent gap between simulation and the real world without hand-tuned adaptation.
Under the Hood
A full end-to-end stack from raw motion data to a deployable robot policy.
MuJoCo MJX runs thousands of environments simultaneously on GPU. JAX's vmap vectorizes the physics step across the entire batch with no Python loops.
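The batching pattern looks like this. The toy `step` function below is a stand-in for the real physics step (in MJX that would be `mjx.step`), and the 23-dimensional state is an assumed degree-of-freedom count, but the `jax.vmap` mechanics are exactly what makes thousands of environments advance in one call.

```python
import jax
import jax.numpy as jnp

def step(state, action):
    # Toy stand-in for one physics step: damped integrator dynamics.
    # The real project steps a full MuJoCo model here.
    pos, vel = state
    vel = 0.99 * vel + 0.01 * action
    pos = pos + 0.01 * vel
    return (pos, vel)

# vmap maps `step` over a leading batch axis, so a single call advances
# every environment at once; jit compiles the whole batched step.
batched_step = jax.jit(jax.vmap(step))

n_envs = 4096
states = (jnp.zeros((n_envs, 23)), jnp.zeros((n_envs, 23)))  # 23 DoF assumed
actions = jnp.zeros((n_envs, 23))
states = batched_step(states, actions)
print(states[0].shape)  # (4096, 23)
```

On a GPU the compiled batched step runs as one fused kernel launch per substep, which is where the orders-of-magnitude speedup over stepping environments in a Python loop comes from.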
Proximal Policy Optimization trains a neural network to track reference joint trajectories frame-by-frame, learning coordinated whole-body control from mocap data.
Physics parameters — link masses, joint friction, center-of-mass offsets, motor armature — are randomized each episode to bridge the sim-to-real gap.
A pre-training scorer evaluates every mocap clip on smoothness (jerk), joint limit compliance, and root height plausibility. Low-quality clips are excluded before training begins.
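A filter along those lines can be sketched as a single pass over each clip, checking the three criteria in turn. The thresholds and the third-finite-difference jerk estimate below are illustrative assumptions, not values from this repository.

```python
def clip_passes(qs, root_z, dt, joint_lo, joint_hi,
                max_jerk=500.0, min_root_z=0.4, max_root_z=1.2):
    """Sketch of a pre-training clip filter (all thresholds are made up).

    qs:      per-frame lists of joint angles (radians)
    root_z:  per-frame root heights (meters)
    dt:      frame interval (seconds)
    """
    # Joint-limit compliance: every joint inside its limits at every frame.
    for frame in qs:
        for q, lo, hi in zip(frame, joint_lo, joint_hi):
            if not (lo <= q <= hi):
                return False
    # Root-height plausibility: the root should stay in a standing band.
    if any(z < min_root_z or z > max_root_z for z in root_z):
        return False
    # Smoothness: the third finite difference approximates jerk; a spike
    # means the mocap track jumps in a way no real joint could follow.
    for t in range(3, len(qs)):
        for j in range(len(qs[0])):
            jerk = (qs[t][j] - 3 * qs[t - 1][j]
                    + 3 * qs[t - 2][j] - qs[t - 3][j]) / dt ** 3
            if abs(jerk) > max_jerk:
                return False
    return True
```

Rejecting bad clips up front is cheap insurance: a single implausible trajectory in the reference set forces the tracking reward to chase motions the robot physically cannot perform.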
The trained JAX policy is traced and converted to a portable .onnx file. Inference runs via ONNX Runtime — no JAX or GPU required on the target device.
A configurable monitor tracks reward improvement and halts training when progress stalls — saving GPU hours on runs that have already converged or diverged.
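The monitor's core logic is a patience counter over a best-so-far reward. This is a generic sketch of that pattern; the class name and thresholds are illustrative, not taken from the repository.

```python
class PlateauMonitor:
    """Stop training once mean reward stops improving meaningfully.

    patience:  how many consecutive checks without improvement to tolerate.
    min_delta: the minimum reward gain that counts as real progress.
    """
    def __init__(self, patience=10, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0

    def should_stop(self, mean_reward):
        if mean_reward > self.best + self.min_delta:
            # Real progress: record the new best and reset the counter.
            self.best = mean_reward
            self.stale = 0
        else:
            # No meaningful improvement at this check.
            self.stale += 1
        return self.stale >= self.patience
```

The same check catches both converged runs (reward flat at a high value) and diverged ones (reward flat or falling at a low value), since either way the best stops improving.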