Humanoid Robot Learns to Play Tennis Using Imperfect Human Motion Data

Teaching a humanoid robot to swing a tennis racket is harder than it sounds. The sport demands split-second footwork, precise wrist angles, and the ability to read a ball moving at speed — all while balancing on two legs. A new system from researchers affiliated with Chinese AI robotics company Galbot tackles this challenge in an unusually practical way: instead of requiring pristine motion capture from professional players, it learns from the messy, fragmentary data that amateurs generate.

The Unitree G1 executing forehand strokes, backhand strokes, and lateral footwork learned with the LATENT system from amateur motion capture data.

What Is LATENT?

LATENT stands for Learns Athletic humanoid TEnnis skills from imperfect human motioN daTa. The system was developed by researchers working with Galbot and deployed on the Unitree G1, a bipedal humanoid robot with articulated arms. The core insight behind LATENT is that high-quality, complete motion capture datasets for dynamic sports skills are expensive and hard to collect. Amateur players, by contrast, produce abundant data, even if it is noisy, fragmentary, and inconsistently executed. LATENT is designed to extract useful motion primitives from exactly this kind of imperfect input.

Five Hours of Amateur Motion Data, Four Skills Learned

The researchers collected approximately five hours of short, fragmentary recordings of basic tennis skills from amateur players wearing a motion capture suit. Rather than full match sequences, these were brief clips of individual strokes and movement patterns: forehand strokes, backhand strokes, lateral shuffles, and crossover steps. Because amateur motion capture is riddled with noise, missing frames, and inconsistent technique, replaying the data directly on a robot would produce jerky, unstable motion. Instead, LATENT treats the captured fragments as loose references and uses reinforcement learning in simulation to refine each skill and ground it in physically feasible motion. The result is a set of four distinct athletic skills that transfer to the real G1 hardware.
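The article does not describe how LATENT cleans its reference clips, so the following is only an illustration of what "noise" and "missing frames" mean in practice. It is a minimal Python sketch, assuming a fragment is stored as a NumPy array of shape (frames, joint channels) with NaN marking dropped frames. The function names `fill_missing_frames` and `smooth` are hypothetical, and linear interpolation plus a moving average is a generic cleanup technique, not the paper's method.

```python
import numpy as np


def fill_missing_frames(traj: np.ndarray) -> np.ndarray:
    """Hypothetical helper: linearly interpolate NaN entries (dropped frames)
    independently for each joint channel.

    traj has shape (T, D): T frames, D joint channels. NaN marks missing data.
    """
    filled = traj.astype(float)  # astype returns a copy, so the input is untouched
    t = np.arange(filled.shape[0])
    for d in range(filled.shape[1]):
        col = filled[:, d]  # a view: writing into col updates filled
        valid = ~np.isnan(col)
        if not valid.any():
            raise ValueError(f"channel {d} has no valid frames")
        # np.interp holds the nearest valid value constant beyond the ends
        col[~valid] = np.interp(t[~valid], t[valid], col[valid])
    return filled


def smooth(traj: np.ndarray, window: int = 5) -> np.ndarray:
    """Hypothetical helper: centered moving average to damp capture jitter.

    Near the clip boundaries the window shrinks instead of padding.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = traj.shape[0]
    out = np.empty_like(traj, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = traj[lo:hi].mean(axis=0)
    return out


# Usage: a one-channel clip with two dropped frames.
frames = np.array([[0.0], [np.nan], [2.0], [3.0], [np.nan], [5.0]])
clean = smooth(fill_missing_frames(frames), window=3)
```

A cleaned clip like this is still only a reference: filtering cannot fix inconsistent technique or make a motion physically feasible for a robot, which is the gap that the reinforcement learning stage is meant to close.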

Reinforcement Learning in the Loop

The training pipeline leans heavily on simulation. The simulated environment rewards physically plausible, stable executions of each skill and penalizes falls and racket mis-hits, and the team's reported results are averaged over 10,000 trials. This sim-to-real transfer approach is now standard in humanoid locomotion research, but applying it to a racket sport adds significant complexity: the robot must not only stay upright but also coordinate arm-swing timing, grip orientation, and body rotation into a coherent stroke. LATENT's reinforcement learning framework handles all of this jointly and outperformed previous methods on every evaluated skill in the team's benchmarks.
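The article does not give LATENT's actual reward terms. A common pattern in motion-imitation reinforcement learning, in the style of DeepMimic, scores how closely the simulated robot tracks a reference pose using an exponential of the squared pose error, and subtracts a penalty when the robot falls. The minimal Python sketch below assumes that pattern; `tracking_reward`, `evaluate`, `scale`, and `fall_penalty` are hypothetical names and values. It also shows what "averaged over 10,000 trials" means statistically: a mean outcome reported together with its standard error.

```python
import numpy as np


def tracking_reward(robot_pose, ref_pose, fell: bool,
                    scale: float = 2.0, fall_penalty: float = 10.0) -> float:
    """Hypothetical per-step reward: exp(-scale * ||q - q_ref||^2),
    minus a fixed penalty if the robot has fallen."""
    err = float(np.sum((np.asarray(robot_pose) - np.asarray(ref_pose)) ** 2))
    r = float(np.exp(-scale * err))  # 1.0 for a perfect match, toward 0 as error grows
    return r - fall_penalty if fell else r


def evaluate(run_trial, n_trials: int = 10_000, seed: int = 0):
    """Average a per-trial outcome (e.g. 1 = clean stroke, 0 = failure)
    over independent trials; return the mean and its standard error."""
    rng = np.random.default_rng(seed)
    outcomes = np.array([run_trial(rng) for _ in range(n_trials)], dtype=float)
    mean = outcomes.mean()
    stderr = outcomes.std(ddof=1) / np.sqrt(n_trials)
    return float(mean), float(stderr)


# Usage with a stand-in trial that succeeds 80% of the time.
mean, stderr = evaluate(lambda rng: rng.random() < 0.8)
```

With 10,000 trials the standard error of a success rate near 0.8 is roughly 0.004, which is why averaging over that many runs yields stable comparisons between methods.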

"By leveraging only fragmentary and imperfect motion capture data from amateur players, LATENT successfully enables the humanoid robot to perform tennis skills including forehand stroke, backhand stroke, lateral shuffle, and crossover step."
— LATENT paper authors, arXiv preprint 2603.12686

Why This Matters Beyond Tennis

The implications of LATENT reach well beyond the tennis court. Most humanoid robot tasks that involve tools — whether a racket, a wrench, or a surgical instrument — require the robot to learn from human demonstrations. If that learning pipeline requires professional-grade, perfectly captured data, it becomes a bottleneck that slows deployment. A system that can extract reliable skills from imperfect, low-cost motion capture data dramatically lowers the bar for teaching new behaviors. Galbot and collaborators are positioning LATENT as a general framework for athletic skill transfer, with tennis serving as a demanding but tractable proof of concept. The paper has been published as a preprint on arXiv and is expected to draw significant attention from the humanoid locomotion community.

🔑 Key Takeaways

LATENT (Learns Athletic humanoid TEnnis skills from imperfect human motioN daTa) is a new system from Galbot-affiliated researchers deployed on the Unitree G1.
About 5 hours of amateur motion capture data were enough to train four distinct skills: forehand stroke, backhand stroke, lateral shuffle, and crossover step.
Reinforcement learning in simulation refines noisy human reference data into stable, transferable robot motion, outperforming prior methods in the team's benchmarks (results averaged over 10,000 trials).
The framework targets a real bottleneck: reducing the data quality requirements for teaching humanoids new physical skills.

📰 Source: TechXplore | 📄 Preprint: arXiv 2603.12686