Physical AI · By nxted Research Team · Published 29 May 2026 · Updated 30 May 2026 · 2 min read

Why egocentric data is the bottleneck for humanoid robots

The internet is not a substitute for first-person human demonstration. Here's the maths.

The bottleneck

Every humanoid robot today is gated on one thing: the supply of first-person human demonstration data.

The internet is not a substitute. YouTube footage is third-person. Wikipedia is text. Reddit is opinions. Robot training needs a record of how a skilled human moves through a task (joint angles, gaze, contact forces), captured from the viewpoint the robot will occupy.
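To make "joint angles, gaze, contact forces" concrete, here is a minimal sketch of what one egocentric demonstration record might hold. The class names, field names and units (`EgocentricFrame`, `Demonstration`, radians, newtons) are hypothetical illustrations, not a schema used by any particular lab, robot or dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EgocentricFrame:
    """One time step of a first-person demonstration (hypothetical schema)."""
    timestamp_s: float                          # seconds since recording start
    image_path: str                             # head-mounted camera frame: the robot's eventual viewpoint
    joint_angles_rad: List[float]               # hand/arm joint angles, radians
    gaze_direction: Tuple[float, float, float]  # unit gaze vector in the head frame
    contact_forces_n: List[float]               # fingertip contact forces, newtons


@dataclass
class Demonstration:
    """A whole task recording: a label plus an ordered list of frames."""
    task: str
    frames: List[EgocentricFrame] = field(default_factory=list)

    def duration_s(self) -> float:
        # Elapsed time between the first and last frame; 0 if fewer than two frames.
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1].timestamp_s - self.frames[0].timestamp_s


# Usage: two frames of a (hypothetical) shoelace-tying recording.
demo = Demonstration(task="tie shoelace")
demo.frames.append(EgocentricFrame(0.0, "frame_0000.jpg", [0.1, 0.4], (0.0, 0.0, 1.0), [0.0, 0.0]))
demo.frames.append(EgocentricFrame(0.5, "frame_0001.jpg", [0.2, 0.5], (0.0, 0.0, 1.0), [1.2, 0.8]))
print(demo.duration_s())
```

The point of the sketch is that every field is tied to the wearer's body and viewpoint; none of it can be reconstructed reliably from third-person video or text.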

The maths

NVIDIA's recent Embodied AI study found a 54% improvement in manipulation accuracy moving from web-scraped to egocentric data. Apple Vision Pro researchers needed 829 hours of shoelace-tying footage to teach a robot a single dexterous task.

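Now extrapolate to the 5,000+ tasks a general-purpose humanoid needs. At 829 hours per task, 5,000 tasks come to roughly 4.1 million hours. Even if later tasks needed far less data, say 300 hours each, you would still need 1.5 million+ hours of egocentric video. The supply does not exist.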
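A quick check of the arithmetic in this paragraph, using only the figures quoted in this section (829 hours per task, 5,000 tasks, a 1.5 million hour total). The constant names are illustrative; the sketch simply multiplies the observed per-task figure out and divides the headline total back down to a per-task budget.

```python
HOURS_PER_TASK_OBSERVED = 829    # shoelace-tying figure quoted above
TASKS = 5_000                    # lower bound on tasks for a general-purpose humanoid
CLAIMED_TOTAL_HOURS = 1_500_000  # headline estimate in the text

# Naive extrapolation: every task needs as much data as shoelace tying.
naive_total = HOURS_PER_TASK_OBSERVED * TASKS

# Per-task budget implied if the total is only 1.5 million hours.
implied_per_task = CLAIMED_TOTAL_HOURS / TASKS

print(f"Naive extrapolation: {naive_total:,} hours")
print(f"Per-task budget implied by 1.5M hours: {implied_per_task:.0f} hours")
```

Either way the conclusion holds: 1.5 million hours is the conservative end of the range, and the naive figure is nearly three times larger.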

Why India

India has one of the world's largest skilled workforces, broad English fluency in the working population, and a materially lower cost base than the US. It is one of the few places where this scale of capture is economically rational.