Why egocentric data is the bottleneck for humanoid robots
The internet is not a substitute for first-person human demonstration. Here's the maths.
The bottleneck
Every humanoid robot today is gated on one thing: the supply of first-person human demonstration data.
The internet is not a substitute. YouTube footage is third-person. Wikipedia is text. Reddit is opinions. Robot training requires a record of how a skilled human moves through a task (joint angles, gaze, contact forces), captured from the perspective the robot will occupy.
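To make those three channels concrete, here is a minimal sketch of what one time step of a first-person recording might look like. The class name, field names, units and the `missing_channels` helper are all hypothetical, chosen for illustration only; this is not a real dataset format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EgocentricFrame:
    """One time step of a first-person demonstration (hypothetical schema)."""
    timestamp_s: float
    # Joint name -> angle in radians (assumed unit).
    joint_angles_rad: Dict[str, float] = field(default_factory=dict)
    # Unit vector for gaze in the head-camera frame (assumed convention).
    gaze_direction: Optional[Tuple[float, float, float]] = None
    # Contact point name -> force in newtons (assumed unit).
    contact_forces_n: Dict[str, float] = field(default_factory=dict)


def missing_channels(frame: EgocentricFrame) -> List[str]:
    """List which of the three channels named in the text are absent."""
    missing = []
    if not frame.joint_angles_rad:
        missing.append("joint_angles")
    if frame.gaze_direction is None:
        missing.append("gaze")
    if not frame.contact_forces_n:
        missing.append("contact_forces")
    return missing


# A third-person video clip, by contrast, would populate none of these fields.
frame = EgocentricFrame(
    timestamp_s=0.0,
    joint_angles_rad={"wrist": 0.1},
    gaze_direction=(0.0, 0.0, 1.0),
    contact_forces_n={"thumb": 1.2},
)
print(missing_channels(frame))
```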
The maths
NVIDIA's recent Embodied AI study found a 54% improvement in manipulation accuracy moving from web-scraped to egocentric data. Apple Vision Pro researchers needed 829 hours of shoelace-tying footage to teach a robot a single dexterous task.
If you extrapolate that ratio linearly to the 5,000+ tasks a general-purpose humanoid needs, you arrive at roughly 4.1 million hours of egocentric video (829 × 5,000 = 4,145,000). Even at a more conservative 1.5 million+ hours, the supply does not exist.
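The extrapolation can be checked in a few lines. This is a back-of-envelope sketch: the 829 hours and 5,000 tasks are the figures quoted above, while the function names and the assumption that every task costs the same number of hours are ours.

```python
# Figures quoted in the text; the equal-cost-per-task assumption is illustrative.
HOURS_PER_TASK = 829   # hours cited for a single dexterous task
NUM_TASKS = 5_000      # lower bound on tasks for a general-purpose humanoid


def linear_hours(hours_per_task: float, num_tasks: int) -> float:
    """Total hours if every task needs the same amount of footage."""
    return hours_per_task * num_tasks


def implied_hours_per_task(total_hours: float, num_tasks: int) -> float:
    """Average per-task budget implied by a given total."""
    return total_hours / num_tasks


total = linear_hours(HOURS_PER_TASK, NUM_TASKS)
print(f"{total:,.0f} hours")                         # 4,145,000 hours
print(implied_hours_per_task(1_500_000, NUM_TASKS))  # 300.0
```

A straight multiplication gives about 4.1 million hours. A total of 1.5 million hours corresponds to an average of 300 hours per task, so it is the more conservative of the two estimates.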
Why India
India has one of the world's largest skilled workforces, broad English fluency in the working population, and a materially lower cost base than the US. It is one of the few places where this scale of capture is economically rational.
Physical-AI data specialists at OFORO LTD (UK). We write about egocentric data, robotics dataset formats, RLHF and data governance.