nxted
Nxted Capture

Robots can’t learn from the internet.
They learn from people.

LLMs had the whole internet to read. Robots have no internet of actions - the two largest open robot datasets combined add up to roughly 5,000 hours. Physical AI is bottlenecked on one thing: first-person video of real humans doing real, skilled work.

Nxted Capture records that data from India’s skilled workforce - at a fraction of Western cost, with the diversity the scaling laws reward.

Egocentric · first-person

Real hands. Real tools. Real workshops.

What is egocentric data for robotics?

Egocentric data for robotics is first-person video recorded from the worker’s own point of view as they perform a task - what a robot’s head- or wrist-mounted camera would see - plus depth, hand pose and 6-DoF motion. It teaches manipulation policies how a skilled human actually moves, which third-person or web video cannot.

Why capture this data in India?

India has one of the world’s largest skilled workforces, spanning the range of trades, workplaces and tools that the scaling laws reward. nxted records only verified, consenting contributors in real workplaces, with fair pay and documented provenance. Skill depth and provenance - not cost - are the point, though Indian capture is also materially lower-cost.

The opportunity

The biggest bottleneck in robotics isn’t compute or models. It’s data of humans doing things.

$38B

Humanoid robot market by 2035

Goldman Sachs

$100M+

Spent yearly by robotics firms buying real-world data

MIT Technology Review

~5,000 hrs

The entire open-source robot dataset supply, combined

Scale AI

0%

Policy gain from human egocentric data vs robot data alone

Meta · EgoMimic

No internet of actions

Language models trained on trillions of web tokens. There is no equivalent corpus of physical manipulation - so robots have to be shown, demonstration by demonstration.

Diversity beats volume

Peer-reviewed scaling laws (ICLR 2025) show robot-policy generalisation follows a power law in the number of environments and objects - exactly what a diverse, India-wide workforce provides.
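
One way to read a power law in environment count is as a straight line on log-log axes. The sketch below fits failure rate ≈ a · N^(-b) to a handful of points. The data is synthetic and generated for illustration only; it is not taken from the ICLR 2025 paper or from nxted deliveries, and the names `fit_power_law`, `num_environments` and `failure_rate` are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b), done as a line fit in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, illustrative data: failure rate generated from 0.8 * N**(-0.5).
num_environments = [1, 2, 4, 8, 16, 32]
failure_rate = [0.8 * n ** -0.5 for n in num_environments]

a, b = fit_power_law(num_environments, failure_rate)
print(f"a = {a:.3f}, b = {b:.3f}")  # a = 0.800, b = 0.500
```

Under a curve like this, each new environment helps less than the one before it, but the gain never drops to zero. That is the argument for spreading capture across many workshops rather than many hours in one.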

The money is already moving

Over $6B was invested in humanoid robotics in 2025. Robotics firms already spend $100M+ a year buying real-world data. The supply hasn’t caught up. That’s the gap.

How it works

From a skilled worker in a workshop to a training-ready dataset.

01

Source verified skilled workers

We recruit and verify tradespeople across India - tailors, machinists, carpenters, chefs, medical staff - matched to the skill and level you need.

02

Fit the capture rig

Workers wear egocentric (first-person) glasses built on research-standard devices, plus depth and hand-pose sensors for robotics-ready tiers.

03

Record real tasks, real environments

We film the actual job in the actual workshop - not a sterile lab. Diversity of scene and object is what the scaling laws reward.

04

Annotate & segment

Action segmentation, first-person narration, hand-pose tracks, 6DoF trajectory, and skill-level metadata - the Ego4D / Ego-Exo4D methodology.

05

Quality & consent review

Every batch is reviewed by our expert team. Signed releases, PII blurring, and a GDPR-compliant DPA cover every frame before delivery.

06

Deliver in your format

LeRobot, RLDS, HDF5, or MP4 + sidecars - dropped straight into your imitation-learning or VLA training stack.
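
For the MP4 + sidecars option, a first ingestion step is usually to check that every clip has its metadata file. The sketch below assumes each video has a JSON sidecar with the same file stem in the same folder. That naming convention is an assumption for illustration, not a documented nxted layout, and `pair_clips_with_sidecars` is a hypothetical helper.

```python
from pathlib import Path

def pair_clips_with_sidecars(root):
    """Split videos under root into (pairs, orphans).

    pairs   - (video, sidecar) tuples where a same-stem .json file exists
    orphans - videos with no matching sidecar
    """
    pairs, orphans = [], []
    for video in sorted(Path(root).rglob("*.mp4")):
        sidecar = video.with_suffix(".json")  # assumed convention: clip.mp4 -> clip.json
        if sidecar.is_file():
            pairs.append((video, sidecar))
        else:
            orphans.append(video)
    return pairs, orphans
```

Calling `pair_clips_with_sidecars("path/to/delivery")` returns the clips that are safe to load and the ones to query before training starts.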

The capture rig

Built on the devices the research field already trusts.

We don’t ship fictional proprietary hardware. Our rigs are built on Meta Project Aria, Intel RealSense, Stereolabs ZED, and the Universal Manipulation Interface - the same stack behind Ego-Exo4D and EgoMimic. Three tiers, matched to how training-ready you need the data to be.

First-person RGB
up to 1408² / 30fps
Depth + point cloud
RealSense / ZED
Hand pose
per-joint tracking
6DoF trajectory
SLAM-derived
Eye gaze
attention signal
Action labels
segmented + narrated

Entry

High-volume, low-cost

Sensors
  • Head-mounted 4K RGB camera
  • Built-in IMU (accel + gyro)
Outputs
  • 1080p / 30fps video
  • Motion telemetry
  • Action timestamps + narration
Formats
  • MP4 + JSON/CSV sidecars
  • Convertible to LeRobot

Pre-training video corpora; scaling across many workers fast

Built on UMI-style GoPro capture

Standard

Most popular

Research-grade egocentric

Sensors
  • Aria-class glasses: RGB + SLAM cameras
  • Eye tracking
  • Dual IMU + spatial audio
  • Optional fixed exo camera
Outputs
  • RGB up to 1408² / 30fps
  • 6DoF SLAM trajectory
  • Eye gaze + hand tracking
Formats
  • LeRobot (Parquet + MP4 + JSON)
  • MP4 + sidecars

The human→humanoid recipe - same glasses on worker and robot

Built on Meta Project Aria

Robotics-ready

Direct policy training

Sensors
  • Everything in Standard
  • RealSense / ZED depth
  • Hand-pose mocap gloves or UMI gripper
  • Multi-view exocentric
Outputs
  • RGB-D + point clouds
  • Per-joint hand pose
  • Gripper width / contact
  • 6DoF trajectory + gaze
Formats
  • LeRobot
  • RLDS / TFDS
  • HDF5 (ALOHA / robomimic)

Imitation learning & VLA training; Open X-Embodiment-ready

Built on Aria + RealSense + DexCap / UMI

The 5 levels

From basic packing to specialist craft.

Level 01

Foundation

from $35/hr

Sorting, stacking, packing, basic assembly, labeling

Example task

Pack 50 items per minute across 5 product types

Level 02

Skilled Trades

from $55/hr

Tailoring, carpentry, cooking, cleaning, gardening

Example task

Complete garment assembly with hand and machine techniques

Level 03

Technical / Industrial

from $80/hr

CNC operation, welding, electrical work, plumbing, electronics assembly

Example task

Set up and operate a CNC lathe for precision part manufacture

Level 04

Professional / Medical

from $120/hr

Surgical assistance, patient care, pharmacy, dental, lab work

Example task

Prep and assist in laparoscopic procedure, instrument handling

Level 05

Rare / Specialist

Custom quote

Heritage crafts, traditional medicine, precision jewellery, instrument making

Example task

Traditional Kanjivaram silk weaving - 40-step process
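
For rough planning, the listed rates can be turned into a lower-bound budget. The sketch below assumes the "from" prices apply per usable hour and treats them as a floor, so a real quote may be higher. Level 05 is excluded because it is custom-quoted. `FROM_RATE_USD` and `minimum_budget` are hypothetical names.

```python
# "From" hourly rates per level as listed above, in USD.
# Assumption: rates apply per usable hour. Level 05 has no listed rate.
FROM_RATE_USD = {1: 35, 2: 55, 3: 80, 4: 120}

def minimum_budget(hours_by_level):
    """Lower-bound cost in USD for a mix of skill levels."""
    total = 0
    for level, hours in hours_by_level.items():
        if level not in FROM_RATE_USD:
            raise ValueError(f"Level {level} has no listed rate (custom quote)")
        total += FROM_RATE_USD[level] * hours
    return total

# Example: 200 hours of Level 02 plus 50 hours of Level 03.
print(minimum_budget({2: 200, 3: 50}))  # 15000
```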

Formats

We deliver in formats your robotics pipeline already uses.

MP4 4K · RLDS · HDF5 · LeRobot · Custom
Every dataset includes
  • First-person (egocentric) perspective
  • Action timestamps and task segmentation
  • Worker skill level and experience metadata
  • Environment classification
  • Ethics consent documentation (GDPR-compliant)
Optional enrichment
  • Depth maps
  • Pose skeletons
  • Semantic segmentation
  • Optical flow
  • Hand tracking
  • Action labels
Sample dataset

What actually lands in your bucket.

Every dataset ships in a consistent, robotics-ready structure - raw and processed video, multi-format episodes, full metadata, and the compliance pack. Below is the shape of a typical industrial delivery.

Example structure (illustrative)
nxted_industrial_india_01/
  dataset_card.md          # scope, splits, limitations
  consent_manifest.csv     # per-contributor consent + pay
  task_cards/              # what each episode demonstrates
  raw_video/               # original egocentric capture
  processed_video/         # redacted, stabilised, trimmed
  exo_video/               # third-person reference angle
  metadata/                # calibration, 6DoF poses, timestamps
  annotations/             # action labels, segments, success
  quality_report.pdf       # inter-annotator agreement + QA
  lerobot/                 # LeRobot-format episodes
  rlds/                    # RLDS / TFDS shards
  hdf5/                    # HDF5 trajectories
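
A delivery shaped like the tree above can be sanity-checked on arrival. The sketch below lists any expected entry that is missing. The entry names are copied from the illustrative structure, so a real delivery may differ, and `missing_entries` is a hypothetical helper.

```python
from pathlib import Path

# Entries copied from the illustrative structure above; a real delivery may differ.
EXPECTED_DIRS = [
    "task_cards", "raw_video", "processed_video", "exo_video",
    "metadata", "annotations", "lerobot", "rlds", "hdf5",
]
EXPECTED_FILES = ["dataset_card.md", "consent_manifest.csv", "quality_report.pdf"]

def missing_entries(dataset_root):
    """Return the expected directories and files that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list means the top-level layout matches. Anything else is worth raising before the data enters a training pipeline.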

Every dataset includes

  • Raw and processed egocentric video, redacted to spec (faces, plates, screens, PII).
  • Third-person reference video where a second angle helps the policy.
  • Per-episode task cards plus action labels, segments and success / failure flags.
  • LeRobot, RLDS and HDF5 exports of every episode - drop straight into your pipeline.
  • Full metadata: camera calibration, 6DoF trajectories, hand poses and timestamps.
  • Dataset card and data-provenance log tracing every clip back to its source.
  • Consent manifest and fair-payment record for every contributor.
  • QA report with inter-annotator agreement and labelled edge cases.
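
Inter-annotator agreement can be measured in several ways, and this page does not say which statistic the QA report uses. As one common choice, the sketch below computes Cohen's kappa for two annotators labelling the same action segments. The labels are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up labels for four action segments.
annotator_1 = ["pick", "place", "pick", "idle"]
annotator_2 = ["pick", "place", "place", "idle"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

A kappa of 1.0 is perfect agreement and 0 is no better than chance, which makes it more informative than raw percent agreement when one label dominates.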

The consent, provenance, redaction and QA artifacts are bundled as the Data Trust Pack - the documentation your data, legal and safety teams need to sign off a dataset for production.

Proof it works

The field has already shown human capture trains better robots.

We don’t ask you to take our word for it. Here is what the leading labs and companies have published - every claim links to its source.

Demand

DoorDash pays couriers to film chores

In March 2026 DoorDash put dishwashing, laundry-folding and other household tasks in front of ~8M couriers - paid, body-worn capture sold to robotics partners. The gig economy is now the training layer for physical AI.

~8M couriers mobilised · PYMNTS
Validation

One hour of human data beats one hour of robot data

Georgia Tech's EgoMimic, built on Meta Project Aria glasses, trained a bimanual humanoid from ~90 minutes of first-person human video - a 400% jump over prior methods, and +34-228% over robot teleoperation data alone.

+400% from 90 minutes · Meta AI
Scaling law

20,000+ hours → a scaling law for robot hands

NVIDIA EgoScale pretrained on 20,854 hours of egocentric human manipulation video and found a log-linear scaling law: every doubling of human video hours yields a predictable gain in task success.
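
A log-linear law means the metric grows with the logarithm of data volume, so each doubling adds the same fixed amount. The sketch below shows that property. The intercept and per-doubling gain are hypothetical values chosen for illustration, not coefficients from the EgoScale paper.

```python
import math

def predicted_success(hours, intercept, gain_per_doubling):
    """Log-linear model: success = intercept + gain_per_doubling * log2(hours)."""
    return intercept + gain_per_doubling * math.log2(hours)

# Hypothetical coefficients, for illustration only.
INTERCEPT = 0.10
GAIN_PER_DOUBLING = 0.04

previous = None
for hours in (1_000, 2_000, 4_000, 8_000):
    success = predicted_success(hours, INTERCEPT, GAIN_PER_DOUBLING)
    step = None if previous is None else round(success - previous, 3)
    print(hours, round(success, 3), step)  # step is the gain from the last doubling
    previous = success
```

The practical reading is that going from 10,000 to 20,000 hours buys the same gain as going from 1,000 to 2,000, so the cost per point of improvement rises with scale.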

20,854 hours · NVIDIA · arXiv
Method

The data pyramid: human video → synthetic → real

NVIDIA's Isaac GR00T N1 turned 88 hours of real teleoperation into 827 hours and generated 780,000 synthetic trajectories in 11 hours - lifting performance ~40% over real data alone. Human capture is the high-value apex.

+40% over real-only · NVIDIA
Precedent

Apple EgoDex - 829 hours of dexterous hands

Apple recorded 829 hours of egocentric video with full 3D hand tracking across 194 tabletop tasks (90M frames) on Vision Pro - proof that first-person video of skilled hands is the data category that matters.

829 hrs · 194 tasks · Apple ML Research
In-the-wild

UMI: capture without a robot in the room

The Universal Manipulation Interface - a handheld gripper with a mounted camera, operated by a human - produces hardware-agnostic policies that transfer zero-shot across robots. It follows the same model as nxted Capture: skilled humans, real environments.

Zero-shot transfer · Stanford · UMI
Skill categories

What we can record.

Our flagship vertical is skilled industrial and technical work - electrical panel assembly, machine tending, CNC setup, electronics assembly, and inspection - the data the free open datasets don't cover and gig networks can't reach.

Electrical / Industrial assembly

Flagship vertical

Panel wiring · Electronics assembly · Machine tending · Inspection

CNC / Machining

Skilled machinists

Lathe ops · Mill setup · Quality inspection

Tailoring & Textile

Deep skilled pool

Hand stitching · Machine sewing · Pattern cutting

Construction / Warehouse

Heavy physical tasks

Bricklaying · Tile setting · Pick and pack

Carpentry & Furniture

Complex 3D assembly

Joinery · Lathe work · Lacquer finishing

Cooking & Food

Multi-cuisine pool

Knife skills · Plating · Tandoor / griddle

Medical & Surgical

Credentialed clinicians

Suturing · Instrument handoff · Patient prep

Why India

The skill depth of centuries of craft - captured with consent.

Skill

Real tradespeople doing real, skilled work - electrical assembly, machine tending, CNC setup, tailoring, surgical-adjacent tasks. The breadth of credentialed trades the scaling laws reward, and that Western lab-bound datasets lack.

Trust

Workers paid above local market rate, fully consented, with signed releases and PII/face/plate/screen redaction. Every frame is covered by a DPDP and GDPR-aligned DPA. Consent-first capture is the product, not an afterthought.

Provenance

Every dataset ships with a Data Trust Pack: consent records, skill verification, reviewer credentials, a dataset card, and a QA report. You always know who produced the data and how.

Cost

Indian capture is also materially lower-cost, but cost is the footnote, not the pitch.

Tell us the skill. We’ll capture the data your robots need.

Custom datasets quoted within 24 hours, in any format your pipeline already uses.

FAQ

Egocentric data for robotics: FAQ

What is egocentric data and why do robots need it?

Egocentric data is first-person video recorded from the doer’s point of view - what a robot’s own camera would see - usually with depth, hand pose and 6-DoF motion. Robots need it because there is no web-scale corpus of physical actions; manipulation policies have to be shown how skilled humans move, demonstration by demonstration.

What hardware does nxted capture with?

nxted builds on the rigs the research field already uses: Meta Project Aria glasses, Intel RealSense and Stereolabs ZED depth cameras, and Universal Manipulation Interface grippers - the same class of stack behind datasets like Ego-Exo4D. Three tiers map to how training-ready you need the data, from RGB-only up to full depth and hand pose.

What skill categories can nxted record?

The flagship vertical is skilled industrial and technical work - electrical panel assembly, machine tending, CNC setup, electronics assembly and inspection. nxted also captures tailoring and textile, construction and warehouse, carpentry, cooking, and medical-adjacent tasks, using credentialed contributors for the specialist categories.

What formats and annotations are included?

Each dataset ships in LeRobot, RLDS and HDF5, plus raw and processed egocentric video, an optional third-person reference angle, full metadata (camera calibration, 6-DoF trajectories, hand pose, timestamps), action segmentation and success/failure labels, a dataset card, a consent manifest and a QA report.

Is nxted’s capture consented and ethically sourced?

Yes. Contributors are paid above the local market rate, give explicit, withdrawable consent, and every filming site signs a release. Faces, plates, screens and PII are redacted, no contributor is under 18, and each delivery includes a Data Trust Pack with a DPDP & GDPR-aligned DPA.

How much does an egocentric dataset cost?

Start with a Physical AI Test Kit from $2,500 - 5 to 10 usable hours of one skilled task with a consent pack, metadata, basic labels and a LeRobot/RLDS/HDF5 sample, delivered in 7-10 days. Full datasets are priced per usable hour by skill level and quoted within 24 hours.