• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Sunday, May 24, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data

Josh by Josh
February 21, 2026
in Al, Analytics and Automation
0
NVIDIA Releases DreamDojo: An Open-Source Robot World Model Trained on 44,711 Hours of Real-World Human Video Data


Building simulators for robots has been a long term challenge. Traditional engines require manual coding of physics and perfect 3D models. NVIDIA is changing this with DreamDojo, a fully open-source, generalizable robot world model. Instead of using a physics engine, DreamDojo ‘dreams’ the results of robot actions directly in pixels.

https://arxiv.org/pdf/2602.06949

Scaling Robotics with 44k+ Hours of Human Experience

The biggest hurdle for AI in robotics is data. Collecting robot-specific data is expensive and slow. DreamDojo solves this by learning from 44k+ hours of egocentric human videos. This dataset, called DreamDojo-HV, is the largest of its kind for world model pretraining.

  • It features 6,015 unique tasks across 1M+ trajectories.
  • The data covers 9,869 unique scenes and 43,237 unique objects.
  • Pretraining used 100,000 NVIDIA H100 GPU hours to build 2B and 14B model variants.

Humans have already mastered complex physics, such as pouring liquids or folding clothes. DreamDojo uses this human data to give robots a ‘common sense’ understanding of how the world works.

https://arxiv.org/pdf/2602.06949

Bridging the Gap with Latent Actions

Human videos do not have robot motor commands. To make these videos ‘robot-readable,’ NVIDIA’s research team introduced continuous latent actions. This system uses a spatiotemporal Transformer VAE to extract actions directly from pixels.

  • The VAE encoder takes 2 consecutive frames and outputs a 32-dimensional latent vector.
  • This vector represents the most critical motion between frames.
  • The design creates an information bottleneck that disentangles action from visual context.
  • This allows the model to learn physics from humans and apply them to different robot bodies.
https://arxiv.org/pdf/2602.06949

Better Physics through Architecture

DreamDojo is based on the Cosmos-Predict2.5 latent video diffusion model. It uses the WAN2.2 tokenizer, which has a temporal compression ratio of 4. The team improved the architecture with 3 key features:

  1. Relative Actions: The model uses joint deltas instead of absolute poses. This makes it easier for the model to generalize across different trajectories.
  2. Chunked Action Injection: It injects 4 consecutive actions into each latent frame. This aligns the actions with the tokenizer’s compression ratio and fixes causality confusion.
  3. Temporal Consistency Loss: A new loss function matches predicted frame velocities to ground-truth transitions. This reduces visual artifacts and keeps objects physically consistent.

Distillation for 10.81 FPS Real-Time Interaction

A simulator is only useful if it is fast. Standard diffusion models require too many denoising steps for real-time use. NVIDIA team used a Self Forcing distillation pipeline to solve this.

  • The distillation training was conducted on 64 NVIDIA H100 GPUs.
  • The ‘student’ model reduces denoising from 35 steps down to 4 steps.
  • The final model achieves a real-time speed of 10.81 FPS.
  • It is stable for continuous rollouts of 60 seconds (600 frames).

Unlocking Downstream Applications

DreamDojo’s speed and accuracy enable several advanced applications for AI engineers.

1. Reliable Policy Evaluation

Testing robots in the real world is risky. DreamDojo acts as a high-fidelity simulator for benchmarking.

  • Its simulated success rates show a Pearson correlation of (Pearson 𝑟=0.995) with real-world results.
  • The Mean Maximum Rank Violation (MMRV) is only 0.003.

2. Model-Based Planning

Robots can use DreamDojo to ‘look ahead.’ A robot can simulate multiple action sequences and pick the best one.

  • In a fruit-packing task, this improved real-world success rates by 17%.
  • Compared to random sampling, it provided a 2x increase in success.

3. Live Teleoperation

Developers can teleoperate virtual robots in real time. NVIDIA team demonstrated this using a PICO VR controller and a local desktop with an NVIDIA RTX 5090. This allows for safe and rapid data collection.

Summary of Model Performance

Metric DREAMDOJO-2B DREAMDOJO-14B
Physics Correctness 62.50% 73.50%
Action Following 63.45% 72.55%
FPS (Distilled) 10.81 N/A

NVIDIA has released all weights, training code, and evaluation benchmarks. This open-source release allows you to post-train DreamDojo on your own robot data today.

Key Takeaways

  • Massive Scale and Diversity: DreamDojo is pretrained on DreamDojo-HV, the largest egocentric human video dataset to date, featuring 44,711 hours of footage across 6,015 unique tasks and 9,869 scenes.
  • Unified Latent Action Proxy: To overcome the lack of action labels in human videos, the model uses continuous latent actions extracted via a spatiotemporal Transformer VAE, which serves as a hardware-agnostic control interface.
  • Optimized Training and Architecture: The model achieves high-fidelity physics and precise controllability by utilizing relative action transformations, chunked action injection, and a specialized temporal consistency loss.
  • Real-Time Performance via Distillation: Through a Self Forcing distillation pipeline, the model is accelerated to 10.81 FPS, enabling interactive applications like live teleoperation and stable, long-horizon simulations for over 1 minute.
  • Reliable for Downstream Tasks: DreamDojo functions as an accurate simulator for policy evaluation, showing a 0.995 Pearson correlation with real-world success rates, and can improve real-world performance by 17% when used for model-based planning.

Check out the Paper and Codes. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.




Source_link

READ ALSO

Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

Related Posts

Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents
Al, Analytics and Automation

Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

May 24, 2026
Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification
Al, Analytics and Automation

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

May 23, 2026
A Step-by-Step Coding Tutorial to Implement GBrain: The Self-Wiring Memory Layer Built by Y Combinator’s Garry Tan for AI Agents
Al, Analytics and Automation

A Step-by-Step Coding Tutorial to Implement GBrain: The Self-Wiring Memory Layer Built by Y Combinator’s Garry Tan for AI Agents

May 23, 2026
Microsoft Releases Fara1.5: A Family of Browser Computer-Use Agents (4B/9B/27B) That Outperform OpenAI Operator and Gemini 2.5 Computer Use on Online-Mind2Web
Al, Analytics and Automation

Microsoft Releases Fara1.5: A Family of Browser Computer-Use Agents (4B/9B/27B) That Outperform OpenAI Operator and Gemini 2.5 Computer Use on Online-Mind2Web

May 22, 2026
Justin Solomon appointed associate dean of engineering education | MIT News
Al, Analytics and Automation

Justin Solomon appointed associate dean of engineering education | MIT News

May 22, 2026
Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context Window
Al, Analytics and Automation

Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context Window

May 21, 2026
Next Post
Microsoft Copilot ignored sensitivity labels twice in eight months — and no DLP stack caught either one

Microsoft Copilot ignored sensitivity labels twice in eight months — and no DLP stack caught either one

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Data-Driven Corporate Communications: Measure What Matters

Data-Driven Corporate Communications: Measure What Matters

December 6, 2025
How to Get Post-mortem Badge in Secret Universe

How to Get Post-mortem Badge in Secret Universe

March 27, 2026
How to Clean a Beer Glass for Perfect Pours

How to Clean a Beer Glass for Perfect Pours

June 25, 2025
MIT in the media: 2025 in review | MIT News

MIT in the media: 2025 in review | MIT News

December 23, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • We’re partnering with U.S. Soccer to bring fans closer to the action with Search.
  • Ansel Adams’ Trust Says AI-Colorized Version Of His Work Was Exhibited Without Permission
  • Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents
  • The 6 Best Microlearning Platforms I Recommend in 2026
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions