mGrowTech

Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling

by Josh
March 24, 2026
in AI, Analytics and Automation


World Models (WMs) are a central framework for developing agents that reason and plan in a compact latent space. However, training these models directly from pixel data often leads to ‘representation collapse,’ where the encoder produces redundant, near-constant embeddings that trivially satisfy the prediction objective. Current approaches try to prevent this with complex heuristics such as stop-gradient updates, exponential moving averages (EMA), and frozen pre-trained encoders. A team of researchers including Yann LeCun, with co-authors from Mila & Université de Montréal, New York University, Samsung SAIL and Brown University, introduced LeWorldModel (LeWM), described as the first JEPA (Joint-Embedding Predictive Architecture) that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings.

Technical Architecture and Objective

LeWM consists of two primary components learned jointly: an Encoder and a Predictor.

  • Encoder ($z_t = \mathrm{enc}_\theta(o_t)$): Maps a raw pixel observation into a compact, low-dimensional latent representation. The implementation uses a ViT-Tiny architecture (~5M parameters).
  • Predictor ($\hat{z}_{t+1} = \mathrm{pred}_\theta(z_t, a_t)$): A transformer (~10M parameters) that models environment dynamics by predicting future latent states conditioned on actions.

The model is optimized using a streamlined objective function consisting of only two loss terms:

$$\mathcal{L}_{\mathrm{LeWM}} \triangleq \mathcal{L}_{\mathrm{pred}} + \lambda \, \mathrm{SIGReg}(Z)$$

The prediction loss ($\mathcal{L}_{\mathrm{pred}}$) is the mean-squared error (MSE) between the predicted and actual next-step embeddings. SIGReg (Sketched Isotropic Gaussian Regularizer) is the anti-collapse term: it pushes the distribution of embeddings toward an isotropic Gaussian, which keeps the features diverse.
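To see why the prediction loss alone is not enough, here is a minimal NumPy sketch. The function names (`prediction_loss`, `lewm_objective`), the toy shapes, and the exact normalization of the MSE are illustrative assumptions, not the authors' code. It shows that a collapsed encoder, which maps every observation to the same vector, reaches a prediction loss of exactly zero, which is the failure mode the regularizer has to penalize.

```python
import numpy as np

def prediction_loss(z_hat_next, z_next):
    """Batch mean of the squared L2 error between predicted and actual next embeddings.
    (The normalization is an assumption made for this sketch.)"""
    return float(np.mean(np.sum((z_hat_next - z_next) ** 2, axis=-1)))

def lewm_objective(pred_loss, sigreg_value, lam):
    """Two-term objective: prediction loss plus lambda times the regularizer value."""
    return pred_loss + lam * sigreg_value

rng = np.random.default_rng(0)
batch, dim = 8, 4

# Collapsed encoder: every observation gets the same embedding, so an
# identity predictor is "perfect" even though the embeddings carry no information.
z_t_collapsed = np.ones((batch, dim))
z_next_collapsed = np.ones((batch, dim))
collapsed_loss = prediction_loss(z_t_collapsed, z_next_collapsed)

# Diverse embeddings with an imperfect predictor: the loss is strictly positive.
z_next_diverse = rng.standard_normal((batch, dim))
z_hat_diverse = z_next_diverse + 0.1 * rng.standard_normal((batch, dim))
diverse_loss = prediction_loss(z_hat_diverse, z_next_diverse)

print(collapsed_loss, diverse_loss)
```

Because the collapsed solution minimizes the first term, the second term must assign it a high penalty; the weight `lam` then trades off the two.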

According to the paper, two implementation details are critical for stability and downstream performance: a dropout rate of 0.1 in the predictor, and a projection step (a 1-layer MLP with Batch Normalization) applied after the encoder.

Efficiency via SIGReg and Sparse Tokenization

Assessing normality in high-dimensional latent spaces is a major scaling challenge. LeWM addresses this using SIGReg, which leverages the Cramér-Wold theorem: a multivariate distribution matches a target (isotropic Gaussian) if all its one-dimensional projections match that target.
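For reference, the theorem can be stated as follows (standard textbook form; the notation here is ours, not the paper's). For random vectors $X, Y \in \mathbb{R}^d$:

$$X \overset{d}{=} Y \iff \langle u, X \rangle \overset{d}{=} \langle u, Y \rangle \quad \text{for all } u \in \mathbb{S}^{d-1}$$

If the target is $Y \sim \mathcal{N}(0, I_d)$, then $\langle u, Y \rangle \sim \mathcal{N}(0, 1)$ for every unit vector $u$, so the $d$-dimensional problem reduces to a family of one-dimensional normality tests. In practice only finitely many directions can be checked, so a projection-based regularizer approximates this condition rather than enforcing it exactly.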

SIGReg projects latent embeddings onto M random directions and applies the Epps-Pulley test statistic to each resulting one-dimensional projection. Because the regularization weight λ is the only effective hyperparameter to tune, researchers can optimize it using a bisection search with $O(\log n)$ complexity, a significant improvement over the polynomial-time search ($O(n^6)$) required by previous models like PLDM.
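The projection-and-test idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `sigreg_sketch`, the number of directions, the Gaussian weight function, and the integration grid are all choices made here. The statistic is Epps-Pulley-style, meaning a weighted squared distance between the empirical characteristic function of each projection and the characteristic function of N(0, 1), approximated by trapezoidal quadrature.

```python
import numpy as np

def sigreg_sketch(z, num_directions=64, num_knots=17, t_max=5.0, rng=None):
    """Average Epps-Pulley-style statistic over random 1-D projections of z (shape N x D).
    All hyperparameters here are assumptions chosen for illustration."""
    rng = np.random.default_rng(0) if rng is None else rng
    _, d = z.shape

    # Sample random unit directions and project the embeddings onto them.
    dirs = rng.standard_normal((d, num_directions))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    proj = z @ dirs                                   # (N, M)

    # Empirical characteristic function of each projection on a grid of t values.
    t = np.linspace(-t_max, t_max, num_knots)         # (T,)
    ecf = np.exp(1j * proj[:, :, None] * t).mean(axis=0)   # (M, T)

    # Characteristic function of N(0, 1) and a Gaussian weight (assumed).
    target = np.exp(-0.5 * t ** 2)
    weight = np.exp(-0.5 * t ** 2)
    integrand = np.abs(ecf - target) ** 2 * weight    # (M, T)

    # Trapezoidal rule on the uniform grid, written out by hand.
    dt = t[1] - t[0]
    integral = dt * (integrand[:, 1:-1].sum(axis=1)
                     + 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return float(integral.mean())

rng = np.random.default_rng(1)
gaussian_z = rng.standard_normal((1024, 16))   # embeddings that already look Gaussian
collapsed_z = np.zeros((1024, 16))             # fully collapsed embeddings

gaussian_score = sigreg_sketch(gaussian_z)
collapsed_score = sigreg_sketch(collapsed_z)
print(gaussian_score, collapsed_score)
```

In this sketch, Gaussian embeddings score close to zero while collapsed embeddings score much higher, so adding the term to the training loss penalizes collapse. A training implementation would compute the same quantity in an autodiff framework so that gradients flow back to the encoder.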

Speed Benchmarks

In the reported setup, LeWM demonstrates high computational efficiency:

  • Token Efficiency: LeWM encodes observations using ~200× fewer tokens than DINO-WM.
  • Planning Speed: LeWM achieves planning up to 48× faster than DINO-WM (0.98s vs 47s per planning cycle).

Latent Space Properties and Physical Understanding

LeWM’s latent space supports probing of physical quantities and detection of physically implausible events.

Violation-of-Expectation (VoE)

Using a VoE framework, the model was evaluated on its ability to detect ‘surprise’. It assigned higher surprise to physical perturbations such as teleportation; visual perturbations produced weaker effects, and cube color changes in OGBench-Cube were not significant.
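A toy version of the surprise signal is sketched below, assuming that surprise is measured as the squared error between the predicted and the observed next latent; the paper's exact definition may differ. The constant-velocity "predictor" and the 2-D latents are hypothetical stand-ins for the learned model.

```python
import numpy as np

def surprise_scores(z, predict):
    """Squared latent prediction error for each transition z[i] -> z[i + 1]."""
    z_hat_next = np.stack([predict(z_t) for z_t in z[:-1]])
    return np.sum((z_hat_next - z[1:]) ** 2, axis=1)

# Toy latent trajectory: an object moving at constant velocity in a 2-D latent space.
velocity = np.array([0.1, 0.0])
z = np.cumsum(np.tile(velocity, (20, 1)), axis=0)

# Physical violation: the object "teleports" between steps 11 and 12.
z[12:] += np.array([0.0, 2.0])

# Stand-in predictor that has learned the constant-velocity dynamics.
scores = surprise_scores(z, lambda z_t: z_t + velocity)
teleport_transition = int(np.argmax(scores))
print(teleport_transition, scores[teleport_transition])
```

The surprise is essentially zero along the plausible part of the trajectory and spikes at the single transition that breaks the learned dynamics.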

Emergent Path Straightening

LeWM exhibits Temporal Latent Path Straightening, where latent trajectories naturally become smoother and more linear over the course of training. Notably, LeWM achieves higher temporal straightness than PLDM despite having no explicit regularizer encouraging this behavior.
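One common way to quantify straightness is the mean cosine similarity between consecutive latent displacement vectors; whether the paper uses exactly this measure is an assumption here. The function name `temporal_straightness` and the toy trajectories are illustrative.

```python
import numpy as np

def temporal_straightness(z, eps=1e-12):
    """Mean cosine similarity between consecutive displacements z[t+1] - z[t].
    1.0 means a perfectly straight path; lower values mean more curvature."""
    v = np.diff(z, axis=0)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(v[:-1] * v[1:], axis=1)))

# Straight trajectory: every step moves in the same direction.
straight = np.outer(np.arange(10), np.array([1.0, 2.0]))

# Zigzag trajectory: steps alternate between two orthogonal directions.
steps = np.array([[1.0, 1.0], [1.0, -1.0]] * 5)
zigzag = np.cumsum(steps, axis=0)

straight_score = temporal_straightness(straight)
zigzag_score = temporal_straightness(zigzag)
print(straight_score, zigzag_score)
```

Tracking such a score on held-out trajectories during training would show whether latent paths straighten over time.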

| Feature | LeWorldModel (LeWM) | PLDM | DINO-WM | Dreamer / TD-MPC |
|---|---|---|---|---|
| Training Paradigm | Stable End-to-End | End-to-End | Frozen Foundation Encoder | Task-Specific |
| Input Type | Raw Pixels | Raw Pixels | Pixels (DINOv2 features) | Rewards / Privileged State |
| Loss Terms | 2 (Prediction + SIGReg) | 7 (VICReg-based) | 1 (MSE on latents) | Multiple (Task-specific) |
| Tunable Hyperparameters | 1 (Effective weight λ) | 6 | N/A (Fixed by pre-training) | Many (Task-dependent) |
| Planning Speed | Up to 48× faster than DINO-WM | Fast (Compact latents) | Slow (up to ~48× slower than LeWM) | Varies (often slow generation) |
| Anti-Collapse | Provable (Gaussian prior) | Under-specified / Unstable | Bounded by pre-training | Heuristic (e.g., reconstruction) |
| Requirement | Task-Agnostic / Reward-Free | Task-Agnostic / Reward-Free | Frozen Pre-trained Encoder | Task Signals / Rewards |

Key Takeaways

  • Stable End-to-End Learning: LeWM is the first Joint-Embedding Predictive Architecture (JEPA) that trains stably end-to-end from raw pixels without needing ‘hand-holding’ heuristics like stop-gradients, exponential moving averages (EMA), or frozen pre-trained encoders.
  • A Two-Term Objective: Training uses just two loss terms, a next-embedding prediction loss and the SIGReg regularizer, which reduces the number of tunable hyperparameters from six (in PLDM, the existing end-to-end alternative) to one.
  • Built for Real-Time Speed: By representing observations with approximately 200× fewer tokens than foundation-model-based counterparts, LeWM plans up to 48× faster, completing full trajectory optimizations in under one second.
  • Provable Anti-Collapse: To prevent the model from learning ‘garbage’ redundant representations, it uses the SIGReg regularizer; this utilizes the Cramér-Wold theorem to ensure high-dimensional latent embeddings stay diverse and Gaussian-distributed.
  • Intrinsic Physical Logic: The model doesn’t just predict data; it captures meaningful physical structure in its latent space, allowing it to accurately probe physical quantities and detect ‘impossible’ events like object teleportation through a violation-of-expectation framework.
