mGrowTech
Three ways AI is learning to understand the physical world

by Josh
March 21, 2026
in Technology And Software

Large language models are running into limits in domains that require an understanding of the physical world — from robotics to autonomous driving to manufacturing. That constraint is pushing investors toward world models, with AMI Labs raising a $1.03 billion seed round shortly after World Labs secured $1 billion.

Large language models (LLMs) excel at processing abstract knowledge through next-token prediction, but they fundamentally lack grounding in physical causality. They cannot reliably predict the physical consequences of real-world actions. 

AI researchers and thought leaders are increasingly vocal about these limitations as the industry tries to push AI out of web browsers and into physical spaces. In an interview with podcaster Dwarkesh Patel, Turing Award recipient Richard Sutton warned that LLMs just mimic what people say instead of modeling the world, which limits their capacity to learn from experience and adapt to changes in the world.

This is why models based on LLMs, including vision-language models (VLMs), can show brittle behavior and break with very small changes to their inputs. 

Google DeepMind CEO Demis Hassabis echoed this sentiment in another interview, pointing out that today's AI models suffer from “jagged intelligence.” They can solve complex math olympiad problems but fail at basic physics because they are missing critical capabilities regarding real-world dynamics.

To solve this problem, researchers are shifting focus to building world models that act as internal simulators, allowing AI systems to safely test hypotheses before taking physical action. However, “world models” is an umbrella term: it currently covers three distinct architectural approaches, each with different tradeoffs.

JEPA: built for real-time

The first main approach focuses on learning latent representations instead of trying to predict the dynamics of the world at the pixel level. Endorsed by AMI Labs, this method is heavily based on the Joint Embedding Predictive Architecture (JEPA). 

JEPA models try to mimic how humans understand the world. When we observe the world, we do not memorize every single pixel or irrelevant detail in a scene. For example, if you watch a car driving down a street, you track its trajectory and speed; you do not calculate the exact reflection of light on every single leaf of the trees in the background. 

JEPA models reproduce this human cognitive shortcut. Instead of forcing the neural network to predict exactly what the next frame of a video will look like, the model learns a smaller set of abstract, or “latent,” features. It discards the irrelevant details and focuses entirely on the core rules of how elements in the scene interact. This makes the model robust against background noise and small changes that break other models.
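The difference between predicting pixels and predicting latent features can be illustrated with a toy example. The sketch below is not JEPA itself: the one-dimensional "frames", the hand-written `encode` function (standing in for a learned encoder), and the constant-velocity `predict_latent` function are all assumptions made for illustration. It only shows why an error measured in latent space can ignore background noise that dominates an error measured in pixel space.

```python
import random

WIDTH = 32  # number of pixels in each toy 1-D frame (hypothetical)

def make_frame(car_pos, rng):
    """Build a 1-D frame: dim random background plus one bright 'car' pixel."""
    frame = [rng.uniform(0.0, 0.3) for _ in range(WIDTH)]
    frame[car_pos] = 1.0
    return frame

def encode(frame):
    """Hand-written stand-in for a learned encoder: keep only the car position."""
    return max(range(len(frame)), key=lambda i: frame[i])

def predict_latent(z, velocity=1):
    """Stand-in predictor: assume the car moves `velocity` pixels per frame."""
    return z + velocity

def pixel_mse(a, b):
    """Mean squared error between two frames, pixel by pixel."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
frame_t = make_frame(5, rng)
frame_t1 = make_frame(6, rng)  # car moved one pixel; background noise is new

# Pixel-space prediction: shift the whole frame right by one pixel.
# The car lands in the right place, but the background cannot be predicted.
predicted_frame = [frame_t[-1]] + frame_t[:-1]
pixel_error = pixel_mse(predicted_frame, frame_t1)

# Latent-space prediction: compare predicted and actual car positions only.
latent_error = (predict_latent(encode(frame_t)) - encode(frame_t1)) ** 2

print("pixel-space error:", pixel_error)
print("latent-space error:", latent_error)
```

In this toy setup the motion is predicted correctly in both cases, yet only the latent-space error reflects that, because the unpredictable background never enters the comparison.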

This architecture is highly compute- and memory-efficient. By ignoring irrelevant details, it requires far fewer training examples and runs with significantly lower latency. These characteristics make it suitable for applications where efficiency and real-time inference are non-negotiable, such as robotics, self-driving cars, and high-stakes enterprise workflows.

For example, AMI is partnering with healthcare company Nabla to use this architecture to simulate operational complexity and reduce cognitive load in fast-paced healthcare settings. 

In an interview with Newsweek, Yann LeCun, a pioneer of the JEPA architecture and co-founder of AMI, explained that world models based on JEPA are designed to be “controllable in the sense that you can give them goals, and by construction, the only thing they can do is accomplish those goals.”

Gaussian splats: built for space

A second approach leans on generative models to build complete spatial environments from scratch. Adopted by companies like World Labs, this method takes an initial prompt (an image or a text description) and uses a generative model to create a 3D Gaussian splat. A Gaussian splat is a technique for representing 3D scenes using millions of tiny mathematical particles that define geometry and lighting. Unlike flat video generation, these 3D representations can be imported directly into standard physics and 3D engines, such as Unreal Engine, where users and other AI agents can freely navigate and interact with them from any angle.
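As a rough illustration of what "a scene made of Gaussian particles" means as a data structure, here is a simplified sketch. It assumes isotropic Gaussians with a single `scale` value and a plain RGB color. Production 3D Gaussian splatting typically uses anisotropic covariances, view-dependent color, and rasterized alpha blending, none of which is modeled here. The `Splat` class and the `color_at` function are hypothetical names and do not reflect World Labs' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Splat:
    center: tuple   # (x, y, z) position of the particle
    scale: float    # isotropic standard deviation (simplifying assumption)
    color: tuple    # (r, g, b), each in [0, 1]
    opacity: float  # peak contribution of the particle, in [0, 1]

def weight(splat, point):
    """Opacity-weighted Gaussian falloff of one splat at a 3-D point."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, splat.center))
    return splat.opacity * math.exp(-0.5 * d2 / splat.scale ** 2)

def color_at(scene, point):
    """Blend splat colors at a point, weighted by each splat's influence."""
    weights = [weight(s, point) for s in scene]
    total = sum(weights)
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # no splat reaches this point
    return tuple(
        sum(w * s.color[i] for w, s in zip(weights, scene)) / total
        for i in range(3)
    )

# A two-particle scene: a red splat at the origin and a blue one 10 units away.
scene = [
    Splat(center=(0.0, 0.0, 0.0), scale=1.0, color=(1.0, 0.0, 0.0), opacity=1.0),
    Splat(center=(10.0, 0.0, 0.0), scale=1.0, color=(0.0, 0.0, 1.0), opacity=1.0),
]

near_red = color_at(scene, (0.0, 0.0, 0.0))
midpoint = color_at(scene, (5.0, 0.0, 0.0))
print(near_red, midpoint)
```

Because the scene is an explicit list of particles rather than a sequence of rendered frames, it can be queried from any position, which is the property that lets such representations be handed to an external engine.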

The primary benefit here is a drastic reduction in the time and one-time generation cost required to create complex interactive 3D environments. It addresses the exact problem outlined by World Labs founder Fei-Fei Li, who noted that LLMs are ultimately like “wordsmiths in the dark,” possessing flowery language but lacking spatial intelligence and physical experience. World Labs’ Marble model gives AI that missing spatial awareness. 

While this approach is not designed for split-second, real-time execution, it has massive potential for spatial computing, interactive entertainment, industrial design, and building static training environments for robotics. The enterprise value is evident in Autodesk’s heavy backing of World Labs to integrate these models into their industrial design applications.

End-to-end generation: built for scale

The third approach uses an end-to-end generative model to process prompts and user actions, continuously generating the scene, physical dynamics, and reactions on the fly. Rather than exporting a static 3D file to an external physics engine, the model itself acts as the engine. It ingests an initial prompt alongside a continuous stream of user actions, and it generates the subsequent frames of the environment in real-time, calculating physics, lighting, and object reactions natively. 
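The interaction pattern described above, an initial prompt followed by a stream of actions that each produce the next frame, can be sketched as a simple loop. Everything here is a hypothetical stand-in: `ToyWorldModel`, its `step` method, and the dictionary "frames" are invented for illustration, and the `_generate` method replaces what would be a large learned generative network with a hard-coded movement rule. It is not the interface of Genie 3 or Cosmos.

```python
from collections import deque

# Hypothetical action vocabulary mapped to grid displacements.
ACTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

class ToyWorldModel:
    """Toy action-conditioned frame generator with a bounded frame history."""

    def __init__(self, prompt, context_len=4):
        # The prompt acts as the first frame; older frames fall out of context.
        self.history = deque([prompt], maxlen=context_len)

    def _generate(self, history, action):
        # Stand-in for a learned model: derive the next frame from the most
        # recent frame and the user's action.
        x, y = history[-1]["agent"]
        dx, dy = ACTIONS[action]
        return {"agent": (x + dx, y + dy)}

    def step(self, action):
        """Generate the next frame and feed it back into the context."""
        frame = self._generate(self.history, action)
        self.history.append(frame)
        return frame

model = ToyWorldModel(prompt={"agent": (0, 0)})
frames = [model.step(a) for a in ["right", "right", "up"]]
print(frames[-1])
```

The key design point is that each generated frame is appended to the model's own context, so there is no separate physics engine or exported scene file: the model's output is the only state.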

DeepMind’s Genie 3 and Nvidia’s Cosmos fall into this category. These models provide a very simple interface for generating infinite interactive experiences and massive volumes of synthetic data. DeepMind demonstrated this with Genie 3, showcasing how the model maintains strict object permanence and consistent physics at 24 frames per second without relying on a separate memory module.

This approach translates directly into heavy-duty synthetic data factories. Nvidia Cosmos uses this architecture to scale synthetic data and physical AI reasoning, allowing autonomous vehicle and robotics developers to synthesize rare, dangerous edge-case conditions without the cost or risk of physical testing. Waymo (a fellow Alphabet subsidiary) built its world model on top of Genie 3, adapting it for training its self-driving cars.

The downside to this end-to-end generative method is the high compute cost of continuously rendering physics and pixels simultaneously. Still, the investment is necessary to achieve the vision laid out by Hassabis, who argues that a deep, internal understanding of physical causality is required because current AI is missing critical capabilities to operate safely in the real world.

What comes next: hybrid architectures

LLMs will continue to serve as the reasoning and communication interface, but world models are positioning themselves as foundational infrastructure for physical and spatial data pipelines. As the underlying models mature, we are seeing the emergence of hybrid architectures that draw on the strengths of each approach. 

For example, cybersecurity startup DeepTempo recently developed LogLM, a model that integrates elements from LLMs and JEPA to detect anomalies and cyber threats from security and network logs. 


