Sunday, July 13, 2025
mGrowTech

This AI Paper Introduces PEVA: A Whole-Body Conditioned Diffusion Model for Predicting Egocentric Video from Human Motion

by Josh
July 13, 2025
in AI, Analytics and Automation


Understanding the Link Between Body Movement and Visual Perception

Studying human visual perception through egocentric (first-person) views is crucial for developing intelligent systems that can understand and interact with their environment. The central question is how movements of the human body, from locomotion to arm manipulation, shape what is seen from a first-person perspective. Understanding this relationship is essential for enabling machines and robots to plan and act with a human-like sense of visual anticipation, particularly in real-world scenarios where what is visible changes continuously with physical motion.

Challenges in Modeling Physically Grounded Perception

A major hurdle in this domain is teaching systems how body actions affect perception. Actions such as turning or bending change what is visible in subtle and often delayed ways. Capturing this requires more than predicting what comes next in a video: it involves linking physical movements to the resulting changes in visual input. Without the ability to interpret and simulate these changes, embodied agents struggle to plan or interact effectively in dynamic environments.

Limitations of Prior Models and the Need for Physical Grounding

Until now, tools designed to predict video from human actions have been limited in scope. Models have often relied on low-dimensional inputs, such as velocity or head direction, and so miss the fine-grained control and coordination of whole-body motion that is needed to simulate human actions accurately. Even in video generation models, body motion has usually been treated as the output rather than the driver of prediction. This lack of physical grounding has restricted the usefulness of these models for real-world planning.

Introducing PEVA: Predicting Egocentric Video from Action

Researchers from UC Berkeley, Meta’s FAIR, and New York University introduced a new framework called PEVA to overcome these limitations. The model predicts future egocentric video frames based on structured full-body motion data, derived from 3D body pose trajectories. PEVA aims to demonstrate how entire-body movements influence what a person sees, thereby grounding the connection between action and perception. The researchers employed a conditional diffusion transformer to learn this mapping and trained it using Nymeria, a large dataset comprising real-world egocentric videos synchronized with full-body motion capture.

Structured Action Representation and Model Architecture

The foundation of PEVA lies in its highly structured representation of actions. Each action input is a 48-dimensional vector that includes the root translation and joint-level rotations across 15 upper-body joints in 3D space. This vector is normalized and transformed into a local coordinate frame centered at the pelvis to remove any positional bias. By using this comprehensive representation of body dynamics, the model captures the continuous and nuanced nature of real motion. PEVA is designed as an autoregressive diffusion model: a video encoder converts frames into latent state representations, and the model predicts subsequent frames from prior states and body actions. To support long-term video generation, the system introduces random time-skips during training, allowing it to learn from both immediate and delayed visual consequences of motion.
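
To make the 48-dimensional action vector concrete, here is a minimal Python sketch of how such a vector could be assembled. It is not the authors' code. It assumes three rotation values per joint (which is consistent with 3 + 15 × 3 = 48), a pelvis-local frame that differs from the world frame only by a heading rotation about the vertical axis, and simple fixed scaling constants for normalization. The names `build_action`, `to_pelvis_frame`, `translation_scale`, and `rotation_scale` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

NUM_JOINTS = 15                  # upper-body joints, as stated in the article
ACTION_DIM = 3 + 3 * NUM_JOINTS  # 3 root-translation values + 3 rotation values per joint = 48


def to_pelvis_frame(delta_world: Vec3, pelvis_yaw: float) -> Vec3:
    """Rotate a world-frame displacement into a pelvis-aligned frame.

    ASSUMPTION: the local frame differs from the world frame only by a
    yaw (heading) rotation about the vertical z axis.
    """
    c, s = math.cos(-pelvis_yaw), math.sin(-pelvis_yaw)
    x, y, z = delta_world
    return (c * x - s * y, s * x + c * y, z)


def build_action(
    pelvis_prev: Vec3,
    pelvis_next: Vec3,
    pelvis_yaw: float,
    joint_rotation_deltas: Sequence[Vec3],
    translation_scale: float = 1.0,   # ASSUMPTION: placeholder normalization constant
    rotation_scale: float = math.pi,  # ASSUMPTION: rotations given in radians
) -> List[float]:
    """Concatenate local root translation and per-joint rotation changes."""
    if len(joint_rotation_deltas) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joint_rotation_deltas)}")

    # Root translation between two time steps, expressed relative to the pelvis
    # so that the absolute position in the room does not matter.
    delta_world = tuple(b - a for a, b in zip(pelvis_prev, pelvis_next))
    local = to_pelvis_frame(delta_world, pelvis_yaw)

    action = [v / translation_scale for v in local]
    for rotation in joint_rotation_deltas:
        action.extend(r / rotation_scale for r in rotation)
    return action


# Example: the wearer faces world +y (yaw = pi/2) and steps 1 m in that direction.
action = build_action(
    pelvis_prev=(0.0, 0.0, 1.0),
    pelvis_next=(0.0, 1.0, 1.0),
    pelvis_yaw=math.pi / 2,
    joint_rotation_deltas=[(0.0, 0.0, 0.0)] * NUM_JOINTS,
)
print(len(action))  # 48
```

In this toy example the step shows up as motion along the local forward axis (the first component), whichever way the person happens to be facing in the world. That is the kind of positional bias the pelvis-centered frame is meant to remove.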

Performance Evaluation and Results

PEVA was evaluated on several metrics that test both short-term and long-term video prediction. The model generated visually consistent and semantically accurate video frames over extended periods of time. For short-term predictions, evaluated at 2-second intervals, it achieved lower LPIPS scores and higher DreamSim consistency compared to baselines, indicating superior perceptual quality. The researchers also decomposed human movement into atomic actions, such as arm movements and body rotations, to assess fine-grained control. Furthermore, the model was tested on extended rollouts of up to 16 seconds, successfully simulating delayed outcomes while maintaining sequence coherence. These experiments indicated that incorporating full-body control led to substantial improvements in video realism and controllability.
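
The extended rollouts described above follow the usual autoregressive pattern: each predicted state is fed back in as context for the next prediction. The sketch below illustrates only that control flow, in Python. It is not the PEVA implementation. The learned diffusion model is replaced by a toy stand-in (`toy_predictor`), and both the context size (`max_context`) and the idea that each step advances 2 seconds are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Latent = List[float]
Action = List[float]


def rollout(
    predict_next: Callable[[List[Latent], Action], Latent],
    context: Sequence[Latent],
    actions: Sequence[Action],
    max_context: int = 3,  # ASSUMPTION: illustrative context length
) -> List[Latent]:
    """Autoregressively predict one latent state per action."""
    if not context:
        raise ValueError("at least one context state is required")
    window: Deque[Latent] = deque(context, maxlen=max_context)
    predicted: List[Latent] = []
    for action in actions:
        nxt = predict_next(list(window), action)
        predicted.append(nxt)
        window.append(nxt)  # feed the prediction back in; the oldest state drops out
    return predicted


def toy_predictor(history: List[Latent], action: Action) -> Latent:
    """Hypothetical stand-in for the learned model: shift the last state by the action sum."""
    shift = sum(action)
    return [v + shift for v in history[-1]]


STEP_SECONDS = 2      # ASSUMPTION: one autoregressive step per 2-second interval
HORIZON_SECONDS = 16  # longest rollout mentioned in the article
num_steps = HORIZON_SECONDS // STEP_SECONDS

actions = [[1.0] + [0.0] * 47 for _ in range(num_steps)]  # 48-dimensional actions
frames = rollout(toy_predictor, context=[[0.0, 0.0]], actions=actions)
print(len(frames))  # 8
```

Because every step consumes earlier predictions, errors can compound over the horizon, which is why long rollouts are a harder test of coherence than single-step prediction.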

Conclusion: Toward Physically Grounded Embodied Intelligence

This research highlights a significant advancement in predicting future egocentric video by grounding the model in physical human movement. The problem of linking whole-body action to visual outcomes is addressed with a technically robust method that uses structured pose representations and diffusion-based learning. The solution introduced by the team offers a promising direction for embodied AI systems that require accurate, physically grounded foresight.


All credit for this research goes to the researchers of this project.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.


