• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Thursday, July 3, 2025
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Hybrid AI model crafts smooth, high-quality videos in seconds | MIT News

Josh by Josh
June 24, 2025
in Al, Analytics and Automation
0
Hybrid AI model crafts smooth, high-quality videos in seconds | MIT News
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter


What would a behind-the-scenes look at a video generated by an artificial intelligence model be like? You might think the process is similar to stop-motion animation, where many images are created and stitched together, but that’s not quite the case for “diffusion models” like OpenAl’s SORA and Google’s VEO 2.

Instead of producing a video frame-by-frame (or “autoregressively”), these systems process the entire sequence at once. The resulting clip is often photorealistic, but the process is slow and doesn’t allow for on-the-fly changes. 

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe Research have now developed a hybrid approach, called “CausVid,” to create videos in seconds. Much like a quick-witted student learning from a well-versed teacher, a full-sequence diffusion model trains an autoregressive system to swiftly predict the next frame while ensuring high quality and consistency. CausVid’s student model can then generate clips from a simple text prompt, turning a photo into a moving scene, extending a video, or altering its creations with new inputs mid-generation.

This dynamic tool enables fast, interactive content creation, cutting a 50-step process into just a few actions. It can craft many imaginative and artistic scenes, such as a paper airplane morphing into a swan, woolly mammoths venturing through snow, or a child jumping in a puddle. Users can also make an initial prompt, like “generate a man crossing the street,” and then make follow-up inputs to add new elements to the scene, like “he writes in his notebook when he gets to the opposite sidewalk.”

Brief computer-generated animation of a character in an old deep-sea diving suit walking on a leaf

A video produced by CausVid illustrates its ability to create smooth, high-quality content.

AI-generated animation courtesy of the researchers.

The CSAIL researchers say that the model could be used for different video editing tasks, like helping viewers understand a livestream in a different language by generating a video that syncs with an audio translation. It could also help render new content in a video game or quickly produce training simulations to teach robots new tasks.

Tianwei Yin SM ’25, PhD ’25, a recently graduated student in electrical engineering and computer science and CSAIL affiliate, attributes the model’s strength to its mixed approach.

“CausVid combines a pre-trained diffusion-based model with autoregressive architecture that’s typically found in text generation models,” says Yin, co-lead author of a new paper about the tool. “This AI-powered teacher model can envision future steps to train a frame-by-frame system to avoid making rendering errors.”

Yin’s co-lead author, Qiang Zhang, is a research scientist at xAI and a former CSAIL visiting researcher. They worked on the project with Adobe Research scientists Richard Zhang, Eli Shechtman, and Xun Huang, and two CSAIL principal investigators: MIT professors Bill Freeman and Frédo Durand.

Caus(Vid) and effect

Many autoregressive models can create a video that’s initially smooth, but the quality tends to drop off later in the sequence. A clip of a person running might seem lifelike at first, but their legs begin to flail in unnatural directions, indicating frame-to-frame inconsistencies (also called “error accumulation”).

Error-prone video generation was common in prior causal approaches, which learned to predict frames one by one on their own. CausVid instead uses a high-powered diffusion model to teach a simpler system its general video expertise, enabling it to create smooth visuals, but much faster.

Video thumbnail

Play video

CausVid enables fast, interactive video creation, cutting a 50-step process into just a few actions.

Video courtesy of the researchers.

CausVid displayed its video-making aptitude when researchers tested its ability to make high-resolution, 10-second-long videos. It outperformed baselines like “OpenSORA” and “MovieGen,” working up to 100 times faster than its competition while producing the most stable, high-quality clips.

Then, Yin and his colleagues tested CausVid’s ability to put out stable 30-second videos, where it also topped comparable models on quality and consistency. These results indicate that CausVid may eventually produce stable, hours-long videos, or even an indefinite duration.

A subsequent study revealed that users preferred the videos generated by CausVid’s student model over its diffusion-based teacher.

“The speed of the autoregressive model really makes a difference,” says Yin. “Its videos look just as good as the teacher’s ones, but with less time to produce, the trade-off is that its visuals are less diverse.”

CausVid also excelled when tested on over 900 prompts using a text-to-video dataset, receiving the top overall score of 84.27. It boasted the best metrics in categories like imaging quality and realistic human actions, eclipsing state-of-the-art video generation models like “Vchitect” and “Gen-3.”

While an efficient step forward in AI video generation, CausVid may soon be able to design visuals even faster — perhaps instantly — with a smaller causal architecture. Yin says that if the model is trained on domain-specific datasets, it will likely create higher-quality clips for robotics and gaming.

Experts say that this hybrid system is a promising upgrade from diffusion models, which are currently bogged down by processing speeds. “These models are way slower than LLMs [large language models] or generative image models,” says Carnegie Mellon University Assistant Professor Jun-Yan Zhu, who was not involved in the paper. “This new work changes that, making video generation much more efficient. That means better streaming speed, more interactive applications, and lower carbon footprints.”

The team’s work was supported, in part, by the Amazon Science Hub, the Gwangju Institute of Science and Technology, Adobe, Google, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator. CausVid will be presented at the Conference on Computer Vision and Pattern Recognition in June.



Source_link

READ ALSO

DeepSeek R1T2 Chimera: 200% Faster Than R1-0528 With Improved Reasoning and Compact Output

Confronting the AI/energy conundrum

Related Posts

DeepSeek R1T2 Chimera: 200% Faster Than R1-0528 With Improved Reasoning and Compact Output
Al, Analytics and Automation

DeepSeek R1T2 Chimera: 200% Faster Than R1-0528 With Improved Reasoning and Compact Output

July 3, 2025
Confronting the AI/energy conundrum
Al, Analytics and Automation

Confronting the AI/energy conundrum

July 3, 2025
Baidu Open Sources ERNIE 4.5: LLM Series Scaling from 0.3B to 424B Parameters
Al, Analytics and Automation

Baidu Open Sources ERNIE 4.5: LLM Series Scaling from 0.3B to 424B Parameters

July 2, 2025
Novel method detects microbial contamination in cell cultures | MIT News
Al, Analytics and Automation

Novel method detects microbial contamination in cell cultures | MIT News

July 2, 2025
Baidu Researchers Propose AI Search Paradigm: A Multi-Agent Framework for Smarter Information Retrieval
Al, Analytics and Automation

Baidu Researchers Propose AI Search Paradigm: A Multi-Agent Framework for Smarter Information Retrieval

July 2, 2025
Merging design and computer science in creative ways | MIT News
Al, Analytics and Automation

Merging design and computer science in creative ways | MIT News

July 1, 2025
Next Post
‘Sandbox first’: Andrew Ng’s blueprint for accelerating enterprise AI innovation

'Sandbox first’: Andrew Ng’s blueprint for accelerating enterprise AI innovation

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

POPULAR NEWS

Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
7 Best EOR Platforms for Software Companies in 2025

7 Best EOR Platforms for Software Companies in 2025

June 21, 2025
Eating Bugs – MetaDevo

Eating Bugs – MetaDevo

May 29, 2025
Top B2B & Marketing Podcasts to Lead You to Succeed in 2025 – TopRank® Marketing

Top B2B & Marketing Podcasts to Lead You to Succeed in 2025 – TopRank® Marketing

May 30, 2025
Entries For The Elektra Awards 2025 Are Now Open!

Entries For The Elektra Awards 2025 Are Now Open!

May 30, 2025

EDITOR'S PICK

Even before the Xbox layoffs, there was ‘tension’ at Halo Studios

Even before the Xbox layoffs, there was ‘tension’ at Halo Studios

July 3, 2025

Boost Your Plumbing Business with Fridge Magnets: Must-Have Marketing

May 28, 2025
Top Image Annotation Companies 2025

Top Image Annotation Companies 2025

May 30, 2025
Brand Is Getting A “Rebrand” — Thank You, AI!

Brand Is Getting A “Rebrand” — Thank You, AI!

June 26, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Squid Game X Script (No Key, Auto Win, Glass Marker)
  • DeepSeek R1T2 Chimera: 200% Faster Than R1-0528 With Improved Reasoning and Compact Output
  • Google’s customizable Gemini chatbots are now in Docs, Sheets, and Gmail
  • 24 Effective Ways to Drive Website Traffic in 2025 (Complete Guide)
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?