Tuesday, March 10, 2026
mGrowTech

Advancing the frontier of video understanding with Gemini 2.5

by Josh
June 10, 2025
in Google Marketing


We recently launched two new models in our Gemini family: Gemini 2.5 Pro Preview (05/06) and Gemini 2.5 Flash (04/17). These models mark a major leap in video understanding. Gemini 2.5 Pro achieves state-of-the-art performance on key video understanding benchmarks, surpassing recent models like GPT 4.1 under comparable testing conditions (same prompt and video frames).

Furthermore, it rivals specialized fine-tuned models on several challenging benchmarks (e.g. YouCook2 dense captioning and QVHighlights moment retrieval). For cost-sensitive applications, Gemini 2.5 Flash provides a highly competitive alternative.

Evaluation of Gemini 2.5 vs. prior models on video understanding benchmarks.
Performance is measured by string-match accuracy for multiple-choice VideoQA, LLM-based accuracy for EgoTempo, R1@0.5 for QVHighlights and CIDEr for YouCook2.
*Videos were processed at 1fps and linearly subsampled to a maximum of 256 frames, except for 1H-VideoQA (7200 frames).
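The frame-sampling footnote above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the benchmarks: the function name `sample_frame_times` is hypothetical, and "linearly subsampled" is assumed here to mean picking evenly spaced frames from the first to the last 1fps candidate.

```python
def sample_frame_times(duration_s: float, fps: float = 1.0,
                       max_frames: int = 256) -> list[float]:
    """Return timestamps (in seconds) of the frames to keep.

    Assumption: frames are first taken at `fps`, and if that yields more
    than `max_frames`, evenly spaced frames are kept (first and last included).
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    n_candidates = int(duration_s * fps)
    times = [i / fps for i in range(n_candidates)]
    if n_candidates <= max_frames:
        return times  # short video: keep every sampled frame
    if max_frames == 1:
        return [times[0]]
    # Evenly spaced indices across the full candidate range.
    step = (n_candidates - 1) / (max_frames - 1)
    return [times[round(i * step)] for i in range(max_frames)]


short_clip = sample_frame_times(120)   # 2-minute video: all 120 frames kept
long_clip = sample_frame_times(600)    # 10-minute video: capped at 256 frames
print(len(short_clip), len(long_clip))
```

Under these assumptions, a video of up to 256 seconds is seen at a full 1fps, while longer videos are sampled more sparsely as their duration grows.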

Combining video and code with Gemini 2.5

Gemini 2.5 marks the first time a natively multimodal model can seamlessly use audio-visual information together with code and other data formats. To illustrate the power of Gemini 2.5’s video understanding capabilities, we showcase below some of the use cases we’ve been most excited about.

Transforming videos into interactive applications

Gemini 2.5 Pro unlocks new possibilities for transforming videos into interactive applications. Video To Learning App, a Google AI Studio starter app, uses Gemini 2.5 to make learning from video content more effective and engaging.

First, the model sees a YouTube URL along with a text prompt that explains how it should analyze the video. Gemini 2.5 Pro analyzes the video and crafts a detailed spec for a learning application which reinforces key ideas in the video.

The generated spec is then sent directly back to Gemini 2.5 Pro to generate the code for the application, as illustrated in the vision correction simulator application below. Gemini 2.5 Flash can achieve similar results, offering a glimpse into novel video use cases in domains such as education and interactive content creation.
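The two-step flow described above (video to spec, then spec to code) can be sketched as a small pipeline. This is an illustrative outline only, not the Video To Learning App source: the `generate` callable, both prompt strings, and the function name `video_to_app` are hypothetical stand-ins for whatever model client and prompts an application actually uses.

```python
from typing import Callable, Optional

# Hypothetical model interface: (prompt, optional video URL) -> model text.
GenerateFn = Callable[[str, Optional[str]], str]

# Placeholder prompts; the real app's prompts are not given in this post.
SPEC_PROMPT = (
    "Analyze the video and write a detailed spec for a learning app "
    "that reinforces its key ideas."
)
CODE_PROMPT = "Write the code for an app that implements this spec:\n\n{spec}"


def video_to_app(video_url: str, generate: GenerateFn) -> tuple[str, str]:
    """Run the two-stage flow and return (spec, code)."""
    # Stage 1: the model sees the video URL plus analysis instructions.
    spec = generate(SPEC_PROMPT, video_url)
    # Stage 2: the spec alone is sent back to the model to produce code.
    code = generate(CODE_PROMPT.format(spec=spec), None)
    return spec, code


# Usage with a stub in place of a real model call.
def fake_generate(prompt: str, video_url: Optional[str]) -> str:
    return "SPEC" if video_url else f"CODE for: {prompt[-4:]}"

spec, code = video_to_app("https://example.com/video", fake_generate)
print(spec, "|", code)
```

Passing the model call in as a parameter keeps the pipeline independent of any particular SDK, so the same outline works with Pro or Flash.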

Creating animations from video with p5.js

Gemini 2.5 Pro unlocks exciting creative possibilities, such as the ability to generate dynamic animations from videos with a single prompt. This capability opens up new avenues for use cases such as automated content generation and creating accessible video summaries.

For example, when given our video on Project Astra along with the prompt ‘Create an animation in p5.js covering the different landmarks seen in this video.’, Gemini 2.5 Pro analyzes the footage and produces a corresponding p5.js animation. The animation visualizes the landmarks identified by Gemini 2.5 Pro in the same temporal order as in the video.

Retrieving and describing moments from video

Gemini 2.5 Pro excels at identifying specific moments within videos using audio-visual cues with significantly higher accuracy than previous video processing systems. For example, in this 10-minute video of the Google Cloud Next ’25 opening keynote, it accurately identifies 16 distinct segments related to product presentations, using both audio and visual cues from the video to do so.
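Moment retrieval of this kind is what the R1@0.5 metric reported for QVHighlights measures: the fraction of queries whose top-ranked predicted segment overlaps the ground-truth segment with a temporal intersection-over-union (IoU) of at least 0.5. The sketch below is a simplified illustration, assuming one ground-truth segment per query; the function names and the sample segments are made up for the example and are not taken from the benchmark.

```python
Segment = tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: list[Segment], gts: list[Segment],
                iou_threshold: float = 0.5) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(gts) or not gts:
        raise ValueError("need one prediction per ground-truth segment")
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return hits / len(gts)


# Illustrative data only: the first prediction overlaps well, the second barely.
preds = [(10.0, 20.0), (0.0, 10.0)]
truth = [(12.0, 22.0), (8.0, 30.0)]
print(recall_at_1(preds, truth))  # 0.5
```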

Temporal reasoning

With its advanced moment retrieval capabilities, Gemini 2.5 Pro is also able to solve nuanced temporal reasoning problems such as counting. In this example, Gemini successfully counts 17 distinct occurrences where the main character uses their phone in the Project Astra video.

Building with Gemini 2.5 video understanding

Video understanding with Gemini 2.5 Flash and Pro is available in Google AI Studio, the Gemini API, and Vertex AI. Support for YouTube videos is available via the Gemini API and Google AI Studio, enabling anyone to build applications with access to billions of videos.

The Gemini API now offers a ‘low’ media resolution parameter, enabling Gemini 2.5 Pro to process ~6 hours of video within a 2-million-token context. This is a more cost-effective setting that retains competitive video understanding performance (e.g., 84.7% vs. 85.2% accuracy on VideoMME) for many long-video use cases.
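As a rough back-of-the-envelope check on those figures, the average token budget per second of video can be derived from the two numbers quoted above. This is plain arithmetic rather than an official token-accounting formula: the helper names are hypothetical, and the calculation assumes the whole context is spent on video, ignoring prompt and output tokens.

```python
def implied_tokens_per_second(context_tokens: int, video_hours: float) -> float:
    """Average tokens available per second of video if it fills the context."""
    return context_tokens / (video_hours * 3600)


def max_video_hours(context_tokens: int, tokens_per_second: float) -> float:
    """Longest video (in hours) that fits at a given per-second token cost."""
    return context_tokens / tokens_per_second / 3600


budget = implied_tokens_per_second(2_000_000, 6)
print(f"{budget:.1f} tokens per second of video")  # 92.6 tokens per second of video
```

In other words, fitting about 6 hours into 2 million tokens implies that each second of video costs on the order of 90 tokens at this setting.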

We are inspired by the innovative video applications already emerging from the community and can’t wait to see what you build!


Acknowledgements

A big shoutout to Aaron Wade for creating Video To Learning App and for the vision correction simulator example showcased in this blog post.

We thank Sergi Caelles, Boyu Wang and Saarthak Khanna for their contributions on the evaluations presented above, Angeliki Lazaridou for inspiring some demo examples, and the entire Gemini video understanding team for the work culminating in this release. Finally, we would like to thank the video understanding leads Mario Lučić, Shuo-yiin Chang, and Paul Natsev, and overall multimodal understanding lead Jean-Baptiste Alayrac.


