Advancing the frontier of video understanding with Gemini 2.5

by Josh
June 10, 2025
in Google Marketing


We recently launched two new models in our Gemini family: Gemini 2.5 Pro Preview (05/06) and Gemini 2.5 Flash (04/17). These models mark a major leap in video understanding. Gemini 2.5 Pro achieves state-of-the-art performance on key video understanding benchmarks, surpassing recent models like GPT 4.1 under comparable testing conditions (same prompt and video frames).

Furthermore, it rivals specialized fine-tuned models on several challenging benchmarks (e.g. YouCook2 dense captioning and QVHighlights moment retrieval). For cost-sensitive applications, Gemini 2.5 Flash provides a highly competitive alternative.

Evaluation of Gemini 2.5 vs. prior models on video understanding benchmarks.
Performance is measured by string-match accuracy for multiple-choice VideoQA, LLM-based accuracy for EgoTempo, R1@0.5 for QVHighlights and CIDEr for YouCook2.
*Videos were processed at 1fps and linearly subsampled to a maximum of 256 frames, except for 1H-VideoQA (7200 frames).
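
The frame budget in the footnote above can be pictured with a short sketch. This is only an illustration: the function name is hypothetical, and "linearly subsampled" is interpreted here as picking evenly spaced frames, which is an assumption rather than the evaluation code actually used.

```python
def sample_frame_indices(duration_s: float, fps: float = 1.0,
                         max_frames: int = 256) -> list[int]:
    """Return the indices of the frames kept from a video sampled at `fps`.

    Hypothetical helper: frames are taken at `fps`, then reduced to at most
    `max_frames` evenly spaced frames (assumed meaning of "linearly subsampled").
    """
    n = int(duration_s * fps)  # frames available after sampling at `fps`
    if n <= max_frames:
        return list(range(n))  # short video: keep every sampled frame
    # Evenly spaced positions across the sampled frames, integer arithmetic only.
    return [i * n // max_frames for i in range(max_frames)]


if __name__ == "__main__":
    print(len(sample_frame_indices(120)))   # 2-minute video: under the cap
    print(len(sample_frame_indices(600)))   # 10-minute video: capped
    print(len(sample_frame_indices(3600, max_frames=7200)))  # 1H-VideoQA-style cap
```

Under this reading, any video longer than 256 seconds is represented by the same number of frames, so the time gap between kept frames grows with video length.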

Combining video and code with Gemini 2.5

Gemini 2.5 marks the first time a natively multimodal model can use audio-visual information seamlessly with code and other data formats. To illustrate the power of Gemini 2.5’s video understanding capabilities, we showcase below some of the use cases we’ve been most excited about.

Transforming videos into interactive applications

Gemini 2.5 Pro unlocks new possibilities for transforming videos into interactive applications. Video To Learning App, a Google AI Studio starter app, uses Gemini 2.5 to make learning from video content more effective and engaging.

First, the model sees a YouTube URL along with a text prompt that explains how it should analyze the video. Gemini 2.5 Pro analyzes the video and crafts a detailed spec for a learning application which reinforces key ideas in the video.

The generated spec is then sent directly back to Gemini 2.5 Pro to generate the code for the application, as illustrated in the vision correction simulator application below. Gemini 2.5 Flash can achieve similar results, offering a glimpse into novel video use cases in domains such as education and interactive content creation.
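
The two-step flow described above (video to spec, then spec to code) can be sketched as a small pipeline. This is a minimal sketch, not the Video To Learning App implementation: `generate` is a hypothetical callable standing in for a model call, and both prompt strings are illustrative assumptions.

```python
from typing import Callable, Optional

# Hypothetical model interface: (prompt, optional video URL) -> text output.
GenerateFn = Callable[[str, Optional[str]], str]

# Illustrative prompts, not the ones used by the starter app.
SPEC_PROMPT = ("Analyze this video and write a detailed spec for a learning "
               "app that reinforces its key ideas.")
CODE_PROMPT = "Write the code for an application that implements this spec:\n\n"


def video_to_app(generate: GenerateFn, video_url: str) -> tuple[str, str]:
    """Run the two-step flow and return (spec, code)."""
    # Step 1: the model sees the video URL together with the analysis prompt.
    spec = generate(SPEC_PROMPT, video_url)
    # Step 2: the spec alone is sent back to the model to produce the code.
    code = generate(CODE_PROMPT + spec, None)
    return spec, code


if __name__ == "__main__":
    # Stand-in for a real model call so the sketch runs offline.
    def fake_generate(prompt: str, video_url: Optional[str]) -> str:
        return f"<output for {len(prompt)}-char prompt, video={video_url}>"

    spec, code = video_to_app(fake_generate, "https://www.youtube.com/watch?v=VIDEO_ID")
    print(spec)
    print(code)
```

Keeping the model call behind a plain callable makes it easy to swap Gemini 2.5 Pro for Gemini 2.5 Flash in either step.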

Creating animations from video with p5.js

Gemini 2.5 Pro unlocks exciting creative possibilities, such as the ability to generate dynamic animations from videos with a single prompt. This capability opens up new avenues for use cases such as automated content generation and creating accessible video summaries.

For example, when given our video on Project Astra along with the prompt ‘Create an animation in p5.js covering the different landmarks seen in this video.’, Gemini 2.5 Pro analyzes the footage and produces a corresponding p5.js animation. The animation visualizes the landmarks identified by Gemini 2.5 Pro in the same temporal order as in the video.

Retrieving and describing moments from video

Gemini 2.5 Pro excels at identifying specific moments within videos using audio-visual cues with significantly higher accuracy than previous video processing systems. For example, in this 10-minute video of the Google Cloud Next ’25 opening keynote, it accurately identifies 16 distinct segments related to product presentations, using both audio and visual cues from the video to do so.
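
Moment retrieval quality on QVHighlights is reported above as R1@0.5: a query counts as a hit when the top-ranked predicted segment overlaps the ground-truth segment with a temporal IoU of at least 0.5. The sketch below illustrates that metric under a simplifying assumption of one ground-truth segment per query; the function names are hypothetical and this is not the benchmark's official scorer.

```python
Segment = tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top_preds: list[Segment], gts: list[Segment],
                threshold: float = 0.5) -> float:
    """Fraction of queries whose top prediction reaches the IoU threshold.

    Simplification (assumption): exactly one ground-truth segment per query.
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top_preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 10.0)]
    truth = [(0.0, 12.0), (20.0, 30.0)]
    print(recall_at_1(preds, truth))  # first query hits, second misses
```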

Temporal reasoning

With its advanced moment retrieval capabilities, Gemini 2.5 Pro is also able to solve nuanced temporal reasoning problems such as counting. In this example, Gemini successfully counts 17 distinct occurrences where the main character uses their phone in the Project Astra video.

Building with Gemini 2.5 video understanding

Video understanding in Gemini 2.5 Flash and Pro is available in Google AI Studio, the Gemini API, and Vertex AI. Support for YouTube videos is available via the Gemini API and Google AI Studio, enabling anyone to build applications with access to billions of videos.

The Gemini API now offers a ‘low’ media resolution parameter, enabling Gemini 2.5 Pro to process ~6 hours of video within a 2-million-token context. This provides a more cost-effective setting with competitive video understanding performance (e.g., 84.7% vs 85.2% accuracy on VideoMME) for many long video understanding use cases.
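
To get a feel for what these numbers imply, the arithmetic can be worked through directly. The sketch below uses only the figures quoted above (about 6 hours, 2 million tokens, 84.7% vs 85.2%); the function name is hypothetical, and the result is an average budget per second of video, not a documented per-frame token cost.

```python
def implied_tokens_per_second(context_tokens: int, video_hours: float) -> float:
    """Average token budget per second of video for a given context size."""
    return context_tokens / (video_hours * 3600)


if __name__ == "__main__":
    budget = implied_tokens_per_second(2_000_000, 6)
    print(f"{budget:.1f} tokens per second of video")  # 92.6

    # Accuracy given up on VideoMME at 'low' resolution, in percentage points.
    gap = round(85.2 - 84.7, 1)
    print(f"{gap} points lower accuracy")  # 0.5
```

In other words, fitting roughly 6 hours into the context leaves a budget of a little over 90 tokens per second of video, at a cost of about half a percentage point on VideoMME.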

We are inspired by the innovative video applications already emerging from the community and can’t wait to see what you build!


Acknowledgements

A big shoutout to Aaron Wade for creating Video To Learning App and for the Vision Correction simulator example showcased in the blogpost.

We thank Sergi Caelles, Boyu Wang and Saarthak Khanna for their contributions on the evaluations presented above, Angeliki Lazaridou for inspiring some demo examples, and the entire Gemini video understanding team for the work culminating in this release. Finally, we would like to thank the video understanding leads Mario Lučić, Shuo-yiin Chang, and Paul Natsev, and overall multimodal understanding lead Jean-Baptiste Alayrac.


