mGrowTech

Advancing the frontier of video understanding with Gemini 2.5

by Josh
June 10, 2025
in Google Marketing

We recently launched two new models in our Gemini family: Gemini 2.5 Pro Preview (05/06) and Gemini 2.5 Flash (04/17). These models mark a major leap in video understanding. Gemini 2.5 Pro achieves state-of-the-art performance on key video understanding benchmarks, surpassing recent models like GPT 4.1 under comparable testing conditions (same prompt and video frames).

Furthermore, it rivals specialized fine-tuned models on several challenging benchmarks (e.g. YouCook2 dense captioning and QVHighlights moment retrieval). For cost-sensitive applications, Gemini 2.5 Flash provides a highly competitive alternative.

Evaluation of Gemini 2.5 vs. prior models on video understanding benchmarks.
Performance is measured by string-match accuracy for multiple-choice VideoQA, LLM-based accuracy for EgoTempo, R1@0.5 for QVHighlights and CIDEr for YouCook2.
*Videos were processed at 1fps and linearly subsampled to a maximum of 256 frames, except for 1H-VideoQA (7200 frames).
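
The frame sampling described in the note above can be sketched roughly as follows. This is a minimal illustration, not the evaluation code: the function name, the integer-index subsampling rule, and the choice to always keep the first and last frame are assumptions; only the 1fps rate and the 256-frame cap come from the note.

```python
def sample_frame_times(duration_s: float, fps: float = 1.0,
                       max_frames: int = 256) -> list[float]:
    """Return the timestamps (in seconds) of the frames to keep.

    Frames are taken at `fps`; if that yields more than `max_frames`,
    they are linearly (evenly) subsampled down to `max_frames`.
    """
    n = int(duration_s * fps)  # frames available at the base rate
    if n <= 0:
        return []
    times = [i / fps for i in range(n)]
    if n <= max_frames:
        return times
    if max_frames == 1:
        return [times[0]]
    # Evenly spaced integer indices from 0 to n - 1 (assumed rule).
    indices = [(i * (n - 1)) // (max_frames - 1) for i in range(max_frames)]
    return [times[j] for j in indices]


short_clip = sample_frame_times(100)    # 100 s video: every 1fps frame is kept
long_clip = sample_frame_times(3600)    # 1 h video: subsampled to 256 frames
```

Under this rule a one-hour video keeps roughly one frame every 14 seconds, which is why long-video benchmarks such as 1H-VideoQA use a larger frame budget.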

Combining video and code with Gemini 2.5

Gemini 2.5 marks the first time a natively multimodal model can seamlessly use audio-visual information together with code and other data formats. To illustrate the power of Gemini 2.5’s video understanding capabilities, we showcase below some of the use cases that we’ve been most excited about.

Transforming videos into interactive applications

Gemini 2.5 Pro unlocks new possibilities for transforming videos into interactive applications. Video To Learning App, a Google AI Studio starter app, uses Gemini 2.5 to make learning from video content more effective and engaging.

First, the model sees a YouTube URL along with a text prompt that explains how it should analyze the video. Gemini 2.5 Pro analyzes the video and crafts a detailed spec for a learning application which reinforces key ideas in the video.

The generated spec is then sent directly back to Gemini 2.5 Pro to generate the code for the application, as illustrated in the vision correction simulator application below. Gemini 2.5 Flash can achieve similar results, offering a glimpse into novel video use cases in domains such as education and interactive content creation.
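
The two-step flow described above (video to spec, then spec to code) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the prompt strings and the `generate` callable are hypothetical stand-ins, since the starter app's actual prompts and client code are not given in this post.

```python
from typing import Callable, Optional

# Hypothetical prompts; the real starter app's prompts are not shown in this post.
SPEC_PROMPT = ("Analyze this video and write a detailed spec for a learning "
               "app that reinforces its key ideas.")
CODE_PROMPT = "Generate the code for an application that implements this spec:"

# `generate(prompt, video_url)` is a placeholder for a call to the model.
# `video_url` is None when the request is text-only.
Generate = Callable[[str, Optional[str]], str]


def video_to_app(video_url: str, generate: Generate) -> tuple[str, str]:
    """Run the two-step chain and return (spec, code)."""
    # Step 1: the model gets the video URL plus instructions, returns a spec.
    spec = generate(SPEC_PROMPT, video_url)
    # Step 2: the spec alone goes back to the model to produce the code.
    code = generate(CODE_PROMPT + "\n\n" + spec, None)
    return spec, code
```

Passing the model call in as a function keeps the chain easy to test with a stub and makes it simple to swap Gemini 2.5 Pro for Gemini 2.5 Flash.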

Creating animations from video with p5.js

Gemini 2.5 Pro unlocks exciting creative possibilities, such as the ability to generate dynamic animations from videos with a single prompt. This capability opens up new avenues for use cases such as automated content generation and creating accessible video summaries.

For example, when given our video on Project Astra along with the prompt ‘Create an animation in p5.js covering the different landmarks seen in this video.’, Gemini 2.5 Pro analyzes the footage and produces a corresponding p5.js animation. The animation visualizes the landmarks identified by Gemini 2.5 Pro in the same temporal order as in the video.

Retrieving and describing moments from video

Gemini 2.5 Pro excels at identifying specific moments within videos using audio-visual cues with significantly higher accuracy than previous video processing systems. For example, in this 10-minute video of the Google Cloud Next ’25 opening keynote, it accurately identifies 16 distinct segments related to product presentations, using both audio and visual cues from the video to do so.
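
For readers who want to score retrieved moments themselves, the R1@0.5 metric used for QVHighlights in the evaluation above is commonly defined as the fraction of queries whose top-ranked predicted segment overlaps the ground truth with a temporal IoU of at least 0.5. The sketch below assumes that definition and, as a simplification, one ground-truth segment per query; the function names are illustrative.

```python
Segment = tuple[float, float]  # (start_s, end_s)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top_predictions: list[Segment],
                ground_truths: list[Segment],
                threshold: float = 0.5) -> float:
    """Share of queries whose top prediction reaches the IoU threshold."""
    if not ground_truths:
        return 0.0
    hits = sum(
        1 for pred, gt in zip(top_predictions, ground_truths)
        if temporal_iou(pred, gt) >= threshold
    )
    return hits / len(ground_truths)


# Two queries: the first prediction overlaps well, the second misses entirely.
score = recall_at_1([(0, 10), (20, 30)], [(0, 8), (40, 50)])
```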

Temporal reasoning

With its advanced moment retrieval capabilities, Gemini 2.5 Pro is also able to solve nuanced temporal reasoning problems such as counting. In this example, Gemini successfully counts 17 distinct occurrences where the main character uses their phone in the Project Astra video.
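
When building on counting results like this, it can help to ask the model to list a timestamp for each occurrence and then count them in code. The sketch below assumes a hypothetical response format with one MM:SS timestamp per line; the post does not specify the model's output format, and the `min_gap_s` merging rule is an illustrative choice for avoiding double counts.

```python
import re

# Matches MM:SS timestamps only (hypothetical response format).
TIMESTAMP = re.compile(r"\b(\d{1,2}):([0-5]\d)\b")


def count_occurrences(response_text: str, min_gap_s: int = 0) -> int:
    """Count distinct events, merging timestamps at most `min_gap_s` apart."""
    seconds = sorted(
        {int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(response_text)}
    )
    count = 0
    previous = None
    for t in seconds:
        # A new event starts when the gap to the previous timestamp is large.
        if previous is None or t - previous > min_gap_s:
            count += 1
        previous = t
    return count


example = "00:05 picks up phone\n00:07 still holding phone\n01:10 checks phone"
merged = count_occurrences(example, min_gap_s=5)   # 00:05 and 00:07 merge
```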

Building with Gemini 2.5 video understanding

Video understanding in Gemini 2.5 Flash and Pro is available in Google AI Studio, the Gemini API, and Vertex AI. Support for YouTube videos is available via the Gemini API and Google AI Studio, enabling anyone to build applications with access to billions of videos.
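
As a starting point, a request that pairs a YouTube URL with a text prompt might be assembled like this. Treat it as a sketch under assumptions: the endpoint path and the `file_data` / `file_uri` field names follow the public Gemini REST documentation as best understood here and should be checked against the current API reference. The model name and video URL are placeholders, and the code only builds the request without sending it.

```python
import json

# Assumed base URL of the Gemini REST API; verify against the official docs.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_video_request(model: str, youtube_url: str,
                        prompt: str) -> tuple[str, dict]:
    """Return (endpoint_url, json_body) for a video + text prompt request."""
    endpoint = f"{API_BASE}/models/{model}:generateContent"
    body = {
        "contents": [{
            "parts": [
                {"file_data": {"file_uri": youtube_url}},  # the video
                {"text": prompt},                          # the instruction
            ]
        }]
    }
    return endpoint, body


endpoint, body = build_video_request(
    model="gemini-2.5-flash",                                 # placeholder
    youtube_url="https://www.youtube.com/watch?v=VIDEO_ID",   # placeholder
    prompt="Create an animation in p5.js covering the different "
           "landmarks seen in this video.",
)
payload = json.dumps(body)  # ready to POST with an API key
```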

The Gemini API now offers a ‘low’ media resolution parameter, enabling Gemini 2.5 Pro to process ~6 hours of video with a 2-million-token context. This setting is more cost-effective while keeping video understanding performance competitive (e.g., 84.7% vs 85.2% accuracy on VideoMME) for many long-video use cases.
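
The ~6 hour figure can be sanity-checked with simple arithmetic. The per-second token rates below are assumptions (roughly 100 tokens per second of video at low resolution and roughly 300 at the default, as recalled from the Gemini API documentation) and are not stated in this post.

```python
# Assumed approximate token costs per second of video (frames + audio).
TOKENS_PER_SECOND_LOW = 100      # assumption: 'low' media resolution
TOKENS_PER_SECOND_DEFAULT = 300  # assumption: default media resolution


def max_video_hours(context_tokens: int, tokens_per_second: int) -> float:
    """Hours of video that fit in the context window at a given token rate."""
    return context_tokens / tokens_per_second / 3600


low_hours = max_video_hours(2_000_000, TOKENS_PER_SECOND_LOW)
default_hours = max_video_hours(2_000_000, TOKENS_PER_SECOND_DEFAULT)
```

Under these assumed rates, a 2-million-token context holds about 5.6 hours of video at low resolution versus about 1.9 hours at the default, consistent with the ~6 hours quoted above.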

We are inspired by the innovative video applications already emerging from the community and can’t wait to see what you build!


Acknowledgements

A big shoutout to Aaron Wade for creating Video To Learning App and for the Vision Correction simulator example showcased in the blog post.

We thank Sergi Caelles, Boyu Wang and Saarthak Khanna for their contributions on the evaluations presented above, Angeliki Lazaridou for inspiring some demo examples, and the entire Gemini video understanding team for the work culminating in this release. Finally, we would like to thank the video understanding leads Mario Lučić, Shuo-yiin Chang, and Paul Natsev, and overall multimodal understanding lead Jean-Baptiste Alayrac.


