• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Friday, July 11, 2025
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Google Marketing

Announcing GenAI Processors: Build powerful and flexible Gemini applications

Josh by Josh
July 11, 2025
in Google Marketing
0
Announcing GenAI Processors: Build powerful and flexible Gemini applications
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter


Building sophisticated AI applications with Large Language Models (LLMs), especially those handling multimodal input and requiring real-time responsiveness, often feels like assembling a complex puzzle: you’re stitching together diverse data processing steps, asynchronous API calls, and custom logic. As complexity grows, this can lead to brittle, hard-to-maintain code.

Today, we’re introducing GenAI Processors, a new open-source Python library from Google DeepMind designed to bring structure and simplicity to these challenges. GenAI Processors provides an abstraction layer, defining a consistent Processor interface for everything from input handling and pre-processing to model calls and output processing.

At its core, GenAI Processors treat all input and output as asynchronous streams of ProcessorParts (i.e. two-way aka bidirectional streaming). Think of it as standardized data parts (e.g., a chunk of audio, a text transcription, an image frame) flowing through your pipeline along with associated metadata. This stream-based API allows for seamless chaining and composition of different operations, from low-level data manipulation to high-level model calls.

GenAI Processors library

The GenAI Processors library is designed to optimize the concurrent execution of a Processor. Any part in this example of execution flow can be generated concurrently when all its ancestors in the graph are computed, e.g. `c’12` can be generated concurrently to `a’1`. The flow maintains the ordering of the output stream with respect to the input stream and will be executed to minimize Time To First Token (prefer `a12` to `d12` whenever possible). This concurrency optimization is done under the hood: applying a Processor to a stream of input will automatically trigger this concurrent execution whenever possible.

For example, you can easily build a “Live Agent” capable of processing audio and video streams in real-time using the Gemini Live API with just a few lines of code. In the following example, notice how input sources and processing steps are combined using the + operator, creating a clear data flow (full code on GitHub):

from genai_processors.core import audio_io, live_model, video

# Input processor: combines camera streams and audio streams
input_processor = video.VideoIn() + audio_io.PyAudioIn(...)

# Output processor: plays the audio parts. Handles interruptions and pauses
# audio output when the user is speaking.
play_output = audio_io.PyAudioOut(...)

# Gemini Live API processor
live_processor = live_model.LiveProcessor(...)

# Compose the agent: mic+camera -> Gemini Live API -> play audio
live_processor = live_model.LiveProcessor(...)
live_agent = input_processor + live_processor + play_output

async for part in live_agent(streams.endless_stream()):
  # Process the output parts (e.g., print transcription, model output, metadata)
  print(part)

Python

You can also build your own Live agent, leveraging a standard text-based LLM, using the bidirectional streaming capability of the GenAI Processor library and the Google Speech API (full code on GitHub):

from genai_processors.core import genai_model, realtime, speech_to_text, text_to_speech

# Input processor: gets input from audio in (mic) and transcribes into text
input_processor = audio_io.PyAudioIn(...) + speech_to_text.SpeechToText(... )
play_output = audio_io.PyAudioOut(...)

# Main model that will be used to generate the response.
genai_processor = genai_model.GenaiModel(...),

# TTS processor that will be used to convert the text response to audio. Note
# the rate limit audio processor that will be used to stream back small audio
# chunks to the client at the same rate as how they are played back.  
tts = text_to_speech.TextToSpeech(...) + rate_limit_audio.RateLimitAudio(...)


# Creates an agent as:
# mic -> speech to text -> text conversation -> text to speech -> play audio
live_agent = (
     input_processor
     + realtime.LiveModelProcessor(turn_processor=genai_processor + tts)
     + play_output
 )
async for part in live_agent(streams.endless_stream()):
     …

Python

We anticipate a growing need for proactive LLM applications where responsiveness is critical. Even for non-streaming use cases, processing data as soon as it is available can significantly reduce latency and time to first token (TTFT), which is essential for building a good user experience. While many LLM APIs prioritize synchronous, simplified interfaces, GenAI Processors – by leveraging native Python features – offer a way for writing responsive applications without making code more complex. Trip planner and Research Agent examples demonstrate how turn-based agents can use the concurrency feature of GenAI Processors to increase responsiveness.

Core design principles

At the heart of GenAI Processors is the concept of a Processor: a fundamental building block that encapsulates a specific unit of work. It takes a stream of inputs, performs an operation, and outputs a stream of results. This simple, consistent API is a cornerstone of the library’s power and flexibility.

Here’s a look at the core design decisions and their benefits for developers:

  • Modular design: Break down complex workflows into self-contained Processor units. This ensures code reusability, testability, and significantly simplifies maintaining intricate pipelines.
  • Asynchronous & concurrent: Fully leverages Python’s asyncio for efficient handling of I/O-bound and compute-bound tasks. This enables responsive applications without manual threading or complex concurrency management.
  • Integrated with Gemini API: Dedicated processors like GenaiModel (for turn-based interaction) and LiveProcessor (for real-time streaming) simplify interaction with the Gemini API, including the complexities of the Live API. This reduces boilerplate and accelerates integration.
  • Extensible: Easily create custom processors by inheriting from base classes or using decorators. Integrate your own data processing logic, external APIs, or specialized operations seamlessly into your pipelines.
  • Unified multimodal handling: The ProcessorPart wrapper provides a consistent interface for handling diverse data types (text, images, audio, JSON, etc.) within the pipeline.
  • Stream manipulation utilities: Built-in utilities for splitting, concatenating, and merging asynchronous streams. This provides fine-grained control over data flow within complex pipelines.

Getting started

Getting started with GenAI Processors is straightforward. You can install it with pip:

pip install genai-processors

Python

To help you get familiar with the library, we provide a series of Colab notebooks that walk you through the core concepts and demonstrate how to build various types of processors and applications. We recommend starting with the Content API Colab and Processor Intro Colab.

You can also explore the examples/ directory in the repository for practical demonstrations of how to build more complex applications, such as a research agent and a live commentary agent.

READ ALSO

Flow adds speech to videos and expands to more countries

Gemini AI can now turn photos into videos

Looking ahead

GenAI Processors is currently in its early stages, and we believe it provides a solid foundation for tackling complex workflow and orchestration challenges in AI applications. While the Google GenAI SDK is available in multiple languages, GenAI Processors currently only support Python.

The core/ directory contains fundamental processors, and we actively encourage community contributions for more specialized functionalities in the contrib/ directory. We’re excited to collaborate with the developer community to expand the library and build even more sophisticated AI systems.

Ready to build more robust and responsive Gemini applications?

Check out the GenAI Processors repository on GitHub: https://github.com/google-gemini/genai-processors

We look forward to seeing what you create!


Acknowledgments

GenAI Processors is the result of the dedication and hard work of a fantastic team. We’d like to acknowledge the following individuals who played a key role in bringing this library to life: Juliette Love, KP Sawhney, Antoine He, Will Thompson, Arno Eigenwillig, Ke Wang, Parth Kothari, Tim Blyth, Philipp Schmid, Patrick Löber, Omar Sanseviero, Alexey Kolganov, Adam Langley, Evan Senter, Seth Odoom, Thierry Coppey, and Murat Ozturk.



Source_link

Related Posts

Flow adds speech to videos and expands to more countries
Google Marketing

Flow adds speech to videos and expands to more countries

July 11, 2025
Gemini AI can now turn photos into videos
Google Marketing

Gemini AI can now turn photos into videos

July 11, 2025
Introducing Gemini with photo to video capability
Google Marketing

Introducing Gemini with photo to video capability

July 10, 2025
Android’s Circle to Search feature gets AI and gaming upgrades
Google Marketing

Android’s Circle to Search feature gets AI and gaming upgrades

July 10, 2025
Advancing agentic AI development with Firebase Studio
Google Marketing

Advancing agentic AI development with Firebase Studio

July 10, 2025
How to use Gemini on a Wear OS smartwatch
Google Marketing

How to use Gemini on a Wear OS smartwatch

July 10, 2025
Next Post
Top Profitable Tech Business Ideas in Saudi Arabia

Top Profitable Tech Business Ideas in Saudi Arabia

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

POPULAR NEWS

Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
7 Best EOR Platforms for Software Companies in 2025

7 Best EOR Platforms for Software Companies in 2025

June 21, 2025
Eating Bugs – MetaDevo

Eating Bugs – MetaDevo

May 29, 2025
Top B2B & Marketing Podcasts to Lead You to Succeed in 2025 – TopRank® Marketing

Top B2B & Marketing Podcasts to Lead You to Succeed in 2025 – TopRank® Marketing

May 30, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025

EDITOR'S PICK

Personalization at Scale: An Interview with Mark Abraham, Global Leader, Boston Consulting Group

Personalization at Scale: An Interview with Mark Abraham, Global Leader, Boston Consulting Group

June 3, 2025
Account-Based Confusion (ABC) – ABM Consortium

Account-Based Confusion (ABC) – ABM Consortium

May 31, 2025
Sales and Marketing Analytics for Business Owners

Sales and Marketing Analytics for Business Owners

June 25, 2025
What Are Meta Keywords? + Why to Avoid Them

What Are Meta Keywords? + Why to Avoid Them

June 17, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • How to turn insights into business value
  • The best Amazon Prime Day deals under $50 that you can get before the event is over
  • Mistral AI Releases Devstral 2507 for Code-Centric Language Modeling
  • Does Being Mentioned on Highly Linked Pages Influence AI Mentions?
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?