Announcing GenAI Processors: Build powerful and flexible Gemini applications

Building sophisticated AI applications with Large Language Models (LLMs), especially those handling multimodal input and requiring real-time responsiveness, often feels like assembling a complex puzzle: you’re stitching together diverse data processing steps, asynchronous API calls, and custom logic. As complexity grows, this can lead to brittle, hard-to-maintain code.

Today, we’re introducing GenAI Processors, a new open-source Python library from Google DeepMind designed to bring structure and simplicity to these challenges. GenAI Processors provides an abstraction layer, defining a consistent Processor interface for everything from input handling and pre-processing to model calls and output processing.

At its core, GenAI Processors treats all input and output as asynchronous streams of ProcessorParts, which is what enables bidirectional (two-way) streaming. Think of a ProcessorPart as a standardized data part (e.g., a chunk of audio, a text transcription, an image frame) that flows through your pipeline along with its associated metadata. This stream-based API lets you chain and compose different operations, from low-level data manipulation to high-level model calls.
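
To make the stream idea concrete, here is a minimal sketch in plain asyncio. It does not use the library: the `Part` dataclass, its fields, and the function names are hypothetical stand-ins chosen for illustration, not the real ProcessorPart API.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, List


@dataclass
class Part:
    """Hypothetical stand-in for a ProcessorPart: a payload plus metadata."""
    data: Any
    mimetype: str
    metadata: dict = field(default_factory=dict)


async def source() -> AsyncIterator[Part]:
    """Yields parts one at a time, as a microphone or camera feed might."""
    yield Part("hello", "text/plain", {"turn": 1})
    yield Part(b"\x00\x01", "audio/l16", {"turn": 1})
    yield Part("world", "text/plain", {"turn": 2})


async def uppercase_text(parts: AsyncIterator[Part]) -> AsyncIterator[Part]:
    """Transforms text parts; passes every other part through unchanged."""
    async for part in parts:
        if part.mimetype == "text/plain":
            yield Part(part.data.upper(), part.mimetype, part.metadata)
        else:
            yield part


async def collect(parts: AsyncIterator[Part]) -> List[Part]:
    """Drains a stream into a list so the result can be inspected."""
    return [part async for part in parts]


if __name__ == "__main__":
    for part in asyncio.run(collect(uppercase_text(source()))):
        print(part)
```

Each stage consumes one stream and produces another, so a part can be handled as soon as it arrives and its metadata travels with it.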

[Figure: example execution flow in the GenAI Processors library]

The GenAI Processors library is designed to optimize the concurrent execution of a Processor. In the example execution flow, any part can be generated as soon as all of its ancestors in the graph have been computed; for example, `c’12` can be generated concurrently with `a’1`. The output stream keeps the same ordering as the input stream, and execution is scheduled to minimize Time To First Token (TTFT), preferring `a12` to `d12` whenever possible. This concurrency optimization happens under the hood: applying a Processor to an input stream automatically triggers concurrent execution whenever possible.
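
The combination of concurrent work and ordered output can be sketched in plain asyncio. This is an illustration of the general technique, not the library's scheduler: `ordered_concurrent_map`, `work`, and the delays are all hypothetical.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

finished: List[str] = []  # Records the order in which work actually completes.


async def ordered_concurrent_map(
    parts: AsyncIterator[str],
    fn: Callable[[str], Awaitable[str]],
) -> AsyncIterator[str]:
    """Starts fn on each part as it arrives; yields results in input order."""
    pending: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        async for part in parts:
            # Schedule the work immediately instead of waiting for earlier parts.
            await pending.put(asyncio.create_task(fn(part)))
        await pending.put(None)  # Sentinel: the input stream is exhausted.

    producer_task = asyncio.create_task(producer())
    while (task := await pending.get()) is not None:
        yield await task  # Awaiting in arrival order preserves stream order.
    await producer_task


async def letters() -> AsyncIterator[str]:
    for letter in "abc":
        yield letter


async def work(part: str) -> str:
    # Hypothetical workload: earlier parts take longer than later ones.
    await asyncio.sleep({"a": 0.15, "b": 0.10, "c": 0.05}[part])
    finished.append(part)
    return part.upper()


async def demo() -> List[str]:
    return [out async for out in ordered_concurrent_map(letters(), work)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
    print(finished)
```

All three tasks run at the same time, so the slowest part no longer delays the start of the others, yet the consumer still sees the results in the order of the input stream.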

For example, you can build a “Live Agent” that processes audio and video streams in real time using the Gemini Live API with just a few lines of code. In the following example, notice how input sources and processing steps are combined using the + operator, creating a clear data flow (full code on GitHub):

from genai_processors import streams
from genai_processors.core import audio_io, live_model, video

# Input processor: combines camera streams and audio streams
input_processor = video.VideoIn() + audio_io.PyAudioIn(...)

# Output processor: plays the audio parts. Handles interruptions and pauses
# audio output when the user is speaking.
play_output = audio_io.PyAudioOut(...)

# Gemini Live API processor
live_processor = live_model.LiveProcessor(...)

# Compose the agent: mic+camera -> Gemini Live API -> play audio
live_agent = input_processor + live_processor + play_output

async for part in live_agent(streams.endless_stream()):
  # Process the output parts (e.g., print transcription, model output, metadata)
  print(part)

You can also build your own Live Agent on top of a standard text-based LLM, using the bidirectional streaming capability of the GenAI Processors library and the Google Speech API (full code on GitHub):

from genai_processors import streams
from genai_processors.core import (
    audio_io,
    genai_model,
    rate_limit_audio,
    realtime,
    speech_to_text,
    text_to_speech,
)

# Input processor: gets input from audio in (mic) and transcribes into text
input_processor = audio_io.PyAudioIn(...) + speech_to_text.SpeechToText(...)
play_output = audio_io.PyAudioOut(...)

# Main model that will be used to generate the response.
genai_processor = genai_model.GenaiModel(...)

# TTS processor that will be used to convert the text response to audio. Note
# the rate limit audio processor that will be used to stream back small audio
# chunks to the client at the rate at which they are played back.
tts = text_to_speech.TextToSpeech(...) + rate_limit_audio.RateLimitAudio(...)

# Creates an agent as:
# mic -> speech to text -> text conversation -> text to speech -> play audio
live_agent = (
    input_processor
    + realtime.LiveModelProcessor(turn_processor=genai_processor + tts)
    + play_output
)
async for part in live_agent(streams.endless_stream()):
    ...

We anticipate a growing need for proactive LLM applications where responsiveness is critical. Even for non-streaming use cases, processing data as soon as it is available can significantly reduce latency and time to first token (TTFT), which is essential for a good user experience. While many LLM APIs prioritize synchronous, simplified interfaces, GenAI Processors leverages native Python features to offer a way to write responsive applications without making the code more complex. The Trip Planner and Research Agent examples demonstrate how turn-based agents can use the concurrency features of GenAI Processors to increase responsiveness.

Core design principles

At the heart of GenAI Processors is the concept of a Processor: a fundamental building block that encapsulates a specific unit of work. It takes a stream of inputs, performs an operation, and outputs a stream of results. This simple, consistent API is a cornerstone of the library’s power and flexibility.
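
The "stream in, stream out" contract, and the way it makes chaining with + possible, can be sketched in a few lines. This is a toy model written for this post's explanation, not the library's implementation: the `Processor` class below, its `__add__` behavior, and the helper functions are hypothetical and operate on plain strings rather than ProcessorParts.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, List

Stream = AsyncIterator[str]


class Processor:
    """Hypothetical minimal processor: wraps a stream-to-stream function."""

    def __init__(self, fn: Callable[[Stream], Stream]):
        self._fn = fn

    def __call__(self, stream: Stream) -> Stream:
        return self._fn(stream)

    def __add__(self, other: "Processor") -> "Processor":
        # Chaining: the output stream of self becomes the input of other.
        return Processor(lambda stream: other(self(stream)))


async def _strip(stream: Stream) -> Stream:
    async for text in stream:
        yield text.strip()


async def _drop_empty(stream: Stream) -> Stream:
    async for text in stream:
        if text:
            yield text


async def _shout(stream: Stream) -> Stream:
    async for text in stream:
        yield text.upper() + "!"


# Three small units of work composed into one pipeline.
pipeline = Processor(_strip) + Processor(_drop_empty) + Processor(_shout)


async def from_list(items: Iterable[str]) -> Stream:
    for item in items:
        yield item


async def run(items: Iterable[str]) -> List[str]:
    return [out async for out in pipeline(from_list(items))]


if __name__ == "__main__":
    print(asyncio.run(run(["  hi ", "   ", "there"])))
```

Because every unit has the same shape, each one can be tested on its own and reused in other pipelines.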

Here’s a look at the core design decisions and their benefits for developers:

Modular design: Break down complex workflows into self-contained Processor units. This makes code reusable and testable, and significantly simplifies the maintenance of intricate pipelines.

Asynchronous & concurrent: Fully leverages Python’s asyncio for efficient handling of I/O-bound and compute-bound tasks. This enables responsive applications without manual threading or complex concurrency management.

Integrated with Gemini API: Dedicated processors like GenaiModel (for turn-based interaction) and LiveProcessor (for real-time streaming) simplify interaction with the Gemini API, including the complexities of the Live API. This reduces boilerplate and accelerates integration.

Extensible: Easily create custom processors by inheriting from base classes or using decorators. Integrate your own data processing logic, external APIs, or specialized operations seamlessly into your pipelines.

Unified multimodal handling: The ProcessorPart wrapper provides a consistent interface for handling diverse data types (text, images, audio, JSON, etc.) within the pipeline.

Stream manipulation utilities: Built-in utilities for splitting, concatenating, and merging asynchronous streams. This provides fine-grained control over data flow within complex pipelines.
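
To illustrate what concatenating and merging asynchronous streams mean in practice, here is a minimal sketch in plain asyncio. The `concat`, `merge`, and `ticker` functions are hypothetical helpers written for this example; they are not the library's built-in utilities, and error handling is deliberately left out.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # Sentinel marking the end of one input stream.


async def concat(*streams: AsyncIterator[T]) -> AsyncIterator[T]:
    """Yields every item of the first stream, then the second, and so on."""
    for stream in streams:
        async for item in stream:
            yield item


async def merge(*streams: AsyncIterator[T]) -> AsyncIterator[T]:
    """Interleaves items from several streams as they become available."""
    queue: asyncio.Queue = asyncio.Queue()

    async def drain(stream: AsyncIterator[T]) -> None:
        try:
            async for item in stream:
                await queue.put(item)
        finally:
            await queue.put(_DONE)

    tasks = [asyncio.create_task(drain(stream)) for stream in streams]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item
    finally:
        for task in tasks:
            task.cancel()  # No-op for tasks that have already finished.


async def ticker(name: str, delay: float, count: int) -> AsyncIterator[str]:
    """Hypothetical source that emits `count` labeled items, `delay` apart."""
    for i in range(count):
        await asyncio.sleep(delay)
        yield f"{name}{i}"


async def collect(stream: AsyncIterator[T]) -> List[T]:
    return [item async for item in stream]


if __name__ == "__main__":
    print(asyncio.run(collect(concat(ticker("a", 0.01, 2), ticker("b", 0.01, 2)))))
    print(asyncio.run(collect(merge(ticker("a", 0.01, 2), ticker("b", 0.01, 2)))))
```

Concatenation preserves the order of the sources, while merging favors responsiveness: whichever source produces an item first is served first.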

Getting started

Getting started with GenAI Processors is straightforward. You can install it with pip:

pip install genai-processors

To help you get familiar with the library, we provide a series of Colab notebooks that walk you through the core concepts and demonstrate how to build various types of processors and applications. We recommend starting with the Content API Colab and Processor Intro Colab.

You can also explore the examples/ directory in the repository for practical demonstrations of how to build more complex applications, such as a research agent and a live commentary agent.

Looking ahead

GenAI Processors is currently in its early stages, and we believe it provides a solid foundation for tackling complex workflow and orchestration challenges in AI applications. While the Google GenAI SDK is available in multiple languages, GenAI Processors currently supports only Python.

The core/ directory contains fundamental processors, and we actively encourage community contributions for more specialized functionalities in the contrib/ directory. We’re excited to collaborate with the developer community to expand the library and build even more sophisticated AI systems.

Ready to build more robust and responsive Gemini applications?

Check out the GenAI Processors repository on GitHub: https://github.com/google-gemini/genai-processors

We look forward to seeing what you create!

Acknowledgments

GenAI Processors is the result of the dedication and hard work of a fantastic team. We’d like to acknowledge the following individuals who played a key role in bringing this library to life: Juliette Love, KP Sawhney, Antoine He, Will Thompson, Arno Eigenwillig, Ke Wang, Parth Kothari, Tim Blyth, Philipp Schmid, Patrick Löber, Omar Sanseviero, Alexey Kolganov, Adam Langley, Evan Senter, Seth Odoom, Thierry Coppey, and Murat Ozturk.
