Studio quality and speed
Developers can now choose between two distinct model variants designed to meet specific production and latency requirements:
- Lyria 3 Pro (lyria-3-pro-preview): Our premier model for full-length song generation creates tracks up to approximately three minutes long with professional-grade structural awareness, making it the standard choice for studio-quality, premium output.
- Lyria 3 Clip (lyria-3-clip-preview): Optimized for speed and high-volume requests, this variant generates high-quality 30-second clips. It is the ideal choice for rapid prototyping, background loops and social media assets.
Both models support realistic vocals that convey expressive nuance, plus improved clarity for more natural sounds. Developers can also explore global languages and genres: generate vocals in different languages, and create music spanning genres from pop to funk to Motown.
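As a rough illustration of choosing between the two variants listed above, the sketch below maps a requested track length to a model ID. The model IDs and the 30-second and three-minute figures come from this post; the helper name `pick_lyria_model` and the selection rule itself are assumptions made for the example, not part of any official SDK.

```python
# Model IDs are taken from this post; the selection rule is an illustrative assumption.
LYRIA_PRO = "lyria-3-pro-preview"
LYRIA_CLIP = "lyria-3-clip-preview"

CLIP_MAX_SECONDS = 30   # Clip generates 30-second clips
PRO_MAX_SECONDS = 180   # Pro generates tracks up to roughly three minutes


def pick_lyria_model(duration_seconds: float) -> str:
    """Return a model ID for the requested track length (hypothetical helper)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds <= CLIP_MAX_SECONDS:
        # Short assets such as loops and social clips favor the faster variant.
        return LYRIA_CLIP
    if duration_seconds <= PRO_MAX_SECONDS:
        # Longer, full-song requests go to the Pro variant.
        return LYRIA_PRO
    raise ValueError("requested length exceeds the roughly three-minute limit")
```

In a real application you might also weigh request volume and latency budgets, which this simple length-based rule ignores.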
Precision control and multimodal input
Lyria 3 introduces granular controls that allow you to direct the model with precision through natural language prompts:
- Tempo conditioning: Set a specific tempo (e.g., fast or slow) with high accuracy to ensure the music fits your application’s rhythm.
- Time-aligned lyrics: You can outline the progression of a song in your prompt and control when lyrics start and end within a track.
- Multimodal image-to-music input: Beyond text, Lyria 3 supports multimodal inputs. You can provide an image to influence the mood, style and atmosphere of the audio.
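Because the controls above are expressed through natural language prompts, one way to keep them consistent is to assemble the prompt from structured data. The sketch below builds a text prompt with a tempo hint and time-stamped lyric sections. The prompt layout, the `LyricSection` class and the `build_prompt` helper are all hypothetical; this post does not specify a required prompt format, so treat the output as one plausible phrasing rather than a documented syntax. Image input is not covered here.

```python
from dataclasses import dataclass


@dataclass
class LyricSection:
    """One time-bounded part of the song (hypothetical structure)."""
    start_seconds: int
    end_seconds: int
    label: str
    lyrics: str


def format_timestamp(seconds: int) -> str:
    """Render a second count as M:SS, e.g. 75 becomes 1:15."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def build_prompt(style: str, tempo: str, sections: list[LyricSection]) -> str:
    """Combine style, tempo and time-aligned lyrics into one text prompt."""
    lines = [f"Style: {style}", f"Tempo: {tempo}"]
    for s in sections:
        span = f"{format_timestamp(s.start_seconds)}-{format_timestamp(s.end_seconds)}"
        lines.append(f"[{span}] {s.label}: {s.lyrics}")
    return "\n".join(lines)


prompt = build_prompt(
    style="funk",
    tempo="fast",
    sections=[
        LyricSection(0, 15, "Intro", "(instrumental)"),
        LyricSection(15, 45, "Verse 1", "City lights are calling"),
    ],
)
```

The resulting string would then be passed as the text prompt to whichever client you use to call the model.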
Lyria 3 in action
To show how you could incorporate this model into an application, we built some examples in Google AI Studio:
- Background music for videos: This demo app lets users upload a video, which Gemini 3 Flash analyzes to generate a descriptive prompt for a custom soundtrack. Lyria then uses this prompt to compose a matching instrumental that serves as synchronized background music for the video.
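The two-stage flow in the video demo above (describe the video, then compose from that description) can be sketched as a small pipeline. To avoid guessing at SDK details, the sketch takes both model calls as injected functions: `describe_video` and `compose_music` are hypothetical stand-ins for your own wrappers around the video-analysis and music-generation calls, and the prompt wording is likewise an assumption.

```python
from typing import Callable


def score_video(
    video_path: str,
    describe_video: Callable[[str], str],
    compose_music: Callable[[str], bytes],
) -> bytes:
    """Generate background music for a video in two steps (illustrative sketch).

    describe_video: hypothetical wrapper that returns a text description of the video.
    compose_music: hypothetical wrapper that returns audio bytes for a text prompt.
    """
    # Step 1: turn the video into a descriptive text prompt.
    description = describe_video(video_path)
    # Step 2: ask for an instrumental that matches the description.
    prompt = f"Instrumental background music, no vocals. {description}"
    return compose_music(prompt)
```

Passing the model calls in as parameters keeps the orchestration logic testable with simple stubs, independent of any particular client library.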













