
Announcing User Simulation in ADK Evaluation

by Josh
November 7, 2025

Agents are inherently conversational. Users may need to ask follow-up questions, refine previous requests, and provide additional information as needed. However, manually scripting tests for your agent for such multi-turn conversations is a brittle and time-consuming process. You write dozens of user_input and expected_output pairs, only for them to break with the slightest change in your agent’s behavior, turning test maintenance into a frustrating chore.

Today, we’re excited to announce a new feature in the Agent Development Kit (ADK) that helps address this problem: User Simulation. This new feature allows you to move away from testing a rigid implementation path and instead evaluate your agent’s ability to actually achieve a user’s intent.

What is the User Simulator?

At its core, the User Simulator is an LLM-powered user prompt generator. You provide it with a high-level goal, and it dynamically generates the user side of a conversation to pursue that goal. This first release is integrated directly into the ADK evaluation framework. It's not a separate service; it's a tool within the ADK that you run locally, which enables a fast, iterative "inner loop" workflow.

How It Works

1. Defining a Conversation Scenario

Instead of a rigid turn-by-turn script, you provide a ConversationScenario. This is a simple JSON object with two key parts:

  1. starting_prompt: A fixed, initial prompt to begin the conversation.
  2. conversation_plan: A natural language guideline that tells the simulator its objective.

Here’s an example evaluation set for an agent with tools to roll dice and check for prime numbers:

{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a d6. Finally, ask the agent to do the same with 2 d20."
    }
  ]
}

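Because each scenario has only two fields, it is easy to sanity-check an eval set before a run. The sketch below is a stand-alone illustration, not ADK's own loader. `Scenario` and `load_scenarios` are hypothetical names, and the only assumption about the file is the `scenarios` list of `starting_prompt` / `conversation_plan` strings shown above.

```python
import json
from dataclasses import dataclass
from typing import List

REQUIRED_FIELDS = ("starting_prompt", "conversation_plan")


@dataclass(frozen=True)
class Scenario:
    """Hypothetical local mirror of one scenario: a fixed opener plus a goal."""
    starting_prompt: str
    conversation_plan: str


def load_scenarios(raw_json: str) -> List[Scenario]:
    """Parse a document shaped like the example eval set.

    Raises ValueError when a scenario lacks a field or leaves it blank.
    """
    data = json.loads(raw_json)
    scenarios = []
    for index, entry in enumerate(data.get("scenarios", [])):
        missing = [
            field for field in REQUIRED_FIELDS
            if not isinstance(entry.get(field), str) or not entry[field].strip()
        ]
        if missing:
            raise ValueError(f"scenario {index} is missing: {', '.join(missing)}")
        scenarios.append(Scenario(entry["starting_prompt"], entry["conversation_plan"]))
    return scenarios


EXAMPLE = """
{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    }
  ]
}
"""

for scenario in load_scenarios(EXAMPLE):
    print(scenario.starting_prompt, "->", scenario.conversation_plan)
```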

When you run the evaluation, the simulator handles the back-and-forth conversation dynamically until the conversation_plan is fulfilled. Here is what the generated conversation for the first scenario above might look like (reformatted for clarity):

[USER]: What can you do for me?
[AGENT]: I can roll dice and check if numbers are prime. How can I help?
[USER]: Please roll a 20-sided die for me.
[AGENT]: Of course. The result is 17.
[USER]: Thanks. Can you check if 17 is a prime number?
[AGENT]: Yes, 17 is a prime number.
[USER]: </finished>
--------------------
EVALUATION RESULT: COMPLETED

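The control flow described above can be sketched as a loop: the first user turn is the fixed `starting_prompt`, and after each agent reply the simulator either writes the next user message or signals that the plan is done. This is a minimal illustration, not ADK's implementation. It assumes the `</finished>` marker from the transcript is the stop signal, treats the turn budget as a count of user-agent exchanges, uses a hypothetical `BUDGET_EXHAUSTED` status for the cut-off case, and replaces both LLMs with hypothetical scripted stand-ins (`scripted`) so it runs offline.

```python
from typing import Callable, Iterator, List, Tuple

STOP_SIGNAL = "</finished>"  # marker the simulated user sends when the plan is done

Transcript = List[Tuple[str, str]]     # (role, message) pairs
Speaker = Callable[[Transcript], str]  # anything that produces the next message


def simulate(starting_prompt: str, user: Speaker, agent: Speaker,
             max_allowed_invocations: int) -> Tuple[Transcript, str]:
    """Alternate user and agent turns until the user sends STOP_SIGNAL
    or the turn budget runs out."""
    transcript: Transcript = []
    user_message = starting_prompt  # the first user turn is fixed, not generated
    for _ in range(max_allowed_invocations):
        transcript.append(("USER", user_message))
        transcript.append(("AGENT", agent(transcript)))
        user_message = user(transcript)  # simulator picks the next step of the plan
        if user_message == STOP_SIGNAL:
            transcript.append(("USER", STOP_SIGNAL))
            return transcript, "COMPLETED"
    return transcript, "BUDGET_EXHAUSTED"  # budget hit before the plan finished


def scripted(lines: List[str]) -> Speaker:
    """Hypothetical stand-in for an LLM: replays fixed lines, one per call."""
    remaining: Iterator[str] = iter(lines)
    return lambda transcript: next(remaining)


user = scripted([
    "Please roll a 20-sided die for me.",
    "Thanks. Can you check if 17 is a prime number?",
    STOP_SIGNAL,
])
agent = scripted([
    "I can roll dice and check if numbers are prime. How can I help?",
    "Of course. The result is 17.",
    "Yes, 17 is a prime number.",
])

transcript, status = simulate("What can you do for me?", user, agent,
                              max_allowed_invocations=20)
for role, text in transcript:
    print(f"[{role}]: {text}")
print("EVALUATION RESULT:", status)
```

With real models on both sides, nothing is replayed, so two runs of the same scenario can differ: this transcript rolls a 17, while the evaluation output later in this post rolls a 16 for the same plan.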

Notice how the conversation_plan defines a sequence of goals. It doesn’t specify the user’s exact prompts or the agent’s exact expected responses. It only cares about the outcome: getting a dice roll, then getting a prime number check on that result. This makes the test resilient to minor changes in your agent’s conversational style or internal logic.

2. Configuring the Simulation

You have direct control over the simulator’s behavior by providing an EvalConfig file. This allows you to fine-tune the simulation for your specific testing needs.

Here are the key parameters you can configure:

  • Model: Specify which model backs the user simulator (e.g., gemini-2.5-flash).
  • Model Configuration: Specify options for the model, such as thinking behavior.
  • Turn Budget: Set the maximum number of user-agent interactions (max_allowed_invocations) before the conversation is terminated, preventing infinite loops.

Custom Behavior: In addition to the above parameters, you can override the default system prompt to change the simulator’s persona. This allows you to test how your agent handles different types of users, such as a confused user or a more demanding one. We plan to add persona configuration support via the EvalConfig soon.
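To make the persona idea concrete, here is one way a simulator system prompt might combine a persona with the conversation_plan. The template wording, the `PERSONAS` table, and `build_system_prompt` are all hypothetical. ADK's actual default prompt and the mechanism for overriding it are not reproduced here.

```python
from string import Template

# Hypothetical prompt wording; ADK's real default system prompt is not shown here.
SIMULATOR_PROMPT = Template(
    "You are role-playing a user who is talking to an AI agent.\n"
    "Persona: $persona\n"
    "Goal: $plan\n"
    "Pursue the goal one step at a time and stay in character. "
    "When every step is done, reply with exactly </finished>."
)

# Hypothetical persona descriptions for the user types mentioned above.
PERSONAS = {
    "default": "a clear, cooperative user",
    "confused": "a user who misreads answers and sometimes repeats a question",
    "demanding": "an impatient user who pushes back on vague answers",
}


def build_system_prompt(persona: str, conversation_plan: str) -> str:
    """Fill the template; raises KeyError for an unknown persona name."""
    return SIMULATOR_PROMPT.substitute(persona=PERSONAS[persona], plan=conversation_plan)


print(build_system_prompt("confused", "Ask the agent to roll a d6."))
```

Keeping the persona separate from the plan means the same scenario can be replayed against several user types without rewriting its goal.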

Here is an example of a configuration file with an evaluation criterion and a configuration for the user simulator:

{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    }
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}

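If you want to sanity-check a config like this before a long run, the sketch below reads the `user_simulator_config` section and rejects a turn budget that could never let a conversation run. It relies only on the key names in the example above. `SimulatorSettings` and `read_simulator_settings` are hypothetical helpers and do not replicate ADK's own parsing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimulatorSettings:
    model: str
    max_allowed_invocations: int
    thinking_budget: Optional[int]  # None when no thinking_config is present


def read_simulator_settings(raw_json: str) -> SimulatorSettings:
    """Extract the user simulator section of a config shaped like the example."""
    section = json.loads(raw_json)["user_simulator_config"]
    budget = section["max_allowed_invocations"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(budget, int) or isinstance(budget, bool) or budget < 1:
        raise ValueError("max_allowed_invocations must be a positive integer")
    thinking = section.get("model_configuration", {}).get("thinking_config", {})
    return SimulatorSettings(
        model=section["model"],
        max_allowed_invocations=budget,
        thinking_budget=thinking.get("thinking_budget"),
    )


CONFIG = """
{
  "criteria": {"hallucinations_v1": {"threshold": 0.5}},
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {"include_thoughts": true, "thinking_budget": 10240}
    },
    "max_allowed_invocations": 20
  }
}
"""

print(read_simulator_settings(CONFIG))
```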

3. Running the Evaluation

With your scenarios and configuration in place, you can run the evaluation to get a detailed, per-invocation breakdown of each simulated conversation. Here is the output for the two scenarios above:

*********************************************************************
Eval Run Summary
eval_set_with_scenarios:
  Tests passed: 2
  Tests failed: 0
********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: 83903cba
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 1.0, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+----------------------+------------------------------------------+-------------------+
|    | prompt               | actual_response                          | hallucinations_v1 |
+====+======================+==========================================+===================+
|  0 | What can you do for  | I can roll dice of different sizes and   | Status: PASSED,   |
|    | me?                  | check if numbers are prime. For example, | Score: 1.0        |
|    |                      | I can roll an 8-sided die and tell you   |                   |
|    |                      | the result. Also, I can check if a       |                   |
|    |                      | number (or a list of numbers) is prime.  |                   |
+----+----------------------+------------------------------------------+-------------------+
|  1 | Please roll a        | I rolled a 20-sided die, and the result  | Status: PASSED,   |
|    | 20-sided die for me. | is 16.                                   | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+
|  2 | Can you check if 16  | No, 16 is not a prime number.            | Status: PASSED,   |
|    | is a prime number?   |                                          | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+

********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: b7944d3f
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 0.875, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+---------------------------+------------------------------------------+-------------------+
|    | prompt                    | actual_response                          | hallucinations_v1 |
+====+===========================+==========================================+===================+
|  0 | Hi, I'm running a         | Understood! I can roll dice and check if | Status: PASSED,   |
|    | tabletop RPG in which     | the results are prime numbers for you.   | Score: 1.0        |
|    | prime numbers are bad!    | Let me know when you need me to roll a   |                   |
|    |                           | die and check for primes.                |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  1 | I don't really care about | Alright, I can roll dice for you and     | Status: PASSED,   |
|    | the actual number; I just | tell you whether the result is a prime   | Score: 1.0        |
|    | need you to tell me if    | number (bad) or not (good). Just tell me |                   |
|    | the roll is good or bad.  | how many sides the die should have.      |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  2 | Great, please roll a d6.  | The result is 6, which is not a prime    | Status: PASSED,   |
|    |                           | number. That's a good roll!              | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+
|  3 | Okay, now please roll 2   | Okay, so 19 is prime (bad) and 6 is not  | Status: PASSED,   |
|    | d20.                      | prime (good). One good, one bad.         | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+

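Each metric line in the output compares a score against the configured threshold, and the run summary counts how many evals passed. The sketch below reproduces that bookkeeping for the two results above. It assumes a score equal to the threshold counts as a pass (the post does not say how ties are handled), and it does not attempt to reproduce how the overall hallucinations_v1 score is derived from per-invocation scores. `EvalResult` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalResult:
    eval_id: str
    metric: str
    score: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Assumption: a score at or above the threshold counts as a pass.
        return self.score >= self.threshold


def summarize(results: List[EvalResult]) -> str:
    """Count passes and failures in the style of the run summary."""
    passed = sum(r.passed for r in results)
    return f"Tests passed: {passed}\nTests failed: {len(results) - passed}"


results = [
    EvalResult("83903cba", "hallucinations_v1", 1.0, 0.5),
    EvalResult("b7944d3f", "hallucinations_v1", 0.875, 0.5),
]
for r in results:
    status = "PASSED" if r.passed else "FAILED"
    print(f"{r.eval_id}: {r.metric} {status} (score {r.score}, threshold {r.threshold})")
print(summarize(results))
```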

What This Means for Developers

This initial release of User Simulation is focused on solving the immediate toil of creating and maintaining multi-turn tests. It helps you:

  • Dramatically reduce test creation time: Stop writing complex, turn-by-turn scripts and instead define simple, high-level goals.
  • Build more resilient tests: By focusing on intent over a specific conversational path, your tests won’t break every time you refactor a prompt.
  • Create a reliable regression suite: Quickly generate a wide range of test cases to build a safety net that catches regressions before they reach production.

We believe that robust, goal-oriented simulation is a fundamental capability for building reliable and trustworthy AI agents. This feature is the foundational first step in our broader vision to deliver a comprehensive set of simulation capabilities for the entire agent lifecycle. On behalf of the core team who brought this feature to life — Ankur Sharma, Keyur Joshi, Pierre Thodoroff, Sebastian Caldas, and Xiaowei Li — we’re excited to see what you build and welcome your feedback as you start using this feature.

Ready to get started? Dive into the ADK documentation and Colab tutorial and start exploring the User Simulation feature today.


