Announcing User Simulation in ADK Evaluation

By Josh
November 7, 2025

Agents are inherently conversational. Users ask follow-up questions, refine earlier requests, and supply additional information as they go. However, manually scripting tests for such multi-turn conversations is brittle and time-consuming. You write dozens of user_input and expected_output pairs, only for them to break with the slightest change in your agent’s behavior, turning test maintenance into a frustrating chore.

Today, we’re excited to announce a new feature in the Agent Development Kit (ADK) that helps address this problem: User Simulation. This new feature allows you to move away from testing a rigid implementation path and instead evaluate your agent’s ability to actually achieve a user’s intent.

What is the User Simulator?

At its core, the User Simulator is an LLM-powered user prompt generator. You provide it with a high-level goal, and it dynamically generates the user side of a conversation to pursue that goal. This first release is not a separate service; it is integrated directly into the ADK evaluation framework and runs locally, allowing for a fast, iterative “inner loop” workflow.

How It Works

1. Defining a Conversation Scenario

Instead of a rigid turn-by-turn script, you provide a ConversationScenario. This is a simple JSON object with two key parts:

  1. starting_prompt: A fixed, initial prompt to begin the conversation.
  2. conversation_plan: A natural language guideline that tells the simulator its objective.

Here’s an example evaluation set for an agent with tools to roll dice and check for prime numbers:

{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a d6. Finally, ask the agent to do the same with 2 d20."
    }
  ]
}
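
Because a scenario file is plain JSON, it can be sanity-checked before an evaluation run. The following is a minimal sketch, not part of the ADK: the helper name `validate_scenarios` and the rule that both fields must be non-empty strings are assumptions made for illustration, based only on the two fields described above.

```python
import json

# The two fields each scenario carries, per the description above.
REQUIRED_KEYS = ("starting_prompt", "conversation_plan")


def validate_scenarios(raw: str) -> list[dict]:
    """Parse a scenarios document and check every entry (hypothetical helper)."""
    data = json.loads(raw)
    scenarios = data.get("scenarios")
    if not isinstance(scenarios, list) or not scenarios:
        raise ValueError("expected a non-empty 'scenarios' list")
    for index, scenario in enumerate(scenarios):
        if not isinstance(scenario, dict):
            raise ValueError(f"scenario {index}: expected a JSON object")
        for key in REQUIRED_KEYS:
            value = scenario.get(key)
            # Assumption: both fields must be non-empty strings.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"scenario {index}: missing or empty '{key}'")
    return scenarios


EXAMPLE = """
{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    }
  ]
}
"""

if __name__ == "__main__":
    for scenario in validate_scenarios(EXAMPLE):
        print(scenario["starting_prompt"])
```

A check like this catches a misspelled key or an empty plan early, before any model calls are made.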

When you run the evaluation, the simulator handles the back-and-forth conversation dynamically until the conversation_plan is fulfilled. Here is what the generated conversation for the first scenario might look like (reformatted for clarity):

[USER]: What can you do for me?
[AGENT]: I can roll dice and check if numbers are prime. How can I help?
[USER]: Please roll a 20-sided die for me.
[AGENT]: Of course. The result is 17.
[USER]: Thanks. Can you check if 17 is a prime number?
[AGENT]: Yes, 17 is a prime number.
[USER]: </finished>
--------------------
EVALUATION RESULT: COMPLETED

Notice how the conversation_plan defines a sequence of goals. It doesn’t specify the user’s exact prompts or the agent’s exact expected responses. It only cares about the outcome: getting a dice roll, then getting a prime number check on that result. This makes the test resilient to minor changes in your agent’s conversational style or internal logic.
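
To make the control flow concrete, here is a toy sketch of a goal-driven conversation loop. It is not the ADK's implementation: `run_simulation`, `scripted_user`, and `toy_agent` are hypothetical names, the real simulator uses an LLM rather than a scripted queue, and the only details taken from this article are the `</finished>` marker in the sample transcript and a cap on the number of turns (named `max_allowed_invocations` in the configuration section).

```python
from typing import Callable

FINISHED = "</finished>"  # end-of-conversation marker seen in the sample transcript

Transcript = list[tuple[str, str]]


def run_simulation(
    starting_prompt: str,
    next_user_turn: Callable[[Transcript], str],
    agent: Callable[[str], str],
    max_allowed_invocations: int = 20,
) -> tuple[Transcript, bool]:
    """Alternate user and agent turns; return (transcript, completed)."""
    transcript: Transcript = []
    user_message = starting_prompt
    for _ in range(max_allowed_invocations):
        transcript.append(("USER", user_message))
        transcript.append(("AGENT", agent(user_message)))
        # The simulated user decides what to say next from the history so far.
        user_message = next_user_turn(transcript)
        if user_message == FINISHED:
            return transcript, True
    # Turn budget exhausted before the user signalled completion.
    return transcript, False


def scripted_user(prompts: list[str]) -> Callable[[Transcript], str]:
    """Stand-in for the LLM-backed simulator: replays prompts, then finishes."""
    queue = list(prompts)

    def next_turn(transcript: Transcript) -> str:
        return queue.pop(0) if queue else FINISHED

    return next_turn


def toy_agent(message: str) -> str:
    """Stand-in agent with a fixed roll so the example is deterministic."""
    text = message.lower()
    if "roll" in text:
        return "The result is 17."
    if "prime" in text:
        return "Yes, 17 is a prime number."
    return "I can roll dice and check if numbers are prime."


if __name__ == "__main__":
    user = scripted_user([
        "Please roll a 20-sided die for me.",
        "Can you check if 17 is a prime number?",
    ])
    turns, completed = run_simulation("What can you do for me?", user, toy_agent)
    for speaker, text in turns:
        print(f"[{speaker}]: {text}")
    print("COMPLETED" if completed else "TURN BUDGET EXHAUSTED")
```

The loop never inspects the exact wording of either side; it only tracks whether the user side declared the goal reached within the budget, which is the property that keeps such tests stable across prompt changes.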

2. Configuring the Simulation

You have direct control over the simulator’s behavior by providing an EvalConfig file. This allows you to fine-tune the simulation for your specific testing needs.

Here are the key parameters you can configure:

  • Model: Specify which model backs the user simulator (e.g., gemini-2.5-flash).
  • Model Configuration: Specify options for the model, such as thinking behavior.
  • Turn Budget: Set the maximum number of user-agent interactions (max_allowed_invocations) before the conversation is terminated, preventing infinite loops.

Custom Behavior: In addition to the above parameters, you can override the default system prompt to change the simulator’s persona. This allows you to test how your agent handles different types of users, such as a confused user or a more demanding one. We plan to add persona configuration support via the EvalConfig soon.

Here is an example of a configuration file with an evaluation criterion and a configuration for the user simulator:

{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    }
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}
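
A configuration file like this can be checked with a few lines of Python before it is handed to the evaluation. This is a sketch under stated assumptions rather than ADK behavior: `check_eval_config` is a hypothetical helper, and the rules it enforces (thresholds between 0 and 1, a non-empty model name, a positive turn budget) are guesses inferred from the example values, not documented constraints.

```python
import json


def check_eval_config(config: dict) -> list[str]:
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    for name, criterion in config.get("criteria", {}).items():
        threshold = criterion.get("threshold")
        # Assumption: thresholds are numbers in the range [0, 1].
        if (
            not isinstance(threshold, (int, float))
            or isinstance(threshold, bool)
            or not 0.0 <= threshold <= 1.0
        ):
            problems.append(f"criteria.{name}.threshold should be between 0 and 1")
    simulator = config.get("user_simulator_config", {})
    model = simulator.get("model")
    # Assumption: a model name is expected to be a non-empty string.
    if not isinstance(model, str) or not model:
        problems.append("user_simulator_config.model should be a non-empty string")
    budget = simulator.get("max_allowed_invocations")
    # Assumption: the turn budget must be a positive integer.
    if not isinstance(budget, int) or isinstance(budget, bool) or budget < 1:
        problems.append("user_simulator_config.max_allowed_invocations should be a positive integer")
    return problems


EXAMPLE_CONFIG = json.loads("""
{
  "criteria": {
    "hallucinations_v1": {"threshold": 0.5, "evaluate_intermediate_nl_responses": true}
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "max_allowed_invocations": 20
  }
}
""")

if __name__ == "__main__":
    issues = check_eval_config(EXAMPLE_CONFIG)
    print("config looks fine" if not issues else "\n".join(issues))
```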

3. Running the Evaluation

With your scenarios and configuration in place, you can run the evaluation to get a detailed breakdown of the interaction.

*********************************************************************
Eval Run Summary
eval_set_with_scenarios:
  Tests passed: 2
  Tests failed: 0
********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: 83903cba
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 1.0, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+----------------------+------------------------------------------+-------------------+
|    | prompt               | actual_response                          | hallucinations_v1 |
+====+======================+==========================================+===================+
|  0 | What can you do for  | I can roll dice of different sizes and   | Status: PASSED,   |
|    | me?                  | check if numbers are prime. For example, | Score: 1.0        |
|    |                      | I can roll an 8-sided die and tell you   |                   |
|    |                      | the result. Also, I can check if a       |                   |
|    |                      | number (or a list of numbers) is prime.  |                   |
+----+----------------------+------------------------------------------+-------------------+
|  1 | Please roll a        | I rolled a 20-sided die, and the result  | Status: PASSED,   |
|    | 20-sided die for me. | is 16.                                   | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+
|  2 | Can you check if 16  | No, 16 is not a prime number.            | Status: PASSED,   |
|    | is a prime number?   |                                          | Score: 1.0        |
+----+----------------------+------------------------------------------+-------------------+

********************************************************************
Eval Set Id: eval_set_with_scenarios
Eval Id: b7944d3f
Overall Eval Status: PASSED
---------------------------------------------------------------------
Metric: hallucinations_v1, Status: PASSED, Score: 0.875, Threshold: 0.5
---------------------------------------------------------------------
Invocation Details:
+----+---------------------------+------------------------------------------+-------------------+
|    | prompt                    | actual_response                          | hallucinations_v1 |
+====+===========================+==========================================+===================+
|  0 | Hi, I'm running a         | Understood! I can roll dice and check if | Status: PASSED,   |
|    | tabletop RPG in which     | the results are prime numbers for you.   | Score: 1.0        |
|    | prime numbers are bad!    | Let me know when you need me to roll a   |                   |
|    |                           | die and check for primes.                |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  1 | I don't really care about | Alright, I can roll dice for you and     | Status: PASSED,   |
|    | the actual number; I just | tell you whether the result is a prime   | Score: 1.0        |
|    | need you to tell me if    | number (bad) or not (good). Just tell me |                   |
|    | the roll is good or bad.  | how many sides the die should have.      |                   |
+----+---------------------------+------------------------------------------+-------------------+
|  2 | Great, please roll a d6.  | The result is 6, which is not a prime    | Status: PASSED,   |
|    |                           | number. That's a good roll!              | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+
|  3 | Okay, now please roll 2   | Okay, so 19 is prime (bad) and 6 is not  | Status: PASSED,   |
|    | d20.                      | prime (good). One good, one bad.         | Score: 1.0        |
+----+---------------------------+------------------------------------------+-------------------+
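
If you want to gate a CI job on these results, the per-eval metric lines can be pulled out of the report text. The sketch below assumes the line format shown in the sample output above (`Metric: ..., Status: ..., Score: ..., Threshold: ...`); that format is not a documented contract and may change, and `parse_metrics` is a hypothetical helper, not an ADK function.

```python
import re

# Matches the per-eval summary lines from the sample report above.
METRIC_LINE = re.compile(
    r"Metric: (?P<name>[^,]+), Status: (?P<status>\w+), "
    r"Score: (?P<score>[\d.]+), Threshold: (?P<threshold>[\d.]+)"
)


def parse_metrics(report: str) -> list[dict]:
    """Extract one record per 'Metric:' line (hypothetical helper)."""
    return [
        {
            "name": match["name"],
            "status": match["status"],
            "score": float(match["score"]),
            "threshold": float(match["threshold"]),
        }
        for match in METRIC_LINE.finditer(report)
    ]


SAMPLE_REPORT = """
Eval Id: 83903cba
Metric: hallucinations_v1, Status: PASSED, Score: 1.0, Threshold: 0.5
Eval Id: b7944d3f
Metric: hallucinations_v1, Status: PASSED, Score: 0.875, Threshold: 0.5
"""

if __name__ == "__main__":
    for record in parse_metrics(SAMPLE_REPORT):
        margin = record["score"] - record["threshold"]
        print(f'{record["name"]}: {record["status"]} (margin {margin:.3f})')
```

Tracking the margin between score and threshold, rather than only the pass/fail status, makes it easier to notice a metric drifting toward failure across runs.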

What This Means for Developers

This initial release of User Simulation is focused on solving the immediate toil of creating and maintaining multi-turn tests. It helps you:

  • Dramatically reduce test creation time: Stop writing complex, turn-by-turn scripts and instead define simple, high-level goals.
  • Build more resilient tests: By focusing on intent over a specific conversational path, your tests won’t break every time you refactor a prompt.
  • Create a reliable regression suite: Quickly generate a wide range of test cases to build a safety net that catches regressions before they reach production.

We believe that robust, goal-oriented simulation is a fundamental capability for building reliable and trustworthy AI agents. This feature is the foundational first step in our broader vision to deliver a comprehensive set of simulation capabilities for the entire agent lifecycle. On behalf of the core team who brought this feature to life — Ankur Sharma, Keyur Joshi, Pierre Thodoroff, Sebastian Caldas, and Xiaowei Li — we’re excited to see what you build and welcome your feedback as you start using this feature.

Ready to get started? Dive into the ADK documentation and Colab tutorial and start exploring the User Simulation feature today.


