mGrowTech

Google DeepMind Researchers Apply Semantic Evolution to Create Non-Intuitive VAD-CFR and SHOR-PSRO Variants for Superior Algorithmic Convergence

by Josh
February 24, 2026
in AI, Analytics and Automation


In the competitive arena of Multi-Agent Reinforcement Learning (MARL), progress has long been bottlenecked by human intuition. For years, researchers have manually refined algorithms like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO), navigating a vast combinatorial space of update rules via trial-and-error.

The Google DeepMind research team has now shifted this paradigm with AlphaEvolve, an evolutionary coding agent powered by Large Language Models (LLMs) that automatically discovers new multi-agent learning algorithms. By treating source code as a genome, AlphaEvolve does more than tune parameters: it invents entirely new symbolic logic.

Semantic Evolution: Beyond Hyperparameter Tuning

Unlike traditional AutoML, which typically optimizes numeric constants, AlphaEvolve performs semantic evolution. It uses Gemini 2.5 Pro as an intelligent genetic operator that rewrites logic, introduces novel control flow, and injects symbolic operations into the algorithm’s source code.

The framework follows a rigorous evolutionary loop:

  • Initialization: The population begins with baseline implementations, such as standard CFR.
  • LLM-Driven Mutation: A parent algorithm is selected based on fitness, and the LLM is prompted to modify the code to reduce exploitability.
  • Automated Evaluation: Candidates are executed on proxy games (e.g., Kuhn Poker) and scored by negative exploitability, so less exploitable algorithms receive higher fitness.
  • Selection: Valid, high-performing candidates are added back to the population, allowing the search to discover non-intuitive optimizations.
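The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not AlphaEvolve's implementation: the LLM rewrite is replaced by a hypothetical `stub_mutate` that swaps in one of a few hard-coded function bodies, and the proxy game is replaced by a hypothetical `toy_evaluate` whose score merely stands in for negative exploitability. What it does show is the shape of the search: programs (source strings) are the genome, parents are picked by fitness, invalid children are discarded, and the best candidates survive.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str     # the program text is the "genome"
    fitness: float  # stands in for negative exploitability: higher is better


def evolve(seed_source, mutate, evaluate, generations=50, pop_cap=8, seed=0):
    """Toy evolutionary loop: select a parent, rewrite it, evaluate, keep the best."""
    rng = random.Random(seed)
    # Initialization: the population starts from a single baseline program.
    population = [Candidate(seed_source, evaluate(seed_source))]
    for _ in range(generations):
        # Fitness-based parent selection (binary tournament).
        a, b = rng.choice(population), rng.choice(population)
        parent = a if a.fitness >= b.fitness else b
        # Mutation: AlphaEvolve asks an LLM to rewrite the code; a stub does it here.
        child_source = mutate(parent.source, rng)
        # Automated evaluation: programs that crash are discarded as invalid.
        try:
            child_fitness = evaluate(child_source)
        except Exception:
            continue
        population.append(Candidate(child_source, child_fitness))
        # Selection: keep only the top `pop_cap` candidates.
        population.sort(key=lambda c: c.fitness, reverse=True)
        del population[pop_cap:]
    return population[0]


# --- Hypothetical stand-ins, not AlphaEvolve's prompts or games ---
REWRITES = ["x + 1", "x * 2", "x * 3 + 1", "x ** 2 + 1", "undefined_name"]


def stub_mutate(source, rng):
    # Pretend "LLM": replace the function body with a randomly chosen expression.
    return f"def f(x):\n    return {rng.choice(REWRITES)}\n"


def toy_evaluate(source):
    namespace = {}
    exec(source, namespace)                # load the candidate program
    return -abs(namespace["f"](3) - 10)    # toy proxy score, 0 is best


best = evolve("def f(x):\n    return x\n", stub_mutate, toy_evaluate)
print(best.fitness)
print(best.source)
```

Replacing `stub_mutate` with a call to a code-writing model and `toy_evaluate` with an exploitability computation on a real game would give the structure described above; the tournament selection and fixed population cap are arbitrary choices made here for brevity.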

VAD-CFR: Mastering Game Volatility

The first major discovery is Volatility-Adaptive Discounted (VAD-) CFR. In imperfect-information Extensive-Form Games (EFGs), CFR minimizes counterfactual regret at every information set over repeated iterations. While traditional variants use static discounting, VAD-CFR introduces three mechanisms that often elude human designers:

  1. Volatility-Adaptive Discounting: Using an Exponentially Weighted Moving Average (EWMA) of the instantaneous regret magnitude, the algorithm tracks the “shake” (volatility) of the learning process. When volatility is high, it increases discounting to forget unstable history faster; when volatility drops, it retains more history for fine-tuning.
  2. Asymmetric Instantaneous Boosting: VAD-CFR boosts positive instantaneous regrets by a factor of 1.1. This allows the agent to immediately exploit beneficial deviations without the lag associated with standard accumulation.
  3. Hard Warm-Start & Regret-Magnitude Weighting: The algorithm enforces a ‘hard warm-start,’ postponing policy averaging until iteration 500. Interestingly, the LLM generated this threshold without knowing the 1000-iteration evaluation horizon. Once accumulation begins, policies are weighted by the magnitude of instantaneous regret to filter out noise.
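The three mechanisms can be illustrated at a single decision point, where a CFR update reduces to regret matching. The sketch below is hypothetical: only the boost factor of 1.1 and the warm-start threshold of 500 come from the description above; the EWMA coefficient, the mapping from volatility to a discount factor (`min_discount`, `max_discount`), and the class name `VADRegretLearner` are assumptions chosen for illustration, not the formulas from the paper.

```python
import numpy as np


class VADRegretLearner:
    """Hypothetical sketch of VAD-CFR-style updates at ONE decision point."""

    def __init__(self, n_actions, ewma_beta=0.9, boost=1.1, warm_start=500,
                 min_discount=0.5, max_discount=0.99):
        self.cum_regret = np.zeros(n_actions)
        self.avg_num = np.zeros(n_actions)  # weighted sum of past policies
        self.avg_den = 0.0                  # sum of the weights
        self.volatility = 0.0               # EWMA of instantaneous regret magnitude
        self.beta = ewma_beta               # assumed EWMA coefficient
        self.boost = boost                  # 1.1, per the article
        self.warm_start = warm_start        # 500, per the article
        self.min_discount = min_discount    # assumed discount range
        self.max_discount = max_discount
        self.t = 0

    def policy(self):
        # Regret matching on the positive part of the cumulative regret.
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        if total > 0:
            return pos / total
        return np.full(len(pos), 1.0 / len(pos))

    def update(self, action_values):
        """action_values: (counterfactual) value of each action this iteration."""
        self.t += 1
        pi = self.policy()
        inst = action_values - pi @ action_values  # instantaneous regret
        magnitude = np.abs(inst).sum()

        # 1. Volatility-adaptive discounting: EWMA of |regret|; higher
        #    volatility maps to a smaller discount (assumed mapping).
        self.volatility = self.beta * self.volatility + (1 - self.beta) * magnitude
        squash = self.volatility / (1.0 + self.volatility)  # in [0, 1)
        discount = self.max_discount - (self.max_discount - self.min_discount) * squash

        # 2. Asymmetric boosting: only positive instantaneous regrets are scaled.
        boosted = np.where(inst > 0, self.boost * inst, inst)
        self.cum_regret = discount * self.cum_regret + boosted

        # 3. Hard warm-start, then regret-magnitude weighting of the average policy.
        if self.t >= self.warm_start:
            self.avg_num += magnitude * pi
            self.avg_den += magnitude

    def average_policy(self):
        if self.avg_den == 0:
            return None  # still inside the warm-start window
        return self.avg_num / self.avg_den


# Usage: rock-paper-scissors against an opponent that always plays rock.
VS_ROCK = np.array([0.0, 1.0, -1.0])  # values of (rock, paper, scissors)
learner = VADRegretLearner(n_actions=3)
for _ in range(499):
    learner.update(VS_ROCK)
print(learner.average_policy())        # None: warm-start not over yet
for _ in range(501):
    learner.update(VS_ROCK)
print(learner.average_policy())
```

Against this fixed opponent the learner settles on paper, and because of the warm-start nothing enters the average policy until iteration 500.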

In empirical tests, VAD-CFR matched or surpassed state-of-the-art performance in 10 out of 11 games, including Leduc Poker and Liar’s Dice, with 4-player Kuhn Poker being the only exception.

SHOR-PSRO: The Hybrid Meta-Solver

The second breakthrough is Smoothed Hybrid Optimistic Regret (SHOR-) PSRO. PSRO operates on a higher-level abstraction called the Meta-Game, in which a population of policies is iteratively expanded with new best responses. SHOR-PSRO evolves the Meta-Strategy Solver (MSS), the component that decides how the population’s policies are mixed into the opponent distribution that each new best response is trained against.
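To make the Meta-Game concrete, here is a minimal PSRO loop on rock-paper-scissors, written as a sketch under simplifying assumptions: policies are pure actions of a small matrix game, the oracle is an exact best response via `argmax` rather than a trained RL policy, and the Meta-Strategy Solver is plain time-averaged regret matching, not SHOR. The `solver` argument is the slot that an evolved MSS such as SHOR-PSRO's would occupy; the function names are illustrative.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (symmetric, zero-sum).
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def regret_matching_solver(meta_payoffs, iters=2000):
    """Meta-Strategy Solver: time-averaged regret matching in self-play."""
    n = meta_payoffs.shape[0]
    cum_regret = np.zeros(n)
    avg = np.zeros(n)
    for _ in range(iters):
        pos = np.maximum(cum_regret, 0.0)
        sigma = pos / pos.sum() if pos.sum() > 0 else np.full(n, 1.0 / n)
        values = meta_payoffs @ sigma          # each member's value vs. sigma
        cum_regret += values - sigma @ values
        avg += sigma
    return avg / iters


def psro(payoffs, solver, max_iters=10):
    population = [0]                           # start from a single policy (rock)
    for _ in range(max_iters):
        # Meta-Game: payoffs restricted to the current population.
        meta = payoffs[np.ix_(population, population)]
        sigma = solver(meta)                   # meta-strategy over the population
        # Oracle: best response to the opponent mixture over the full game.
        opponent_mix = np.zeros(payoffs.shape[0])
        opponent_mix[population] = sigma
        best_response = int(np.argmax(payoffs @ opponent_mix))
        if best_response in population:
            break                              # no new policy to add
        population.append(best_response)
    return population, sigma


population, sigma = psro(RPS, regret_matching_solver)
print(population, np.round(sigma, 3))
```

Starting from rock alone, the loop adds paper, then scissors, and stops once the best response to the meta-strategy is already in the population.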

The core of SHOR-PSRO is a Hybrid Blending Mechanism that constructs a meta-strategy σ by linearly blending two distinct components:

$$\sigma_{\text{hybrid}} = (1 - \lambda)\,\sigma_{\text{ORM}} + \lambda\,\sigma_{\text{Softmax}}$$

  • $\sigma_{\text{ORM}}$: Provides the stability of Optimistic Regret Matching.
  • $\sigma_{\text{Softmax}}$: A Boltzmann distribution over pure strategies that aggressively biases the solver toward high-reward modes.
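The blend itself is a one-liner once the two components exist. In this sketch, the ORM term is computed with predictive (optimistic) regret matching, which plays the normalized positive part of cumulative regret plus the most recent instantaneous regret, and the softmax term is a Boltzmann distribution over each pure strategy's current value. The temperature, the input numbers, and the function names are assumptions for illustration; the article does not give SHOR-PSRO's exact formulas.

```python
import numpy as np


def softmax(values, temperature):
    """Boltzmann distribution over pure strategies (max subtracted for stability)."""
    z = (values - values.max()) / temperature
    e = np.exp(z)
    return e / e.sum()


def optimistic_rm(cum_regret, last_regret):
    """Predictive regret matching: normalize [R + r_last]^+ (uniform if all zero)."""
    pos = np.maximum(cum_regret + last_regret, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full(len(pos), 1.0 / len(pos))


def hybrid_meta_strategy(cum_regret, last_regret, values, lam, temperature=0.1):
    """sigma_hybrid = (1 - lam) * sigma_ORM + lam * sigma_Softmax."""
    sigma_orm = optimistic_rm(cum_regret, last_regret)
    sigma_soft = softmax(values, temperature)
    return (1.0 - lam) * sigma_orm + lam * sigma_soft


# Usage with made-up numbers for a population of three policies.
cum_regret = np.array([2.0, 1.0, -1.0])
last_regret = np.array([0.5, -0.5, 0.0])
values = np.array([0.2, 0.9, 0.1])  # e.g. payoff of each policy vs. the current mix
sigma = hybrid_meta_strategy(cum_regret, last_regret, values, lam=0.3)
print(np.round(sigma, 3))
```

With λ = 0.3 the softmax term shifts probability toward the second policy, which has the highest value, even though optimistic regret matching alone would give it only 1/6.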

SHOR-PSRO employs a dynamic Annealing Schedule: the blending factor λ anneals from 0.3 to 0.05, gradually shifting the focus from greedy exploration to robust equilibrium finding. The search also discovered a Training vs. Evaluation Asymmetry: the training solver uses the annealing schedule for stability, while the evaluation solver uses a fixed, low blending factor (λ = 0.01) to produce more reactive exploitability estimates.
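A minimal sketch of the two schedules follows. The article gives only the endpoints 0.3 and 0.05, so the linear decay shape and the `total_iters` parameter are assumptions; the evaluation-time solver is pinned at λ = 0.01 as described.

```python
def training_lambda(iteration, total_iters, lam_start=0.3, lam_end=0.05):
    """Anneal the blending factor from lam_start to lam_end (linear shape assumed)."""
    if total_iters <= 1:
        return lam_end
    frac = min(max(iteration / (total_iters - 1), 0.0), 1.0)
    return lam_start + frac * (lam_end - lam_start)


EVAL_LAMBDA = 0.01  # fixed, low blending factor for the evaluation-time solver


def solver_lambda(iteration, total_iters, evaluating):
    """Training vs. evaluation asymmetry: annealed for training, fixed for evaluation."""
    if evaluating:
        return EVAL_LAMBDA
    return training_lambda(iteration, total_iters)


for it in (0, 25, 49):
    print(it, round(solver_lambda(it, 50, evaluating=False), 3),
          solver_lambda(it, 50, evaluating=True))
```

An exponential or step schedule would fit the same description equally well; the linear form is simply the most direct way to connect the two stated endpoints.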

Key Takeaways

  • AlphaEvolve Framework: DeepMind researchers introduced AlphaEvolve, an evolutionary system that uses Large Language Models (LLMs) to perform ‘semantic evolution’ by treating an algorithm’s source code as its genome. This allows the system to discover entirely new symbolic logic and control flows rather than just tuning hyperparameters.
  • Discovery of VAD-CFR: The system evolved a new regret minimization algorithm called Volatility-Adaptive Discounted (VAD-) CFR. It outperforms state-of-the-art baselines like Discounted Predictive CFR+ by using non-intuitive mechanisms to manage regret accumulation and policy derivation.
  • VAD-CFR’s Adaptive Mechanisms: VAD-CFR uses a volatility-sensitive discounting schedule that tracks learning instability via an Exponentially Weighted Moving Average (EWMA). It also features an ‘Asymmetric Instantaneous Boosting’ factor of 1.1 for positive regrets and a hard warm-start that delays policy averaging until iteration 500 to filter out early-stage noise.
  • Discovery of SHOR-PSRO: For population-based training, AlphaEvolve discovered Smoothed Hybrid Optimistic Regret (SHOR-) PSRO. This variant utilizes a hybrid meta-solver that blends Optimistic Regret Matching with a smoothed, temperature-controlled distribution over best pure strategies to improve convergence speed and stability.
  • Dynamic Annealing and Asymmetry: SHOR-PSRO automates the transition from exploration to exploitation by annealing its blending factor and diversity bonuses during training. The search also discovered a performance-boosting asymmetry where the training-time solver uses time-averaging for stability while the evaluation-time solver uses a reactive last-iterate strategy.
