• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Saturday, March 14, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Microsoft Releases Agent Lightning: A New AI Framework that Enables Reinforcement Learning (RL)-based Training of LLMs for Any AI Agent

Josh by Josh
October 30, 2025
in Al, Analytics and Automation
0


How do you convert real agent traces into reinforcement learning RL transitions to improve policy LLMs without changing your existing agent stack? Microsoft AI team releases Agent Lightning to help optimize multi-agent systems. Agent Lightning is a open-sourced framework that makes reinforcement learning work for any AI agent without rewrites. It separates training from execution, defines a unified trace format, and introduces LightningRL, a hierarchical method that converts complex agent runs into transitions that standard single turn RL trainers can optimize.

What Agent Lightning does?

The framework models an agent as a decision process. It formalizes the agent as a partially observable Markov decision process where the observation is the current input to the policy LLM, the action is the model call, and the reward can be terminal or intermediate. From each run it extracts only the calls made by the policy model, along with inputs, outputs, and rewards. This trims away other framework noise and yields clean transitions for training.

LightningRL performs credit assignment across multi step episodes, then optimizes the policy with a single turn RL objective. The research team describes compatibility with single turn RL methods. In practice, teams often use trainers that implement PPO or GRPO, such as VeRL, which fits this interface.

https://arxiv.org/pdf/2508.03680v1

System architecture

Agent Lightning uses Training Agent Disaggregation. A Lightning Server runs training and serving, and exposes an OpenAI like API for the updated model. A Lightning Client runs the agent runtime where it already lives, captures traces of prompts, tool calls, and rewards, and streams them back to the server. This keeps tools, browsers, shells, and other dependencies close to production while the GPU training stays in the server tier.

https://arxiv.org/pdf/2508.03680v1

The runtime supports two tracing paths. A default path uses OpenTelemetry spans, so you can pipe agent telemetry through standard collectors. There is also a lightweight embedded tracer for teams that do not want to deploy OpenTelemetry. Both paths end up in the same store for training.

https://arxiv.org/pdf/2508.03680v1

Unified data interface

Agent Lightning records each model call and each tool call as a span with inputs, outputs, and metadata. The algorithm layer adapts spans into ordered triplets of prompt, response, and reward. This selective extraction lets you optimize one agent in a multi agent workflow, or multiple agents at once, without touching orchestration code. The same traces can also drive automatic prompt optimization or supervised finetuning.

https://arxiv.org/pdf/2508.03680v1

Experiments and datasets

The research team reports three tasks. For text to SQL, the team uses the Spider benchmark. Spider contains more than 10,000 questions across 200 databases that span 138 domains. The policy model is Llama 3.2 3B Instruct. The implementation uses LangChain with a writer agent, a rewriter agent, and a checker. The writer and the rewriter are optimized, and the checker is left fixed. Rewards improve steadily during training and at test time.

https://arxiv.org/pdf/2508.03680v1

For retrieval augmented generation, the setup uses the MuSiQue benchmark and a Wikipedia scale index with about 21 million documents. The retriever uses BGE embeddings with cosine similarity. The agent is built with the OpenAI Agents SDK. The reward is a weighted sum of a format score and an F1 correctness score. Reward curves show stable gains during training and evaluation with the same base model.

https://arxiv.org/pdf/2508.03680v1

For math question answering with tool use, the agent is implemented with AutoGen and calls a calculator tool. The dataset is Calc X. The base model again is Llama 3.2 3B Instruct. Training improves the ability to invoke tools correctly and integrate results into final answers.

https://arxiv.org/pdf/2508.03680v1

Key Takeaways

  1. Agent Lightning uses Training Agent Disaggregation and a unified trace interface, so existing agents in LangChain, OpenAI Agents SDK, AutoGen, or CrewAI connect with near zero code change.
  2. LightningRL converts trajectories to transitions. It applies credit assignment to multi step runs, then optimizes the policy with single turn RL methods such as PPO or GRPO in standard trainers.
  3. Automatic Intermediate Rewarding, AIR, supplies dense feedback. AIR turns system signals such as tool return status into intermediate rewards to reduce sparse reward issues in long workflows.
  4. The research evaluates text to SQL on Spider, RAG on MuSiQue with a Wikipedia scale index using BGE embeddings and cosine similarity, and math tool use on Calc X, all with Llama 3.2 3B Instruct as the base model.
  5. The runtime records traces through OpenTelemetry, streams them to the training server, and exposes an OpenAI compatible endpoint for updated models, enabling scalable rollouts without moving tools.

Agent Lightning is a practical bridge between agent execution and reinforcement learning, not another framework rewrite. It formalizes agent runs as an Markov Decision Process (MDP), introduces LightningRL for credit assignment, and extracts transitions that slot into single turn RL trainers. The Training Agent Disaggregation design separates a client that runs the agent from a server that trains and serves an OpenAI compatible endpoint, so teams keep existing stacks. Automatic Intermediate Rewarding converts runtime signals into dense feedback, reducing sparse rewards in long workflows. Overall, Agent Lightning is a clean, minimal-integration path to make agents learn from their own traces.


Check out the Paper and Repo. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

🙌 Follow MARKTECHPOST: Add us as a preferred source on Google.



Source_link

READ ALSO

Tremble Chatbot App Access, Costs, and Feature Insights

Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries

Related Posts

Tremble Chatbot App Access, Costs, and Feature Insights
Al, Analytics and Automation

Tremble Chatbot App Access, Costs, and Feature Insights

March 14, 2026
Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries
Al, Analytics and Automation

Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries

March 14, 2026
How Joseph Paradiso’s sensing innovations bridge the arts, medicine, and ecology | MIT News
Al, Analytics and Automation

How Joseph Paradiso’s sensing innovations bridge the arts, medicine, and ecology | MIT News

March 13, 2026
Al, Analytics and Automation

Model Context Protocol (MCP) vs. AI Agent Skills: A Deep Dive into Structured Tools and Behavioral Guidance for LLMs

March 13, 2026
Top LiDAR Annotation Companies for AI & 3D Point Cloud Data
Al, Analytics and Automation

Top LiDAR Annotation Companies for AI & 3D Point Cloud Data

March 13, 2026
Can AI help predict which heart-failure patients will worsen within a year? | MIT News
Al, Analytics and Automation

Can AI help predict which heart-failure patients will worsen within a year? | MIT News

March 13, 2026
Next Post
Smarter DevOps with Kite: AI Meets Kubernetes

Smarter DevOps with Kite: AI Meets Kubernetes

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

Build a Clean Beauty Brand with Proven Transparency

Build a Clean Beauty Brand with Proven Transparency

December 19, 2025
Probabilistic Attribution Explained: What It Is, How It Works, and Why It’s Still Useful August 2025 (Updated)

Probabilistic Attribution Explained: What It Is, How It Works, and Why It’s Still Useful August 2025 (Updated)

August 8, 2025
What to post on LinkedIn: 30 ideas plus examples

What to post on LinkedIn: 30 ideas plus examples

July 30, 2025
The Dun Dun Diner and ‘Showgirl’ Shenanigans

The Dun Dun Diner and ‘Showgirl’ Shenanigans

October 7, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Gemini’s task automation is here and it’s wild
  • Y Combinator-backed Random Labs launches Slate V1, claiming the first 'swarm-native' coding agent
  • Silverpush Releases Trend Intelligence Platform for Contextual Advertising
  • What Is Content Decay? (And How to Fix It Before It Tanks Your Traffic)
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions