• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Friday, June 12, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

This AI Paper from Stanford and Harvard Explains Why Most ‘Agentic AI’ Systems Feel Impressive in Demos and then Completely Fall Apart in Real Use

Josh by Josh
December 26, 2025
in Al, Analytics and Automation
0
This AI Paper from Stanford and Harvard Explains Why Most ‘Agentic AI’ Systems Feel Impressive in Demos and then Completely Fall Apart in Real Use


Agentic AI systems sit on top of large language models and connect to tools, memory, and external environments. They already support scientific discovery, software development, and clinical research, yet they still struggle with unreliable tool use, weak long horizon planning, and poor generalization. The latest research paper ‘Adaptation of Agentic AI‘ from Stanford, Harvard, UC Berkeley, Caltech proposes a unified view of how these systems should adapt and maps existing methods into a compact, mathematically defined framework.

How this research paper models an agentic AI system?

The research survey models an agentic AI system as a foundation model agent along with 3 key components. A planning module decomposes goals into sequences of actions, using static procedures such as Chain-of-Thought and Tree-of-Thought, or dynamic procedures such as ReAct and Reflexion that react to feedback. A tool use module connects the agent to web search engines, APIs, code execution environments, Model Context Protocols, and browser automation. A memory module stores short term context and long term knowledge, accessed through retrieval augmented generation. Adaptation changes prompts or parameters for these components using supervised fine tuning, preference based methods such as Direct Preference Optimization, reinforcement learning methods such as Proximal Policy Optimization and Group Relative Policy Optimization, and parameter efficient techniques such as low rank adaptation.

READ ALSO

MIT affiliates win 2026 Hertz Foundation Fellowships | MIT News

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

https://arxiv.org/pdf/2512.16301

Four adaptation paradigms

The framework defines 4 adaptation paradigms by combining 2 binary choices. The first dimension is the target, agent adaptation versus tool adaptation. The second dimension is the supervision signal, tool execution versus agent output. This yields A1 and A2 for adapting the agent, and T1 and T2 for adapting tools.

A1, Tool Execution Signaled Agent Adaptation, optimizes the agent using feedback derived from tool execution. A2, Agent Output Signaled Agent Adaptation, optimizes the agent using a signal defined only on its final outputs. T1, Agent-Agnostic Tool Adaptation, optimizes tools without referring to a particular agent. T2, Agent-Supervised Tool Adaptation, optimizes tools under supervision from a fixed agent.

https://arxiv.org/pdf/2512.16301

A1, learning from verifiable tool feedback

In A1, the agent receives an input x, produces a structured tool call a, the tools return a result y, and the learning objective O_tool measures tool success, for example execution correctness or retrieval quality. The paper covers both supervised imitation of successful tool trajectories and reinforcement learning that uses verifiable tool outcomes as reward.

Toolformer, ToolAlpaca, and Gorilla illustrate supervised A1 methods, since each uses execution results of real tools to construct or filter training traces before imitation. All of them keep the supervision signal defined at the tool behavior level, not at the final answer level.

DeepRetrieval is a central A1 reinforcement learning example. It frames query reformulation as a Markov decision process where the state is the user query, the action is a rewritten query, and the reward combines retrieval metrics such as Recall and nDCG, a format term, and, for text to SQL, SQL execution accuracy. The policy is trained with KL regularized Proximal Policy Optimization and the same objective covers literature search, corpus question answering, and text to SQL.

A2, learning from final agent outputs

A2 covers cases where the optimization objective O_agent depends only on the final output o produced by the agent, even when the agent uses tools internally. The survey shows that supervising only o is not enough to teach tools, because the agent can ignore tools and still improve likelihood. Effective A2 systems therefore combine supervision on tool calls with supervision on final answers, or assign sparse rewards such as exact match accuracy to o and propagate them back through the full trajectory.

T1, agent agnostic tool training

T1 freezes the main agent and optimizes tools so that they are broadly reusable. The objective O_tool depends only on tool outputs and is measured by metrics such as retrieval accuracy, ranking quality, simulation fidelity, or downstream task success. A1 trained search policies, such as DeepRetrieval, can later be reused as T1 tools inside new agentic systems without modifying the main agent.

T2, tools optimized under a frozen agent

T2 assumes a powerful but fixed agent A, which is common when the agent is a closed source foundation model. The tool executes calls and returns results that the agent then uses to produce o. The optimization objective again lives on O_agent, but the trainable parameters belong to the tool. The paper describes quality weighted training, target based training, and reinforcement learning variants that all derive learning signals for the tool from the final agent outputs.

The survey treats long term memory as a special case of T2. Memory is an external store written and read through learned functions, and the agent remains frozen. Recent T2 systems include s3, which trains a 7 billion parameter searcher that maximizes a Gain Beyond RAG reward defined by a frozen generator, and AgentFlow, which trains a planner to orchestrate mostly frozen Qwen2.5 based modules using Flow GRPO.

https://arxiv.org/pdf/2512.16301

Key Takeaways

  • The research defines a precise 4 paradigm framework for adapting agentic AI by crossing 2 dimensions, whether adaptation targets the agent or tools, and whether the supervision signal comes from tool execution or from final agent outputs.
  • A1 methods such as Toolformer, ToolAlpaca, Gorilla, and DeepRetrieval adapt the agent directly from verifiable tool feedback, including retrieval metrics, SQL execution accuracy, and code execution results, often optimized with KL regularized Proximal Policy Optimization.
  • A2 methods optimize the agent from signals on final outputs, for example answer accuracy, and the paper shows that systems must still supervise tool calls or propagate sparse rewards through full trajectories, otherwise the agent can ignore tools while still improving likelihood.
  • T1 and T2 shift learning to tools and memory, T1 trains generally useful retrievers, searchers, and simulators without a specific agent in mind, while T2 adapts tools under a frozen agent, as in s3 and AgentFlow where a fixed generator supervises a learned searcher and planner.
  • The research team introduce an adaptation landscape that relates monolithic versus modular and local versus systemic control, and they argue that practical systems will combine rare A1 or A2 updates on a strong base model with frequent T1 and T2 adaptation of retrievers, search policies, simulators, and memory for robustness and scalability.

Check out the Paper and GitHub Repo. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.



Source_link

Related Posts

MIT affiliates win 2026 Hertz Foundation Fellowships | MIT News
Al, Analytics and Automation

MIT affiliates win 2026 Hertz Foundation Fellowships | MIT News

June 11, 2026
Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding
Al, Analytics and Automation

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

June 11, 2026
Building Semantic Search with Transformers.js and Sentence Embeddings
Al, Analytics and Automation

Building Semantic Search with Transformers.js and Sentence Embeddings

June 11, 2026
Startup’s nuclear-inspired cooling system could make data centers more sustainable | MIT News
Al, Analytics and Automation

Startup’s nuclear-inspired cooling system could make data centers more sustainable | MIT News

June 10, 2026
Top AI Coding Agents and Development Platforms in 2026: Atoms, Devin, Windsurf, Cursor, Warp, and More Compared
Al, Analytics and Automation

Top AI Coding Agents and Development Platforms in 2026: Atoms, Devin, Windsurf, Cursor, Warp, and More Compared

June 10, 2026
The Practitioner’s Guide to AgentOps
Al, Analytics and Automation

The Practitioner’s Guide to AgentOps

June 10, 2026
Next Post
The best PS5 accessories for 2026

The best PS5 accessories for 2026

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

Shutdown Scare Made Me Test 5 Adobe Animate Alternatives

Shutdown Scare Made Me Test 5 Adobe Animate Alternatives

February 7, 2026
Teyana Taylor Set to Drop Her Own Air Jordan 3

Teyana Taylor Set to Drop Her Own Air Jordan 3

February 12, 2026
Choosing a Technical Content Generation Partner for Success

Choosing a Technical Content Generation Partner for Success

February 10, 2026
Trump and Musk’s split reflects a nation that’s divided on clean energy

Trump and Musk’s split reflects a nation that’s divided on clean energy

June 8, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Push Delivery Tests, ChatGPT Ads Updates, and More
  • Researchers Are Developing Textiles That Can Produce Drinking Water From The Air
  • Father’s Day marketing in 2026: five trends every advertiser needs to know
  • Employer Branding’s Next Chapter: Why Measurement Matters More Than Ever
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions