
How This Agentic Memory Research Unifies Long Term and Short Term Memory for LLM Agents

By Josh
January 13, 2026
in AI, Analytics and Automation


How do you design an LLM agent that decides for itself what to store in long term memory, what to keep in short term context and what to discard, without hand tuned heuristics or extra controllers? Can a single policy learn to manage both memory types through the same action space as text generation?

Researchers from Alibaba Group and Wuhan University introduce Agentic Memory, or AgeMem, a framework that lets large language model agents learn how to manage both long term and short term memory as part of a single policy. Instead of relying on hand written rules or external controllers, the agent decides when to store, retrieve, summarize and forget, using memory tools that are integrated into the action space of the model.


Why current LLM agents struggle with memory

Most agent frameworks treat memory as two loosely coupled systems.

Long term memory stores user profiles, task information and previous interactions across sessions. Short term memory is the current context window, which holds the active dialogue and retrieved documents.

Existing systems design these two parts in isolation. Long term memory is handled through external stores such as vector databases with simple add and retrieve triggers. Short term memory is managed with retrieval augmented generation, sliding windows or summarization schedules.

This separation creates several issues.

  • Long term and short term memory are optimized independently. Their interaction is not trained end to end.
  • Heuristics decide when to write to memory and when to summarize. These rules are brittle and miss rare but important events.
  • Additional controllers or expert models increase cost and system complexity.
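As an illustration of the brittle heuristics mentioned above, a typical hand-written short term memory rule looks something like the following sketch (the window size and the rule itself are illustrative assumptions, not from the paper):

```python
def trim_context(turns, max_turns=8):
    """Sliding-window heuristic: keep only the most recent turns.

    This is the kind of fixed rule a learned policy replaces. It
    silently drops everything outside the window, including rare
    but important early events, regardless of content.
    """
    return turns[-max_turns:]

history = [f"turn {i}" for i in range(12)]
kept = trim_context(history)  # turns 0-3 are lost, whatever they contained
```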

AgeMem removes the external controller and folds memory operations into the agent policy itself.

Memory as tools in the agent action space

In AgeMem, memory operations are exposed as tools. At each step, the model can emit either normal text tokens or a tool call. The framework defines 6 tools.

For long term memory:

  • ADD stores a new memory item with content and metadata.
  • UPDATE modifies an existing memory entry.
  • DELETE removes obsolete or low value items.

For short term memory:

  • RETRIEVE performs semantic search over long term memory and injects the retrieved items into the current context.
  • SUMMARY compresses spans of the dialogue into shorter summaries.
  • FILTER removes context segments that are not useful for future reasoning.
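Assuming the tools are exposed to the model as function specs (the paper names the six tools but does not publish an exact schema, so the argument names below are hypothetical), the action space could be sketched as:

```python
# Hypothetical tool registry; tool names follow the paper,
# argument fields are assumptions for illustration.
LTM_TOOLS = {
    "ADD":    {"args": ["content", "metadata"], "scope": "long_term"},
    "UPDATE": {"args": ["memory_id", "content"], "scope": "long_term"},
    "DELETE": {"args": ["memory_id"], "scope": "long_term"},
}
STM_TOOLS = {
    "RETRIEVE": {"args": ["query", "top_k"], "scope": "short_term"},
    "SUMMARY":  {"args": ["span_start", "span_end"], "scope": "short_term"},
    "FILTER":   {"args": ["segment_ids"], "scope": "short_term"},
}
ALL_TOOLS = {**LTM_TOOLS, **STM_TOOLS}
```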

The interaction protocol has a structured format. Each step starts with a <think> block where the model reasons privately. Then the model either emits a <tool_call> block with a JSON list of tool invocations, or an <answer> block with the user facing response. Memory actions are therefore first class decisions, not side effects.
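A parser for one step of this protocol can be sketched as follows. The tag names come from the paper; the parsing logic and the example payload are assumptions:

```python
import json
import re

def parse_step(output: str):
    """Parse one agent step: a <think> block followed by either a
    <tool_call> block (JSON list of invocations) or an <answer> block."""
    think = re.search(r"<think>(.*?)</think>", output, re.S)
    tool = re.search(r"<tool_call>(.*?)</tool_call>", output, re.S)
    answer = re.search(r"<answer>(.*?)</answer>", output, re.S)
    if tool:
        return {"think": think.group(1).strip(),
                "action": "tool_call",
                "calls": json.loads(tool.group(1))}
    return {"think": think.group(1).strip(),
            "action": "answer",
            "text": answer.group(1).strip()}

step = parse_step(
    "<think>User mentioned an allergy, store it.</think>"
    '<tool_call>[{"name": "ADD", "content": "user is allergic to nuts"}]</tool_call>'
)
```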

Three stage reinforcement learning for unified memory

AgeMem is trained with reinforcement learning in a way that couples long term and short term memory behavior.

The state at time t includes the current conversational context, the long term memory store and the task specification. The policy chooses either a token or a tool call as the action. The training trajectory for each sample is divided into 3 stages:

  1. Stage 1, long term memory construction: The agent interacts in a casual setting and observes information that will later become relevant. It uses ADD, UPDATE and DELETE to build and maintain long term memory. The short term context grows naturally during this stage.
  2. Stage 2, short term memory control under distractors: The short term context is reset. Long term memory persists. The agent now receives distractor content that is related but not necessary. It must manage short term memory using SUMMARY and FILTER to keep useful content and remove noise.
  3. Stage 3, integrated reasoning: The final query arrives. The agent retrieves from long term memory using RETRIEVE, controls the short term context, and produces the answer.

The crucial detail is that long term memory persists across all stages while short term memory is cleared between Stage 1 and Stage 2. This design forces the model to rely on retrieval rather than on residual context and exposes realistic long horizon dependencies.
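The stage structure above, with long term memory persisting and short term context reset, can be sketched as a trajectory skeleton (the `agent.step` interface is a hypothetical placeholder, not the paper's API):

```python
def run_trajectory(agent, casual_session, distractors, query):
    """Three-stage AgeMem-style trajectory: LTM persists throughout,
    STM is cleared between Stage 1 and Stage 2."""
    ltm = {}   # long term store: persists across all stages
    stm = []   # short term context: rebuilt per stage

    # Stage 1: build long term memory (ADD / UPDATE / DELETE).
    for turn in casual_session:
        stm.append(turn)
        agent.step(stm, ltm)   # agent may write to ltm

    # Stage 2: STM reset, LTM persists; manage distractor content
    # with SUMMARY / FILTER.
    stm = []
    for turn in distractors:
        stm.append(turn)
        agent.step(stm, ltm)   # agent may compress or drop from stm

    # Stage 3: final query; RETRIEVE from ltm and answer.
    stm.append(query)
    return agent.step(stm, ltm)
```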

Reward design and step wise GRPO

AgeMem uses a step wise variant of Group Relative Policy Optimization (GRPO). For each task, the system samples multiple trajectories that form a group. A terminal reward is computed for each trajectory, then normalized within the group to obtain an advantage signal. This advantage is broadcast to all steps in the trajectory so that intermediate tool choices are trained using the final outcome.
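The group-relative advantage computation can be sketched as follows. This is a minimal version of the normalization step only; GRPO's clipped policy-gradient objective is not shown:

```python
def group_advantages(rewards):
    """Normalize terminal rewards within a group of trajectories
    sampled for the same task. The resulting per-trajectory
    advantage is broadcast to every step of that trajectory."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard: equal rewards give zero std
    return [(r - mean) / std for r in rewards]

# Four trajectories for one task; each step of trajectory i
# is trained with advantage adv[i].
adv = group_advantages([0.9, 0.2, 0.5, 0.4])
```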

The total reward has three main components:

  • A task reward that scores answer quality between 0 and 1 using an LLM judge.
  • A context reward that measures the quality of short term memory operations, including compression, early summarization and preservation of query relevant content.
  • A memory reward that measures long term memory quality, including the fraction of high quality stored items, the usefulness of maintenance operations and the relevance of retrieved items to the query.

Uniform weights are used for these three components so that each contributes equally to the learning signal. A penalty term is added when the agent exceeds the maximum allowed dialogue length or when the context overflows the limit.
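Putting the three components and the penalty together, the reward can be sketched like this. The uniform weighting and the penalty conditions follow the paper; the specific limit values and penalty magnitude are illustrative assumptions:

```python
def total_reward(task_r, context_r, memory_r,
                 dialogue_len, context_len,
                 max_dialogue=40, max_context=8192, penalty=0.5):
    """Uniformly weighted combination of task, context and memory
    rewards, minus a penalty when the dialogue or context limit
    is exceeded. Limit values here are assumed, not from the paper."""
    r = (task_r + context_r + memory_r) / 3.0
    if dialogue_len > max_dialogue or context_len > max_context:
        r -= penalty
    return r
```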

Paper: https://arxiv.org/pdf/2601.01885

Experimental setup and main results

The research team fine-tunes AgeMem on the HotpotQA training split and evaluates on 5 benchmarks:

  • ALFWorld for text based embodied tasks.
  • SciWorld for science themed environments.
  • BabyAI for instruction following.
  • PDDL tasks for planning.
  • HotpotQA for multi hop question answering.

Metrics include success rate for ALFWorld, SciWorld and BabyAI, progress rate for PDDL tasks, and an LLM judge score for HotpotQA. They also define a Memory Quality metric using an LLM evaluator that compares stored memories to the supporting facts of HotpotQA.


Baselines include LangMem, A Mem, Mem0, Mem0g and a no memory agent. Backbones are Qwen2.5-7B-Instruct and Qwen3-4B-Instruct.

On Qwen2.5-7B-Instruct, AgeMem reaches an average score of 41.96 across the 5 benchmarks, while the best baseline, Mem0, reaches 37.14. On Qwen3-4B-Instruct, AgeMem reaches 54.31, compared to 45.74 for the best baseline, A Mem.

Memory quality also improves. On HotpotQA, AgeMem reaches 0.533 with Qwen2.5-7B and 0.605 with Qwen3-4B, which is higher than all baselines.

Short term memory tools reduce prompt length while preserving performance. On HotpotQA, configurations with short term memory tools use about 3 to 5 percent fewer tokens per prompt than variants that replace them with a retrieval pipeline.

Ablation studies confirm that each component matters. Adding only long term memory tools on top of a no memory baseline already yields clear gains. Adding reinforcement learning on these tools improves scores further. The full system with both long term and short term tools plus RL yields up to a 21.7 percentage point improvement over the no memory baseline on SciWorld.

Implications for LLM agent design

AgeMem suggests a design pattern for future agentic systems. Memory should be handled as part of the learned policy, not as two external subsystems. By turning storage, retrieval, summarization and filtering into explicit tools and training them jointly with language generation, the agent learns when to remember, when to forget and how to manage context efficiently across long horizons.

Key Takeaways

  • AgeMem turns memory operations into explicit tools, so the same policy that generates text also decides when to ADD, UPDATE, DELETE, RETRIEVE, SUMMARY and FILTER memory.
  • Long term and short term memory are trained jointly through a three stage RL setup where long term memory persists across stages and short term context is reset to enforce retrieval based reasoning.
  • The reward function combines task accuracy, context management quality and long term memory quality with uniform weights, plus penalties for context overflow and excessive dialogue length.
  • Across ALFWorld, SciWorld, BabyAI, PDDL tasks and HotpotQA, AgeMem on Qwen2.5-7B and Qwen3-4B consistently outperforms memory baselines such as LangMem, A Mem and Mem0 on average scores and memory quality metrics.
  • Short term memory tools reduce prompt length by about 3 to 5 percent compared to RAG style baselines while keeping or improving performance, showing that learned summarization and filtering can replace handcrafted context handling rules.

