• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Friday, January 23, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

A Coding Guide to Build a Procedural Memory Agent That Learns, Stores, Retrieves, and Reuses Skills as Neural Modules Over Time

Josh by Josh
December 10, 2025
in Al, Analytics and Automation
0
A Coding Guide to Build a Procedural Memory Agent That Learns, Stores, Retrieves, and Reuses Skills as Neural Modules Over Time
0
SHARES
1
VIEWS
Share on FacebookShare on Twitter


In this tutorial, we explore how an intelligent agent can gradually form procedural memory by learning reusable skills directly from its interactions with an environment. We design a minimal yet powerful framework in which skills behave like neural modules: they store action sequences, carry contextual embeddings, and are retrieved by similarity when a new situation resembles an experience. As we run our agent through multiple episodes, we observe how its behaviour becomes more efficient, moving from primitive exploration to leveraging a library of skills that it has learned on its own. Check out the FULL CODES here.

import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict


class Skill:
   def __init__(self, name, preconditions, action_sequence, embedding, success_count=0):
       self.name = name
       self.preconditions = preconditions
       self.action_sequence = action_sequence
       self.embedding = embedding
       self.success_count = success_count
       self.times_used = 0
  
   def is_applicable(self, state):
       for key, value in self.preconditions.items():
           if state.get(key) != value:
               return False
       return True
  
   def __repr__(self):
       return f"Skill({self.name}, used={self.times_used}, success={self.success_count})"


class SkillLibrary:
   def __init__(self, embedding_dim=8):
       self.skills = []
       self.embedding_dim = embedding_dim
       self.skill_stats = defaultdict(lambda: {"attempts": 0, "successes": 0})
  
   def add_skill(self, skill):
       for existing_skill in self.skills:
           if self._similarity(skill.embedding, existing_skill.embedding) > 0.9:
               existing_skill.success_count += 1
               return existing_skill
       self.skills.append(skill)
       return skill
  
   def retrieve_skills(self, state, query_embedding=None, top_k=3):
       applicable = [s for s in self.skills if s.is_applicable(state)]
       if query_embedding is not None and applicable:
           similarities = [self._similarity(query_embedding, s.embedding) for s in applicable]
           sorted_skills = [s for _, s in sorted(zip(similarities, applicable), reverse=True)]
           return sorted_skills[:top_k]
       return sorted(applicable, key=lambda s: s.success_count / max(s.times_used, 1), reverse=True)[:top_k]
  
   def _similarity(self, emb1, emb2):
       return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8)
  
   def get_stats(self):
       return {
           "total_skills": len(self.skills),
           "total_uses": sum(s.times_used for s in self.skills),
           "avg_success_rate": np.mean([s.success_count / max(s.times_used, 1) for s in self.skills]) if self.skills else 0
       }

We define how skills are represented and stored in a memory structure. We implement similarity-based retrieval so that the agent can match a new state with past skills using cosine similarity. As we work through this layer, we see how skill reuse becomes possible once skills acquire metadata, embeddings, and usage statistics. Check out the FULL CODES here.

class GridWorld:
   def __init__(self, size=5):
       self.size = size
       self.reset()
  
   def reset(self):
       self.agent_pos = [0, 0]
       self.goal_pos = [self.size-1, self.size-1]
       self.objects = {"key": [2, 2], "door": [3, 3], "box": [1, 3]}
       self.inventory = []
       self.door_open = False
       return self.get_state()
  
   def get_state(self):
       return {
           "agent_pos": tuple(self.agent_pos),
           "has_key": "key" in self.inventory,
           "door_open": self.door_open,
           "at_goal": self.agent_pos == self.goal_pos,
           "objects": {k: tuple(v) for k, v in self.objects.items()}
       }
  
   def step(self, action):
       reward = -0.1
       if action == "move_up":
           self.agent_pos[1] = min(self.agent_pos[1] + 1, self.size - 1)
       elif action == "move_down":
           self.agent_pos[1] = max(self.agent_pos[1] - 1, 0)
       elif action == "move_left":
           self.agent_pos[0] = max(self.agent_pos[0] - 1, 0)
       elif action == "move_right":
           self.agent_pos[0] = min(self.agent_pos[0] + 1, self.size - 1)
       elif action == "pickup_key":
           if self.agent_pos == self.objects["key"] and "key" not in self.inventory:
               self.inventory.append("key")
               reward = 1.0
       elif action == "open_door":
           if self.agent_pos == self.objects["door"] and "key" in self.inventory:
               self.door_open = True
               reward = 2.0
       done = self.agent_pos == self.goal_pos and self.door_open
       if done:
           reward = 10.0
       return self.get_state(), reward, done

We construct a simple environment in which the agent learns tasks such as picking up a key, opening a door, and reaching a goal. We use this environment as a playground for our procedural memory system, allowing us to observe how primitive actions evolve into more complex, reusable skills. The environment’s structure helps us observe clear, interpretable improvements in behaviour across episodes. Check out the FULL CODES here.

class ProceduralMemoryAgent:
   def __init__(self, env, embedding_dim=8):
       self.env = env
       self.skill_library = SkillLibrary(embedding_dim)
       self.embedding_dim = embedding_dim
       self.episode_history = []
       self.primitive_actions = ["move_up", "move_down", "move_left", "move_right", "pickup_key", "open_door"]
  
   def create_embedding(self, state, action_seq):
       state_vec = np.zeros(self.embedding_dim)
       state_vec[0] = hash(str(state["agent_pos"])) % 1000 / 1000
       state_vec[1] = 1.0 if state.get("has_key") else 0.0
       state_vec[2] = 1.0 if state.get("door_open") else 0.0
       for i, action in enumerate(action_seq[:self.embedding_dim-3]):
           state_vec[3+i] = hash(action) % 1000 / 1000
       return state_vec / (np.linalg.norm(state_vec) + 1e-8)
  
   def extract_skill(self, trajectory):
       if len(trajectory) < 2:
           return None
       start_state = trajectory[0][0]
       actions = [a for _, a, _ in trajectory]
       preconditions = {"has_key": start_state.get("has_key", False), "door_open": start_state.get("door_open", False)}
       end_state = self.env.get_state()
       if end_state.get("has_key") and not start_state.get("has_key"):
           name = "acquire_key"
       elif end_state.get("door_open") and not start_state.get("door_open"):
           name = "open_door_sequence"
       else:
           name = f"navigate_{len(actions)}_steps"
       embedding = self.create_embedding(start_state, actions)
       return Skill(name, preconditions, actions, embedding, success_count=1)
  
   def execute_skill(self, skill):
       skill.times_used += 1
       trajectory = []
       total_reward = 0
       for action in skill.action_sequence:
           state = self.env.get_state()
           next_state, reward, done = self.env.step(action)
           trajectory.append((state, action, reward))
           total_reward += reward
           if done:
               skill.success_count += 1
               return trajectory, total_reward, True
       return trajectory, total_reward, False
  
   def explore(self, max_steps=20):
       trajectory = []
       state = self.env.get_state()
       for _ in range(max_steps):
           action = self._choose_exploration_action(state)
           next_state, reward, done = self.env.step(action)
           trajectory.append((state, action, reward))
           state = next_state
           if done:
               return trajectory, True
       return trajectory, False

We focus on building embeddings that encode the context of a state-action sequence, enabling us to meaningfully compare skills. We also extract skills from successful trajectories, transforming raw experience into reusable behaviours. As we run this code, we observe how simple exploration gradually yields structured knowledge that the agent can apply later. Check out the FULL CODES here.

   def _choose_exploration_action(self, state):
       agent_pos = state["agent_pos"]
       if not state.get("has_key"):
           key_pos = state["objects"]["key"]
           if agent_pos == key_pos:
               return "pickup_key"
           if agent_pos[0] < key_pos[0]:
               return "move_right"
           if agent_pos[0] > key_pos[0]:
               return "move_left"
           if agent_pos[1] < key_pos[1]:
               return "move_up"
           return "move_down"
       if state.get("has_key") and not state.get("door_open"):
           door_pos = state["objects"]["door"]
           if agent_pos == door_pos:
               return "open_door"
           if agent_pos[0] < door_pos[0]:
               return "move_right"
           if agent_pos[0] > door_pos[0]:
               return "move_left"
           if agent_pos[1] < door_pos[1]:
               return "move_up"
           return "move_down"
       goal_pos = (4, 4)
       if agent_pos[0] < goal_pos[0]:
           return "move_right"
       if agent_pos[1] < goal_pos[1]:
           return "move_up"
       return np.random.choice(self.primitive_actions)
  
   def run_episode(self, use_skills=True):
       self.env.reset()
       total_reward = 0
       steps = 0
       trajectory = []
       while steps < 50:
           state = self.env.get_state()
           if use_skills and self.skill_library.skills:
               query_emb = self.create_embedding(state, [])
               skills = self.skill_library.retrieve_skills(state, query_emb, top_k=1)
               if skills:
                   skill_traj, skill_reward, success = self.execute_skill(skills[0])
                   trajectory.extend(skill_traj)
                   total_reward += skill_reward
                   steps += len(skill_traj)
                   if success:
                       return trajectory, total_reward, steps, True
                   continue
           action = self._choose_exploration_action(state)
           next_state, reward, done = self.env.step(action)
           trajectory.append((state, action, reward))
           total_reward += reward
           steps += 1
           if done:
               return trajectory, total_reward, steps, True
       return trajectory, total_reward, steps, False
  
   def train(self, episodes=10):
       stats = {"rewards": [], "steps": [], "skills_learned": [], "skill_uses": []}
       for ep in range(episodes):
           trajectory, reward, steps, success = self.run_episode(use_skills=True)
           if success and len(trajectory) >= 3:
               segment = trajectory[-min(5, len(trajectory)):]
               skill = self.extract_skill(segment)
               if skill:
                   self.skill_library.add_skill(skill)
           stats["rewards"].append(reward)
           stats["steps"].append(steps)
           stats["skills_learned"].append(len(self.skill_library.skills))
           stats["skill_uses"].append(self.skill_library.get_stats()["total_uses"])
           print(f"Episode {ep+1}: Reward={reward:.1f}, Steps={steps}, Skills={len(self.skill_library.skills)}, Success={success}")
       return stats

We define how the agent chooses between using known skills and exploring with primitive actions. We train the agent across several episodes and record the evolution of learned skills, usage counts, and success rates. As we examine this part, we observe that skill reuse reduces episode length and improves overall rewards. Check out the FULL CODES here.

def visualize_training(stats):
   fig, axes = plt.subplots(2, 2, figsize=(12, 8))
   axes[0, 0].plot(stats["rewards"])
   axes[0, 0].set_title("Episode Rewards")
   axes[0, 1].plot(stats["steps"])
   axes[0, 1].set_title("Steps per Episode")
   axes[1, 0].plot(stats["skills_learned"])
   axes[1, 0].set_title("Skills in Library")
   axes[1, 1].plot(stats["skill_uses"])
   axes[1, 1].set_title("Cumulative Skill Uses")
   plt.tight_layout()
   plt.savefig("skill_learning_stats.png", dpi=150, bbox_inches="tight")
   plt.show()


if __name__ == "__main__":
   print("=== Procedural Memory Agent Demo ===\n")
   env = GridWorld(size=5)
   agent = ProceduralMemoryAgent(env)
   print("Training agent to learn reusable skills...\n")
   stats = agent.train(episodes=15)
   print("\n=== Learned Skills ===")
   for skill in agent.skill_library.skills:
       print(f"{skill.name}: {len(skill.action_sequence)} actions, used {skill.times_used} times, {skill.success_count} successes")
   lib_stats = agent.skill_library.get_stats()
   print(f"\n=== Library Statistics ===")
   print(f"Total skills: {lib_stats['total_skills']}")
   print(f"Total skill uses: {lib_stats['total_uses']}")
   print(f"Avg success rate: {lib_stats['avg_success_rate']:.2%}")
   visualize_training(stats)
   print("\n✓ Skill learning complete! Check the visualization above.")

We bring everything together by running training, printing learned skills, and plotting behaviour statistics. We visualize the trend in rewards and how the skill library grows over time. By running this snippet, we complete the lifecycle of procedural memory formation and confirm that the agent learns to behave more intelligently with experience.

In conclusion, we see how procedural memory emerges naturally when an agent learns to extract skills from its own successful trajectories. We observe how skills are gained, structure, metadata, embeddings, and usage patterns, allowing the agent to reuse them efficiently in future situations. Lastly, we appreciate how even a small environment and simple heuristics lead to meaningful learning dynamics, giving us a concrete understanding of what it means for an agent to develop reusable internal competencies over time.


Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.

🙌 Follow MARKTECHPOST: Add us as a preferred source on Google.



Source_link

READ ALSO

Microsoft Releases VibeVoice-ASR: A Unified Speech-to-Text Model Designed to Handle 60-Minute Long-Form Audio in a Single Pass

Slow Down the Machines? Wall Street and Silicon Valley at Odds Over A.I.’s Nearest Future

Related Posts

Microsoft Releases VibeVoice-ASR: A Unified Speech-to-Text Model Designed to Handle 60-Minute Long-Form Audio in a Single Pass
Al, Analytics and Automation

Microsoft Releases VibeVoice-ASR: A Unified Speech-to-Text Model Designed to Handle 60-Minute Long-Form Audio in a Single Pass

January 23, 2026
Slow Down the Machines? Wall Street and Silicon Valley at Odds Over A.I.’s Nearest Future
Al, Analytics and Automation

Slow Down the Machines? Wall Street and Silicon Valley at Odds Over A.I.’s Nearest Future

January 22, 2026
Inworld AI Releases TTS-1.5 For Realtime, Production Grade Voice Agents
Al, Analytics and Automation

Inworld AI Releases TTS-1.5 For Realtime, Production Grade Voice Agents

January 22, 2026
FlashLabs Researchers Release Chroma 1.0: A 4B Real Time Speech Dialogue Model With Personalized Voice Cloning
Al, Analytics and Automation

FlashLabs Researchers Release Chroma 1.0: A 4B Real Time Speech Dialogue Model With Personalized Voice Cloning

January 22, 2026
Al, Analytics and Automation

Salesforce AI Introduces FOFPred: A Language-Driven Future Optical Flow Prediction Framework that Enables Improved Robot Control and Video Generation

January 21, 2026
Why it’s critical to move beyond overly aggregated machine-learning metrics | MIT News
Al, Analytics and Automation

Why it’s critical to move beyond overly aggregated machine-learning metrics | MIT News

January 21, 2026
Next Post
Want To Win In Complex Buying Networks? Start With Analysts

Want To Win In Complex Buying Networks? Start With Analysts

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

The Best Media & Journalist Databases You Can Use

The Best Media & Journalist Databases You Can Use

July 18, 2025
7 Marketing Manager Hacks to Look Like a Full Team (When It’s Just You)

7 Marketing Manager Hacks to Look Like a Full Team (When It’s Just You)

June 10, 2025
Gemini 3 is coming to AI Mode in more countries

Gemini 3 is coming to AI Mode in more countries

December 2, 2025
Datumbox Machine Learning Framework v0.8.1 released

Datumbox Machine Learning Framework v0.8.1 released

June 8, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • How I Got AI to Quote Us with 4 Simple Strategies
  • List of Spin a Baddie Codes
  • Sennheiser introduces new TV headphones bundle with Auracast
  • Microsoft Releases VibeVoice-ASR: A Unified Speech-to-Text Model Designed to Handle 60-Minute Long-Form Audio in a Single Pass
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?