Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent

Hermes Agent already remembers across sessions. The open-source agent from Nous Research ships with curated memory files and full-text session search. But a new community project argues that built-in memory is too shallow for serious work. The library, named ‘Memory OS’, has been released under an MIT license by developer ClaudioDrews. It builds a six-layer memory stack on Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new, but it looks promising, and its architecture shows how agent memory can be layered.

Memory OS

Memory OS is not a Hermes plugin you toggle on. It is a layered system that sits beside Hermes Agent’s own memory. Hermes already provides workspace files and a session database. Memory OS keeps those and adds four more layers above them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic, and Ollama. The README frames it as a “memory operating system,” not a single feature.

The Six Layers, From Files to Vectors

Layer 1 is Workspace. It holds MEMORY.md, USER.md, and CREATIVE.md, injected into the system prompt each turn.
Layer 2 is Sessions. It uses state.db, a SQLite database with FTS5 full-text search across conversation history.
Layer 3 is Structured Facts. It stores durable facts in memory_store.db, using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts those trust scores over time, alongside entity resolution.
Layer 4 is Fabric, a heavily forked version of the Icarus Plugin. The fork adds LLM-powered session extraction on top of the upstream esaradev/icarus-plugin. It handles cross-session recall through 16 tools, including fabric_recall, fabric_write, and fabric_brief.
Layer 5 is the Vector Database, built on Qdrant. It uses 4096-dimensional dense vectors compared by cosine similarity, plus BM25 sparse search, a keyword-style ranking method.
Layer 6 is an LLM Wiki, an auto-curated vault of concepts, entities, and comparisons. That wiki is continuously ingested back into Qdrant through a process called wiki-continuous-ingest.
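
To make Layer 2 concrete, here is a minimal sketch of FTS5 full-text search over conversation history using Python's built-in sqlite3 module. The table name, column names, and sample rows are assumptions for illustration; the article does not describe the actual schema of state.db. The sketch also assumes a SQLite build with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3


def build_session_index(rows):
    """Create an in-memory FTS5 table and load (session_id, content) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: session_id is stored but not indexed for search.
    conn.execute(
        "CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def search_sessions(conn, query, limit=5):
    """Return (session_id, content) rows matching the query, best match first."""
    cur = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


# Made-up conversation snippets, one per session.
conn = build_session_index([
    ("s1", "We decided to run Qdrant in Docker on port 6333."),
    ("s2", "The user prefers short answers and dark mode."),
    ("s3", "Redis backs the background worker queue."),
])
hits = search_sessions(conn, "qdrant")
```

FTS5's default tokenizer is case-insensitive, so a lowercase query still finds the capitalized term in the stored message.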

How the Retrieval Flow Works

The flow hinges on when memory is read and written. On pre_llm_call, Memory OS runs what it calls surgical recall. It pulls from four sources at once: Fabric, Qdrant, Sessions, and Facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication stops the same context from appearing twice. A social-closer filter skips trivial messages, such as a plain “thanks.” On post_llm_call and on_session_end, the system extracts and captures new learnings automatically. The stated goal is token efficiency, not stuffing the context window.
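
The gating logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the Hit class, the surgical_recall function, the threshold values, and the list of social closers are all hypothetical, and the sketch assumes each source has already returned scored candidates.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration only.
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "okay", "great", "bye"}
THRESHOLDS = {"fabric": 0.6, "qdrant": 0.7, "sessions": 0.5, "facts": 0.65}


@dataclass(frozen=True)
class Hit:
    source: str   # which memory layer produced the candidate
    text: str     # the context that would be injected
    score: float  # relevance score in [0, 1]


def is_social_closer(message: str) -> bool:
    """True for trivial messages that should not trigger recall."""
    normalized = message.strip().lower().rstrip("!. ")
    return normalized in SOCIAL_CLOSERS


def surgical_recall(message, candidates, seen):
    """Return hits that pass their source threshold and are new this session.

    `seen` is a per-session set of already injected texts, updated in place.
    """
    if is_social_closer(message):
        return []
    selected = []
    for hit in sorted(candidates, key=lambda h: h.score, reverse=True):
        # Unknown sources get a threshold of 1.0, so they rarely pass.
        if hit.score < THRESHOLDS.get(hit.source, 1.0):
            continue
        if hit.text in seen:
            continue
        seen.add(hit.text)
        selected.append(hit)
    return selected
```

Passing the same `seen` set on every turn of a session is what keeps a memory from being injected twice, even when two sources return it.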

The Fallback Cascade and Cleanup

Layer 5’s retrieval uses a four-level fallback. It tries hybrid search first, then dense vectors, then lexical, then SQLite. If one method fails or returns nothing, the next takes over. This design keeps recall working even when the vector database struggles. Memory OS also runs a weekly decay scanner to age out stale entries. Semantic dedup merges near-identical memories when cosine similarity exceeds 0.92. These housekeeping steps aim to stop memory from bloating over months of use.
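
A rough sketch of the two ideas in this section, assuming each retrieval level is a plain callable and each memory carries an embedding vector. The function names are hypothetical. The deduplication step is also simplified: it keeps the first entry of each near-duplicate group rather than merging them, which is what the project is described as doing.

```python
import math


def recall_with_fallback(query, retrievers):
    """Try each (name, fn) level in order; return the first non-empty result."""
    for name, retrieve in retrievers:
        try:
            results = retrieve(query)
        except Exception:
            # A failing backend is treated like an empty result.
            continue
        if results:
            return name, results
    return None, []


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(memories, threshold=0.92):
    """Drop any (text, vector) whose similarity to a kept entry exceeds threshold."""
    kept = []
    for text, vec in memories:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept
```

In use, the retrievers list would hold the four levels in the order given above (hybrid, dense, lexical, SQLite), so a Qdrant outage degrades recall quality instead of disabling it.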

Local-First, And Deliberately So

Memory OS positions itself against cloud memory services like mem0, Zep, and Letta. Its pitch is that memory infrastructure should run on your own machine. The memory data stays local, with no memory subscription. LLM calls still go to whichever provider you choose. Hermes itself already supports eight external memory providers, including mem0 and Honcho. Memory OS is not one of those official providers. It is a separate, community-built stack layered on Hermes directly. For teams with data-residency rules, a local memory store can matter.

Just open-sourced **Memory OS** — a complete hierarchical persistent memory architecture for the Hermes Agent. 🪽

🧠 6 layers, fully local:
• Structured facts + trust scoring with feedback loop
• Hybrid vector search (Qdrant + BM25)
• Self-curating LLM Wiki
• Semantic…

— Claudio Drews (@ClaudioDrews25) May 31, 2026

Strengths and Limitations

Strengths:

Clear layered design separating files, sessions, facts, vectors, and a wiki
Fully local infrastructure with no cloud memory subscription
Provider-agnostic, matching Hermes Agent’s own flexibility
Token-efficient retrieval by design, via gated sources and per-session deduplication

Limitations:

Brand new, with few commits
A forked Icarus Plugin that the author says is not upstream-compatible
Heavier setup: Docker, Qdrant, Redis, and an ARQ Worker all required
No published benchmarks on recall quality, latency, or token savings

Key Takeaways

Memory OS is a community-built, MIT-licensed project that builds a six-layer memory stack on Hermes Agent, keeping Hermes's two built-in layers and adding four more.
It combines workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and an auto-curated LLM wiki.
Retrieval runs on pre_llm_call with gated, deduplicated recall from four sources; capture runs on post_llm_call and on_session_end.
Memory infrastructure is fully local and provider-agnostic, but LLM calls still go to your chosen provider.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.