Anthropic has rolled out memory support for Claude Managed Agents in public beta, closing one of the most persistent gaps in production agent design. Until now, every session started with a blank slate. When the session ended, anything the agent figured out, learned about your codebase, or corrected mid-conversation was thrown away. With memory stores, agents can carry context forward, share what they learned with other agents, and stop repeating the same mistakes.
The implementation is deliberately practical. Memory is exposed as a directory mounted into the agent’s container, not as a black-box vector database or a proprietary retrieval API. Claude reads and writes memories using the same file tools it already uses for everything else. That decision shapes the entire feature, and it tells you a lot about how Anthropic thinks agents should work.
Why session-scoped agents hit a wall
The problem memory solves is older than Managed Agents themselves. Long-running agents face a basic constraint: the context window is finite, and stuffing it with everything an agent ever did degrades performance. This effect, sometimes called context rot, is well documented. As token counts climb, the model’s ability to recall and reason over earlier information drops, even when the tokens technically still fit.
Teams have worked around this with compaction, summarisation, structured note-taking, and sub-agent architectures. Each technique helps, but they all share one weakness: when a session ends, anything not explicitly persisted disappears. An agent that spent two hours learning the quirks of your monorepo wakes up tomorrow knowing nothing about it.
Memory stores in Managed Agents tackle this by giving each workspace a durable, file-based scratchpad that survives across sessions. The agent decides what to write, when to read, and how to organise the contents. Because memory is just files, you can inspect, edit, export, or roll back anything the agent saved.
How memory stores work
A memory store is a workspace-scoped collection of text documents. You create one with a name and a description, then attach it to a session at creation time. Once attached, the store is mounted as a directory under /mnt/memory/ inside the session’s container. A short description of each mount, including its path, access mode, and any session-specific instructions, is automatically inserted into the system prompt so Claude knows where to look.
From the agent’s perspective, memory is just another part of the filesystem. It uses the standard bash and code execution tools to list files, read content, write updates, and reorganise things. There’s no special API the model has to learn. That continuity matters, because the same skills that make Claude effective at agentic coding tasks transfer directly to memory management.
A few structural choices are worth highlighting:
- File size caps. Individual memories are limited to roughly 100KB, about 25K tokens. The guidance is to structure memory as many small focused files rather than a few sprawling ones, which keeps retrieval efficient and avoids polluting the context window.
- Eight stores per session. You can attach up to eight memory stores to a single session, which lets you mix shared reference material with per-user or per-project stores.
- Read-only and read-write modes. Stores default to read-write, but read-only is supported and recommended for any reference material the agent shouldn’t modify. This is also the safer default when an agent processes untrusted input.
- Attachment is fixed at session start. Unlike file or repository resources, you can’t add or remove memory stores from a running session.
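Because memory is plain files under the mount point, the agent-side workflow can be pictured with ordinary filesystem code. The sketch below is illustrative, not the product’s API: the helper names are invented, a temp directory stands in for the real /mnt/memory/ mount, and the size check reflects the roughly 100KB per-file cap described above.

```python
import os
import tempfile

MAX_MEMORY_BYTES = 100 * 1024  # ~100KB per-file cap described above

def write_memory(mount_dir: str, name: str, content: str) -> str:
    """Write one small, focused memory file, refusing oversized content."""
    data = content.encode("utf-8")
    if len(data) > MAX_MEMORY_BYTES:
        raise ValueError(f"memory {name!r} exceeds {MAX_MEMORY_BYTES} bytes")
    path = os.path.join(mount_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

def list_memories(mount_dir: str) -> list[str]:
    """List memory files the way an agent would with ls."""
    return sorted(os.listdir(mount_dir))

# Stand-in for a store mounted under /mnt/memory/ inside the container.
mount = tempfile.mkdtemp()
write_memory(mount, "conventions.md", "Use 4-space indents.\n")
write_memory(mount, "monorepo-quirks.md", "Tests live outside src/.\n")
print(list_memories(mount))  # ['conventions.md', 'monorepo-quirks.md']
```

The point of the sketch is that nothing here is memory-specific: the same read, write, and list operations the agent already uses for code also manage its knowledge.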
Versioning and the audit trail
Every change to a memory creates an immutable version, identified by a memver_ prefix. Versions belong to the store rather than the individual memory, which means they survive even if the underlying memory is deleted. This gives you a complete audit trail of what the agent wrote, when it wrote it, and which session was responsible.
Versions are retained for 30 days; the current version of each memory is kept indefinitely regardless of age. There’s no dedicated restore endpoint. To roll back, you fetch the version you want and write its content back through the update or create API. If something sensitive ended up in history, the redact endpoint scrubs the content from a historical version while preserving the metadata of who did what and when. A version that is the current head of a live memory cannot be redacted, so you write a new version first, then redact the old one.
For compliance scenarios, redaction is the mechanism for handling leaked secrets, PII removal, or user deletion requests. For longer retention, you can export versions through the API and store them yourself.
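The rollback and redaction semantics fit in a few lines. This is a toy model of the behaviour described above, not the real API; the class and method names are invented for illustration.

```python
import itertools

class MemoryStore:
    """Toy model of store-level, immutable version history."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.versions = []  # append-only list of version records
        self.head = {}      # memory name -> current version id

    def write(self, name: str, content: str) -> str:
        vid = f"memver_{next(self._ids)}"
        self.versions.append({"id": vid, "memory": name, "content": content})
        self.head[name] = vid
        return vid

    def get_version(self, vid: str) -> dict:
        return next(v for v in self.versions if v["id"] == vid)

    def rollback(self, name: str, vid: str) -> str:
        # No restore endpoint: fetch the old content, write it back as a new version.
        return self.write(name, self.get_version(vid)["content"])

    def redact(self, vid: str) -> None:
        # Scrub content, keep metadata; the current head cannot be redacted.
        if vid in self.head.values():
            raise ValueError("cannot redact the head version; write a new version first")
        self.get_version(vid)["content"] = None

store = MemoryStore()
v1 = store.write("api-keys.md", "token=SECRET")
v2 = store.write("api-keys.md", "token=<removed>")
store.redact(v1)                         # allowed: v1 is no longer the head
print(store.get_version(v1)["content"])  # None, but the version record survives
```

Note that versions belong to the store’s append-only list, not to the memory, which is why the audit trail survives even if the memory itself is deleted.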
Concurrency and safe edits
Multiple agents can work against the same store concurrently. To prevent one writer from clobbering another’s update, the API supports optimistic concurrency through a content_sha256 precondition. You pass the hash of the content you read; the update only applies if the stored content still matches. On a mismatch, you re-read the memory and retry against the fresh state. It’s a familiar pattern from distributed systems, applied to agent memory.
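The loop is the same compare-and-swap pattern you would write against any versioned store. The sketch below models the precondition locally; the `Memory` class and `append_line` helper are invented stand-ins, not the real client.

```python
import hashlib

class PreconditionFailed(Exception):
    pass

class Memory:
    """Toy server-side memory with a content_sha256 precondition on updates."""
    def __init__(self, content: str):
        self.content = content

    def read(self) -> tuple[str, str]:
        return self.content, hashlib.sha256(self.content.encode()).hexdigest()

    def update(self, new_content: str, expected_sha256: str) -> None:
        current = hashlib.sha256(self.content.encode()).hexdigest()
        if current != expected_sha256:
            raise PreconditionFailed("content changed since it was read")
        self.content = new_content

def append_line(memory: Memory, line: str, max_retries: int = 5) -> None:
    """Read-modify-write loop: retry against fresh state on a hash mismatch."""
    for _ in range(max_retries):
        content, sha = memory.read()
        try:
            memory.update(content + line + "\n", expected_sha256=sha)
            return
        except PreconditionFailed:
            continue  # another writer got there first; re-read and retry
    raise RuntimeError("too many concurrent writers")

m = Memory("lessons:\n")
append_line(m, "- pin the lockfile before builds")
print(m.content)
```

Bounding the retries matters in practice: a hot memory with many concurrent writers is a signal to split it into smaller files rather than loop forever.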
The security boundary
Memory introduces a real attack surface that’s worth thinking about carefully. If an agent processes untrusted input, whether that’s a user prompt, fetched web content, or output from a third-party tool, a successful prompt injection could write malicious content into a read-write store. Later sessions then read that content as trusted memory, and the injection compounds.
The mitigation is straightforward but requires discipline: use read-only mounts for anything the agent doesn’t need to modify. Reference material, shared lookups, organisational standards, and conventions all belong in read-only stores. Reserve read-write access for stores where writing is genuinely part of the workflow, and consider scoping those stores narrowly, for instance one per end user.
Access is enforced at the filesystem level. A read-only mount rejects writes outright. Writes to a read-write mount produce versions attributed to the session that made them, so you always know which run wrote which bytes.
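The enforcement and attribution rules can be pictured as a small mount wrapper. This is an illustrative model, assuming invented class names, not how the sandbox actually implements it.

```python
class ReadOnlyMount(Exception):
    pass

class Mount:
    """Toy model of a memory mount with an access mode and write attribution."""
    def __init__(self, mode: str, session_id: str):
        assert mode in ("ro", "rw")
        self.mode = mode
        self.session_id = session_id
        self.files = {}   # name -> content
        self.writes = []  # audit trail: (session_id, name)

    def read(self, name: str) -> str:
        return self.files[name]

    def write(self, name: str, content: str) -> None:
        if self.mode == "ro":
            raise ReadOnlyMount("read-only mount rejects writes outright")
        self.files[name] = content
        self.writes.append((self.session_id, name))

reference = Mount("ro", session_id="sess_1")   # standards, conventions
scratch = Mount("rw", session_id="sess_1")     # per-session working memory
scratch.write("notes.md", "prefer ripgrep over grep")
print(scratch.writes)  # [('sess_1', 'notes.md')] -- every write is attributed
```

The injection-compounding risk described above is exactly what the read-only branch cuts off: content the agent merely reads can never become content a later session trusts as its own past writing.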
Patterns for attaching multiple stores
The eight-store-per-session limit is generous enough to support several useful patterns:
- Shared reference material. One read-only store containing standards, conventions, and domain knowledge attached to many sessions, kept separate from each session’s own read-write store.
- Per-user or per-team stores. One store per end user, team, or project, while sharing a single agent configuration across all of them.
- Different lifecycles. A store that outlives any single session, or one you want to archive on its own schedule independent of session retention.
Archiving a store makes it read-only and prevents it from being attached to new sessions. Archiving is one-way, so there’s no unarchive operation. Permanent deletion through the API removes the store along with all its memories and versions.
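Combining these patterns, a session’s attachment list might be assembled like this. The field names and store names are hypothetical; only the eight-store limit and the ro/rw modes come from the feature itself.

```python
def session_memory_config(user_id: str) -> list[dict]:
    """Build a hypothetical attachment list: shared read-only reference
    material plus a per-user read-write store (invented field names)."""
    stores = [
        {"store": "org-standards", "mode": "ro"},     # shared conventions
        {"store": "domain-knowledge", "mode": "ro"},  # shared lookups
        {"store": f"user-{user_id}", "mode": "rw"},   # narrowly scoped writes
    ]
    assert len(stores) <= 8, "at most eight memory stores per session"
    return stores

print(session_memory_config("alice"))
```

Because attachment is fixed at session start, a config builder like this is the natural place to decide, per user, which stores a run can see and which it can change.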
Memory as part of the bigger architecture
Memory fits into a broader architectural decision Anthropic made when building Managed Agents: decoupling the brain from the hands. The harness that drives Claude lives outside the sandbox container. Sessions, sandboxes, and harnesses are independent, swappable components, each addressable through interfaces that make few assumptions about the others.
This separation is what makes memory useful in practice. Because the session log is durable and lives outside the harness, the agent can be interrupted, restarted, or moved without losing track of what it did. Memory adds a second durable layer on top: state that outlives not just a single session but the entire conversation that produced it.
Together, these layers handle the two flavours of context an agent needs. The session captures the literal event stream, every tool call and result in order. Memory captures distilled knowledge: user preferences, project conventions, lessons learned from past mistakes, and domain context that’s worth keeping around long after the originating conversation has been compacted away.
What teams are doing with it
Some early production results give a sense of what memory actually changes:
- Netflix uses memory so its agents carry context across sessions, including insights that took multiple turns to surface and corrections from human reviewers, instead of manually updating prompts and skills after every session.
- Rakuten reports cutting first-pass errors by 97% on long-running task-based agents that use memory to learn from each session and avoid repeating past mistakes, with cost down 27% and latency down 34%.
- Wisedocs built a document verification pipeline that uses cross-session memory to spot and remember recurring document issues, speeding verification by 30%.
- Ando is building a workplace messaging platform on Managed Agents and using memory to capture how each organisation interacts, rather than building its own memory infrastructure.
The common thread is closing feedback loops. Without memory, every correction had to be encoded back into prompts or skills by a human. With memory, the agent retains the lesson and applies it the next time the situation arises.
Where memory fits in the broader landscape
Anthropic isn’t alone in tackling agent memory. Cloudflare’s Agent Memory takes a different approach: a managed service with an opinionated API and a retrieval-based pipeline that extracts, classifies, and indexes facts from conversations, then synthesises answers from retrieved memories. Memories there are derived structures stored in Durable Objects and Vectorize indices, accessed through a constrained API rather than direct filesystem access.
The two designs reflect genuine philosophical differences. Filesystem-mounted memory hands the model a flexible, low-level surface and trusts it to organise things well. Retrieval-based memory keeps the storage logic out of the model’s context entirely and exposes a narrow interface for reading and writing. Each approach has trade-offs around token cost, query flexibility, and how much of the storage strategy ends up in the model’s working memory.
For Claude Managed Agents, the filesystem approach plays to a specific strength: Claude is already excellent at navigating filesystems with bash and code execution tools. Reusing those skills for memory means the agent doesn’t have to learn a new abstraction, and developers don’t have to debug a separate retrieval system when something goes wrong.
Practical guidance for getting started
A few principles emerge from the design that are worth keeping in mind as you build:
- Many small files beat a few large ones. The 100KB cap pushes you toward focused memories. Treat each file like a single fact, decision, or convention rather than a journal.
- Default to read-only for reference material. Anything the agent shouldn’t change should be mounted read-only. Reserve read-write for stores where writing is the point.
- Seed stores before sessions run. You can pre-load a store with reference material via the API before any agent has touched it, which is useful for onboarding new agents into existing knowledge.
- Use instructions to guide usage. Each attachment supports a 4,096-character instructions field that tells the agent how to use the store in this specific session, separate from the store’s own description.
- Audit what gets written. The version history and console session events make it possible to see exactly what your agents are remembering. Use this, especially during early rollout.