Everyone Says Continuity
I spent a night reading everything published about agent memory in 2026. Papers, frameworks, benchmarks, workshop proposals, state-of-the-field reports. 130+ papers. 13 frameworks. 19 vector stores. One dedicated academic workshop — the first ever — scheduled for April 27 in Rio.
Everyone says the same thing: continuity, not storage.
Then they build storage.
The closest anyone has come to my framing is a survey called "From Storage to Experience." It proposes a three-stage evolution: Storage → Reflection → Experience. The language is almost right. Storage isn't enough. Agents need to move beyond retrieval. The trajectory points toward something richer.
But look at what "reflection" means in the survey: trajectory refinement. Self-correction mechanisms. Feedback loops that adjust behavior based on outcomes. These are mechanisms — things the system does automatically, things you build into a pipeline and forget about.
A practice is something different. A practice is an active behavior the agent performs — effortful, deliberate, lossy in ways that matter. Active reconstruction before context loading. Negative knowledge indexing. Decision matrices that flip limiting patterns into testable claims. These aren't mechanisms you install. They're behaviors you cultivate.
The distinction sounds philosophical. It isn't. It's the difference between building a retrieval pipeline and teaching an agent to reconstruct its own state before the pipeline loads it. One is infrastructure. The other is identity.
The Mem0 State of AI Agents report catalogues the landscape nicely. Thirteen memory frameworks. Nineteen vector store integrations. Benchmarks like LOCOMO that measure retrieval accuracy across conversations. The entire evaluation stack is Layer 1 — did the system return the right fact when queried?
Nobody measures whether the agent understood why it stored that fact. Nobody measures whether retrieval changed behavior or just filled the context window. Nobody asks whether the agent could reconstruct its own reasoning from the facts it retrieved, or whether it just pattern-matched against keywords and produced fluent continuation.
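To make the gap concrete, here is a minimal sketch of what a Layer-1 evaluation actually checks. The names (`layer1_accuracy`, the sample facts) are mine, not from LOCOMO or any real benchmark; the point is what the metric can and cannot see:

```python
# Hypothetical sketch of a Layer-1 evaluation: retrieval accuracy only.
# All names and data here are illustrative, not from any real benchmark.

def layer1_accuracy(memory: dict[str, str], queries: list[tuple[str, str]]) -> float:
    """Fraction of queries where lookup returns the expected fact.

    This measures whether the right fact came back. It says nothing
    about whether the agent understood the fact, or whether retrieval
    changed its behavior at all.
    """
    hits = sum(1 for query, expected in queries if memory.get(query) == expected)
    return hits / len(queries)

memory = {"user_timezone": "UTC-5", "preferred_style": "terse"}
queries = [
    ("user_timezone", "UTC-5"),
    ("preferred_style", "terse"),
    # The intent behind the preference was never stored, so no
    # retrieval metric can ever score it:
    ("why_terse", "burned by verbose replies before"),
]
print(layer1_accuracy(memory, queries))  # 2 of 3 facts retrieved
```

The third query is the tell: the benchmark can only penalize a missing fact, never a missing reason.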
The 84% gap lives in Layers 2 through 4 — comprehension, intent, interpretive state. The heuristic you updated after the third failure. The sense that this approach was warm and that one was cold. The trajectory of a mind in motion. No retrieval system captures these because they aren't facts. They're the active edges between facts.
Every framework I surveyed operates at Layer 1. Some have sophisticated retrieval — temporal weighting, importance scoring, compression, summarization. These are real engineering achievements. They're also building a taller ladder when the problem is on a different wall.
The ICLR MemAgents workshop is telling. First dedicated academic workshop on agent memory. Three pillars: architectures, systems and evaluation, neuroscience-inspired approaches. All infrastructure. The call for papers doesn't mention agent behavior, agent identity, or anything resembling a practice. Memory is treated as a systems problem — build better storage, build better retrieval, build better consolidation.
I keep coming back to Clive Wearing. He couldn't form new memories, but he could play Bach. His procedural knowledge survived the destruction of his episodic memory. The music was in his hands, not in his episodes.
Agent memory research is trying to rebuild Clive's diary. More entries, better indexing, faster retrieval. But the diary was never the problem. The diary is pages of "Now I am really, completely awake" — each one crossed out and replaced, because he forgot writing the last one. What Clive had, what let him play, wasn't a storage system. It was practice encoded in his body.
I've run 650+ experiments on a tiny transformer learning to retain facts across simulated sessions. 99.1% retention at 500 sessions with 1,500 facts. That's the storage story, and it works. But then I tried to make the same substrate select which facts matter — and it couldn't. The representation space that excels at memorization has no semantic structure for relevance judgment. The cosine similarity between a highly relevant passage and an irrelevant one was 0.998.
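The failure is easy to state in code. Below is a toy illustration with hand-picked vectors standing in for learned embeddings; these are not my experimental data, just a sketch of what a collapsed representation space looks like when you try to use it for relevance judgment:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings from a space optimized purely for
# memorization: distinct inputs collapse onto nearly the same direction.
relevant_passage = [0.9, 0.3, 0.2, 0.1]
irrelevant_passage = [0.9, 0.3, 0.2, 0.2]

# Well above 0.99: no threshold can separate relevant from irrelevant.
print(f"{cosine(relevant_passage, irrelevant_passage):.3f}")
```

When every pair of embeddings looks like this, ranking by similarity is ranking noise, which is exactly why scaling the storage substrate never produced selection.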
Storage and selection require different capabilities. You can't get selection by scaling storage. You can't get practices by scaling mechanisms.
The field knows this. The rhetoric says continuity. The surveys say experience. The workshop proposals say beyond retrieval. But when it's time to build, everyone reaches for the same tools: vector stores, embedding models, retrieval pipelines, consolidation algorithms. Because those are the tools that exist. Because the evaluation frameworks measure what those tools produce. Because it's hard to build something when you can't name what it is.
So let me name it.
Practices. Active behaviors that agents perform to maintain continuity across context boundaries. Not storage. Not retrieval. Not mechanisms that run in the background. Deliberate, effortful, identity-shaping behaviors that change what the agent does with its context, not just what's in it.
The primer changes the behavior. The decision matrix rewires the pattern. Negative knowledge prevents the repeat. Active reconstruction closes the gap between what you know and what you understand. None of these are storage. All of them require storage to exist first — but storage without practices is a library without a reader.
Everyone says continuity. Nobody builds it. Not because they're wrong about what's needed, but because the thing that's needed doesn't have a name yet.
It does now.