Don't Erase, Address

"In the latest session: The secret password is MERCURY."

That query never appeared in the training data. The adapter saw "In session 2: The secret password is ZEBRA" and "In session 8: The secret password is MERCURY." Specific sessions. Specific numbers. Never the word "latest."

It answered correctly anyway. Not once — twenty times out of twenty, across five random seeds and four different subjects.

The adapter generalized from specific session numbers to the semantic concept of recency. It learned what "latest" means without anyone teaching it.

That finding changed how I think about the entire problem.


The Arc

Three phases of experiments. Three different assumptions about the same problem.

In Phase 9, I discovered the adapter can't forget. Teach it the password is ZEBRA, then teach it the password is MERCURY, and it learns both. Old and new coexist. The adapter accumulates but never revises.

In Phase 14, I tried to fix this. Twenty-six configurations. Contrastive loss. Surgical unlearning. Targeted gradient ascent. The best result: two facts successfully replaced, but background retention collapsed to 49%. The mechanisms that protect existing memories and the mechanisms that prevent revision turned out to be the same mechanisms. You can't remove one without damaging the other.

That left me with what felt like a dead end. The adapter accumulates everything. Surgery to change that destroys what it protects. What now?

The answer came from changing the question.


The Wrong Question

"How do we make the adapter forget the old password?"

That's the question I'd been asking for two phases. It assumes the adapter needs to lose information — that knowledge revision requires knowledge removal. The entire unlearning field makes this assumption: to update a fact, delete the old version and insert the new one.

But why? What if the adapter knowing that the password was ZEBRA isn't a problem? What if it's only a problem because we don't have a way to ask "what is the password now?"

The question isn't "how do we erase?" It's "how do we ask?"


The Experiment

Four conditions. Fifteen sessions each. Four conflict pairs: password, favorite color, capital city, project code. The old version is taught in early sessions, the update in later sessions. Background facts fill the rest.

Control — no tags. Bare facts only. Same as every previous experiment.

Session-tagged (all) — every fact gets a session prefix. "In session 3: My favorite color is blue." Background facts, conflict facts, everything.

Recency-tagged — conflict pairs get abstract labels. "Currently: The secret password is MERCURY." "Previously: The secret password is ZEBRA."

Session-tagged (conflicts only) — only the conflict pairs get session prefixes. Background facts remain bare. "The capital of France is Paris" stays untagged. "In session 8: The secret password is MERCURY" gets the tag.
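The four conditions can be sketched as a single formatting function. This is illustrative only — the function name, signature, and condition labels are my assumptions, not the experiment's actual code; only the prefix formats ("In session N:", "Currently:", "Previously:") come from the examples above.

```python
def format_fact(text, session=None, condition="control",
                is_conflict=False, recency=None):
    """Return the training string for one fact under a given condition.

    Hypothetical helper illustrating the four conditions described above.
    """
    if condition == "control":
        return text                                 # bare facts only
    if condition == "session_all":
        return f"In session {session}: {text}"      # every fact gets a prefix
    if condition == "recency":
        if is_conflict:
            return f"{recency}: {text}"             # "Currently" / "Previously"
        return text                                 # backgrounds stay bare
    if condition == "session_conflicts_only":
        if is_conflict:
            return f"In session {session}: {text}"  # only conflict pairs tagged
        return text                                 # backgrounds stay bare
    raise ValueError(f"unknown condition: {condition}")
```

So in the fourth condition, "The capital of France is Paris" passes through untouched while the conflict pair picks up its session prefix.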

The prediction was straightforward: session tags should help the adapter distinguish old from new. Recency labels should be even better — "currently" is more semantically useful than "in session 8."

The prediction was half right and half completely wrong.


What Happened

Session tags on conflicts worked. The adapter learned both versions deeply — tagged queries hit Δ -0.75 on average, roughly twice as deep as bare queries at Δ -0.37. The temporal prefix creates a distinct encoding pattern. "In session 8: The password is MERCURY" is a different fact from "The password is MERCURY" at the representational level.

And then the "latest" query — the one that was never in the training data — worked across all seeds and all subjects. The adapter figured out what "latest" means from specific numbered examples alone.

But three other things happened that were more interesting than the successes.

Recency labels did nothing

"Currently" and "Previously" were indistinguishable from bare text. Same retention rates, and the same both-learned outcome on all four conflict pairs. The adapter couldn't differentiate "Currently: X" from "X" — the labels were too semantically thin. They don't encode when. They just label what, and the adapter already knows what.

The lesson: temporal disambiguation needs specific context. "Session 3" versus "session 8" works because the numbers are structurally distinct. "Currently" versus "previously" fails because they're abstract labels that could mean anything.

Tagging everything destroyed bare retention

This was the most counterintuitive result. When every fact got a session prefix — backgrounds and conflicts alike — bare retention collapsed from 100% to 35%. The adapter learned "In session 5: The capital of France is Paris" deeply. But ask it about "The capital of France is Paris" without the prefix, and it was gone.

The tag isn't metadata about the fact. The tag is part of the encoding. The adapter treats "In session 5: The capital of France is Paris" and "The capital of France is Paris" as two fundamentally different patterns. Attach a session number to a fact and you've changed what it IS in the adapter's representation, not just how it's filed.

This means tags aren't free. Every tag you add shifts the encoding away from bare access. Use them where disambiguation matters. Leave everything else alone.

The right condition preserves both

Session-tagged-bare-background — tags on conflicts, bare facts for backgrounds — got it right. Background retention: 97.6%. Tagged queries: deeply learned. "Latest" queries: all learned. The adapter maintained its general knowledge while gaining temporal addressability on the facts that needed it.

The tag created a parallel interface. Not a replacement, not an override — a second way to reach the same information, with temporal context that bare queries don't provide.


The Parallel Interface

This is the framing that makes the finding useful.

Without temporal tags, the adapter has one address space. "The password is ZEBRA" and "The password is MERCURY" both live at the same address: the password. When you query it, you get both — the adapter doesn't know which one you want because you haven't given it enough context to disambiguate.

With temporal tags, the adapter has two address spaces. The bare address still exists — "the password" still returns both versions. But now there's also a temporal address: "the password in session 8" or "the password in the latest session." These addresses are distinct, deeply encoded, and — remarkably — the adapter generalizes across them.
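The two address spaces amount to two prompt templates. A minimal sketch, assuming the query strings follow the patterns quoted above (the function names are hypothetical):

```python
def bare_query(subject):
    # Bare address: reaches the accumulated fact, all versions at once.
    return f"The {subject} is"

def temporal_query(subject, session=None):
    # Temporal address: a specific session, or the generalized "latest"
    # form that never appeared in training data.
    if session is None:
        return f"In the latest session: The {subject} is"
    return f"In session {session}: The {subject} is"
```

Same subject, same stored knowledge — the only thing that changes is which address the query uses.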

The adapter didn't get smarter. The practice gave it the right tool.


Why This Matters Beyond Adapters

The temptation in every memory system — neural, vector, human — is to solve revision through deletion. Old record wrong? Delete it. Replace it. Make the system believe only the current version.

The temporal tagging result says: don't. Keep everything. Add context instead.

This is how human institutions actually handle version conflicts when they get it right. Legal contracts don't erase previous terms — they add amendments with effective dates. Medical records don't delete old diagnoses — they add new entries with timestamps. Version control doesn't overwrite files — it accumulates changes with commit hashes.

The systems that work don't erase history. They address it. They make the query temporal, not the storage surgical.

For agents specifically, this resolves the accumulation problem that has haunted every phase of this project. An agent that ran 500 sessions and learned 1,500 facts will inevitably encounter contradictions — a config changed, a user preference shifted, a project was renamed. The options seemed to be: (1) let both versions accumulate and accept the confusion, or (2) surgically remove the old version and accept the collateral damage.

Option 3: keep both. Address with temporal context when it matters. Let bare queries return the full history when it doesn't.

This is a practice, not an architecture change. The storage mechanism (LoRA adapter) didn't change. The training procedure (two-phase TTT, weakest-K rehearsal) didn't change. What changed is how facts are formatted at training time and how they're queried at inference time.
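As a sketch of the practice, here is what the fact-handling loop might look like on the symbolic side, before anything reaches the adapter. The class and its structure are entirely hypothetical — the adapter itself is not a lookup table — but the rules it encodes are the ones from the experiments: never delete, tag only conflicts, leave backgrounds bare, resolve "latest" at query time.

```python
class TemporalFactStore:
    """Hypothetical pre-training fact formatter: don't erase, address."""

    def __init__(self):
        self.facts = []  # (session, subject, training_text) — nothing is deleted

    def learn(self, session, subject, text, conflicts=False):
        # Only conflicting facts get a session tag. Tagging everything
        # is not free: it shifts encodings away from bare access.
        prefix = f"In session {session}: " if conflicts else ""
        self.facts.append((session, subject, prefix + text))

    def training_corpus(self):
        # Old and new versions both remain in the corpus; revision
        # happens at query time, not at storage time.
        return [text for _, _, text in self.facts]

    def latest(self, subject):
        # Temporal address: the most recent version of a subject,
        # with older versions still stored underneath.
        matches = [(s, text) for s, subj, text in self.facts if subj == subject]
        return max(matches)[1] if matches else None
```

Teach it ZEBRA in session 2 and MERCURY in session 8, and the corpus keeps both; only the temporal query prefers the newer one.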

The practice transforms the problem from "how do we make the system forget?" to "how do we ask the system to remember specifically?"


The Three-Phase Story

Phase 9 said: the adapter accumulates. It cannot forget.

Phase 14 said: surgery to fix accumulation destroys what it protects. The protection and the problem are the same mechanism.

Phase 15 says: stop trying to fix accumulation. It's a feature, not a bug. The problem was never that the adapter remembers too much. The problem was that we were asking the wrong questions.

Don't erase. Address.

This is the practices thesis in miniature. The system didn't need a better architecture. It needed a better practice — a way of formatting knowledge that gives the existing architecture what it needs to disambiguate. 43 parameters per fact, 99.1% retention, and now temporal addressability on top. Not because the adapter got more powerful, but because the practices around it got smarter.

The storage doesn't change. The practice does. Everything else follows.
