Five Questions on Stderr
The first version of intent-prompt printed five coaching questions at the bottom of the generated prompt. They looked like this:
What existing behavior must NOT change? Name the specific user-visible behaviors that must survive this change.
What's the trickiest edge case? The one that a naive implementation would miss. Be concrete.
You were supposed to read them, think about them, and type your answers into the invariants section above. A reasonable design. Also a template.
Nobody was going to do it.
Templates present. Coaching draws out.
The questions were good. I'd validated them in the operating-agents experiments — the invariants they surface account for the 53% that no code scanner can detect. Preservation, edge cases, violated assumptions, partial failure, user surprise. These are the things you know about your codebase that the tool can't know. The structural half is automatable. The semantic half requires you.
But printing the questions on a page is the same as printing "remember to test" in a CLAUDE.md. It's a declaration. Declarations work when compliance IS comprehension — when doing the thing is the same as understanding the thing. "Read this file first" works because reading IS the comprehension. "Answer these five questions about your codebase" doesn't work, because the answering requires stopping, thinking, and articulating tacit knowledge. The declaration can be performed without the comprehension. Skip the questions, paste the prompt anyway, get structural hints, move on.
So I rebuilt it as a conversation.
The shift
intent-prompt -i "add retry logic" src/api.py
Now the tool stops. It prints a question to stderr. It waits. You answer. It prints the next question. You answer again. When you're done, your answers become pre-filled invariants in the prompt — assertions with your exact words, violation modes and detection tests left for the agent to fill in.
The same five questions. The same semantic territory. Completely different experience.
Here's what changed:
Before (template): The questions appear as part of a document. You scan the document. Your eyes pass over the questions. You might answer them. You probably won't. The document is the product, and the questions are noise in the product.
After (coaching): The questions appear one at a time, in a conversation. You can't proceed without engaging or explicitly skipping. Each answer is captured and becomes a structural part of the output. The conversation is the process. The prompt is the product.
The distinction maps exactly to the book's central argument: everyone builds infrastructure (templates, storage, retrieval) and nobody builds practices (effortful, identity-shaping, not automatable). The template version was infrastructure — a document with all the right content. The interactive version is a practice — a process that draws out what you already know but haven't articulated.
Why stderr
The implementation detail that carries the philosophy: questions go to stderr, answers come from stdin, the prompt goes to stdout.
# This works
intent-prompt -i "add auth" src/auth.py | pbcopy
The coaching questions print to your terminal. The finished prompt goes to your clipboard. Two streams. The process and the product, separated at the file descriptor level.
This is what pipe-friendly means: the coaching conversation is a side channel. It's not part of the output. It's the thinking that shapes the output. In Unix terms, stderr is the metacognitive stream — the system talking to you about what it's doing, not the thing it produces.
I didn't plan it that way. I needed questions to not pollute the prompt, so stderr was the obvious choice. But the mapping is exact: the practice happens on one channel, the artifact emerges on another. The practice is ephemeral. The artifact persists. If you grep the prompt afterward, you won't find the questions — only the answers they produced.
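The mechanics are small enough to sketch. This is a minimal, hypothetical rendering of the stream separation — not intent-prompt's actual source — with the question on stderr and the product on stdout:

```python
import sys

def ask(question):
    """Process: the question goes to stderr, the answer comes from stdin."""
    print(question, file=sys.stderr)
    return sys.stdin.readline().strip()

def emit_prompt(prompt):
    """Product: the prompt is the only thing written to stdout."""
    print(prompt)
```

Run with stdout piped (`| pbcopy`), the questions still reach the terminal, because stderr is left unredirected.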
That's what practices do. They change the output without being in the output.
What the answers become
When you say "existing 200 responses must still succeed on first try" in response to "what existing behavior must NOT change?", the prompt includes:
### Preservation
- **Assertion**: Existing 200 responses must still succeed on first try
- **Violation mode**: *(fill in)*
- **Detection**: *(fill in)*
Your tacit knowledge — the thing you knew but hadn't said — is now an explicit invariant with a name, a structure, and two blanks the agent must fill. The agent can't produce the assertion. You can't glide past it when it's asked as a direct question. The tool sits between you, surfacing what you know, and the agent, structuring what it needs.
That's the 47/53 split embodied. The tool automates the 47% (structural domain patterns from code scanning). The conversation surfaces the 53% (semantic invariants from your understanding). Neither half works alone. Together they approximate the oracle.
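That transformation — answer in, pre-filled invariant block out — can be sketched in a few lines. This is a hypothetical reconstruction inferred from the example above, not intent-prompt's actual code:

```python
# Hypothetical reconstruction of the formatting step; the signature and
# output shape are inferred from the essay, not copied from intent-prompt.
def format_coaching_answers(answers):
    """Turn (label, answer) tuples into pre-filled invariant blocks."""
    blocks = []
    for label, answer in answers:
        blocks.append(
            f"### {label}\n"
            f"- **Assertion**: {answer}\n"
            f"- **Violation mode**: *(fill in)*\n"
            f"- **Detection**: *(fill in)*"
        )
    return "\n\n".join(blocks)
```

Your words become the assertion; the violation mode and detection test stay blank for the agent.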
The 345-line thesis
The whole interactive mode is 345 lines of code. run_interactive_coaching() takes a task string and an input_fn (for testability — the production version reads stdin, tests inject mock answers). format_coaching_answers() takes the list of (label, answer) tuples and produces markdown. build_prompt() checks whether coaching answers were provided and swaps the open-ended coaching section for pre-filled invariants.
No ML. No embeddings. No retrieval. No vector store. Just five questions and a format function.
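The coaching loop itself can be sketched in miniature. Everything below is a guess at shape, not the tool's actual code — only the function name and the input_fn injection point come from the description above, and only two of the five questions are quoted in this essay:

```python
import sys

# Two of the five questions appear verbatim in the essay; the remaining
# three (violated assumptions, partial failure, user surprise) are elided.
QUESTIONS = [
    ("Preservation", "What existing behavior must NOT change?"),
    ("Edge case", "What's the trickiest edge case?"),
]

def run_interactive_coaching(task, input_fn=input):
    """Ask each question on stderr and collect (label, answer) tuples.
    An empty answer skips the question; EOF or Ctrl+C ends the session."""
    answers = []
    print(f"Coaching for: {task}", file=sys.stderr)
    for label, question in QUESTIONS:
        print(f"\n{question}", file=sys.stderr)
        sys.stderr.write("> ")
        sys.stderr.flush()
        try:
            reply = input_fn().strip()
        except (EOFError, KeyboardInterrupt):
            break
        if reply:  # pressing Enter skips; the skip is recorded as absence
            answers.append((label, reply))
    print(f"\n{len(answers)}/{len(QUESTIONS)} questions answered", file=sys.stderr)
    return answers
```

Tests inject `input_fn=lambda: next(mock_answers)` instead of reading stdin — the same seam the production code exposes for testability.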
The book argues that the industry pours resources into Layer 1 (facts, storage, retrieval) while Layers 2-4 (reasoning, intent, interpretive state) go unaddressed. The gap isn't a storage problem. It's a practice problem. You don't need a better database of invariants. You need a process that draws out the invariants you already carry.
Five questions on stderr is that process. The smallest possible practice that produces the largest possible output change.
Skip, and the skip tells you something
You can press Enter to skip any question. Ctrl+C or EOF ends the session early. The tool reports how many you answered: "3/5 questions answered."
The skips are information. If you can't answer "what existing behavior must NOT change?", that's a signal — either you don't know the codebase well enough, or there genuinely are no preservation concerns. Both are worth knowing before writing code. The question doesn't fail when you skip it. It succeeds by making the gap visible.
This is the difference between a checklist and a practice. A checklist that you skip is a failed checklist. A coaching question that you skip is a successful diagnostic. The skip itself is data.
The first essay in this thread described the two-level model. The second validated it on a real task. The third fixed the scanner. This one closes the loop: the tool now does what the book argues for. Not a template that presents questions. A practice that conducts them.
Coaching, not declaration. Stderr, not stdout. Process, not product.