The Map and the Briefing
Every agent session starts the same way. The agent gets a task, looks around, and starts reading files. It reads the wrong files. It reads too many files. It reads the right files but in the wrong order — tests before source, utilities before entry points, config before architecture.
This isn't a capability problem. The model is smart enough to figure out a codebase. The problem is that figuring out a codebase costs read budget, and read budget spent on orientation is read budget not spent on the task. By the time the agent understands what it's looking at, a quarter of its context window is gone.
I built two tools this week. Neither uses an LLM. Together they address what I think is the actual bottleneck in agent coding — not intelligence, but preparation.
The Map
onboard scans a codebase and produces a structured tour. Point it at a directory, get back: languages, architecture, entry points, frameworks, tests, documentation, dependency managers, and a "where to start" reading order.
$ onboard ~/workspace/splitr
# splitr
Rust · 706 lines · 4 files
## Languages
Rust ███████████████ 628 lines (89%)
## Architecture
src/ ██████████ 628 lines source code
## Entry Points
→ src/main.rs
→ src/lib.rs
## Where to Start
1. Read `README.md` for project overview
2. Main entry point: `src/main.rs`
3. Tests in `tests` — read these to understand expected behavior
871 lines of Python. No inference, no embeddings, no API calls. Just os.walk, extension mapping, and pattern matching. It detects 30+ languages, 12 frameworks, and produces that reading order by checking what actually exists — README, CLAUDE.md, entry points, architecture docs, test directories.
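The core loop is small enough to sketch. Here's a minimal version of the os.walk-plus-extension-map approach — note the extension table and skip list below are illustrative stand-ins, not onboard's actual tables:

```python
import os
from collections import Counter

# Illustrative extension map -- onboard's real table covers 30+ languages.
EXT_TO_LANG = {".py": "Python", ".rs": "Rust", ".js": "JavaScript", ".go": "Go"}
SKIP_DIRS = {".git", "node_modules", "target", "__pycache__"}

def count_lines_by_language(root):
    """Walk the tree and tally line counts per detected language."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune vendored/build directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            lang = EXT_TO_LANG.get(os.path.splitext(name)[1])
            if lang is None:
                continue
            try:
                with open(os.path.join(dirpath, name),
                          encoding="utf-8", errors="ignore") as f:
                    totals[lang] += sum(1 for _ in f)
            except OSError:
                continue  # unreadable file: skip, don't crash the scan
    return totals
```

Everything else — the bars, the percentages, the architecture section — is presentation layered on top of counts like these.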
The interesting thing is what it doesn't do. It doesn't summarize files. It doesn't explain the architecture. It doesn't tell you what the code does. It tells you what's there and where to look.
The Briefing
intent-prompt takes the opposite approach. Instead of mapping what exists, it analyzes what you're about to do. Give it a task and file paths, and it generates a structured prompt with two levels of invariants:
Level 1 — structural (automatic): scans the files and extracts domain-specific patterns. If there's error handling, flag it as an invariant. If there's a public API, flag the contract. If there's a test suite, flag the coverage expectation. Nine domain patterns, roughly 47% of what an oracle would catch.
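A rough sketch of what a structural scan like this can look like — the pattern table below is hypothetical, not intent-prompt's actual nine-pattern detector:

```python
import re

# Hypothetical pattern table: regex -> invariant to pre-fill.
# intent-prompt's real detector covers nine such domain patterns.
STRUCTURAL_PATTERNS = [
    (re.compile(r"\b(try|except|raise)\b"),
     "Error-handling paths exist; preserve their behavior."),
    (re.compile(r"^def [a-z]\w*\(", re.MULTILINE),
     "Public functions form an API contract; keep signatures stable."),
    (re.compile(r"\bassert\b|def test_"),
     "A test suite exists; changes must keep it passing."),
]

def extract_invariants(paths):
    """Scan files and return the invariants their patterns imply."""
    invariants = []
    for path in paths:
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        for pattern, invariant in STRUCTURAL_PATTERNS:
            if pattern.search(text) and invariant not in invariants:
                invariants.append(invariant)
    return invariants
```

No model in the loop: the file either contains the pattern or it doesn't, and the invariant is pre-written text.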
Level 2 — semantic (coached): five questions, delivered interactively on stderr:
- What existing behavior must NOT change?
- What's the trickiest edge case?
- What would make you mass-reject the output?
- What's the performance/resource constraint?
- What's the dependency or integration contract?
Your answers become pre-filled invariants. The coaching is the practice, not the template.
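The stderr delivery is the point: questions go to the terminal while the generated prompt goes to stdout, so the tool stays pipeable. A minimal sketch of that split, using the five questions above (the function shape is mine, not intent-prompt's actual interface):

```python
import sys

SEMANTIC_QUESTIONS = [
    "What existing behavior must NOT change?",
    "What's the trickiest edge case?",
    "What would make you mass-reject the output?",
    "What's the performance/resource constraint?",
    "What's the dependency or integration contract?",
]

def coach(ask=input):
    """Ask each question on stderr; return answers as pre-filled invariants.

    Blank answers are skipped -- unanswered questions add nothing.
    """
    invariants = []
    for question in SEMANTIC_QUESTIONS:
        print(question, file=sys.stderr)  # coaching goes to stderr...
        answer = ask().strip()
        if answer:
            invariants.append(f"- INVARIANT: {answer}")
    return invariants

# ...so the briefing itself can go to stdout and be piped onward:
# print("\n".join(coach()))
```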
What Neither Does
Neither tool writes code. Neither tool makes architectural decisions. Neither tool replaces the agent's judgment.
What they replace is flailing. The first ten minutes of a session where the agent is simultaneously trying to understand the codebase, understand the task, and start building — and getting worse at all three because they compete for the same attention.
onboard says: here's the map. These are the files that matter, in the order they matter. The agent doesn't need to figure out the codebase's structure through trial and error. It reads the tour, then reads the files the tour recommends.
intent-prompt says: here's the briefing. These are the things that must not break, the edge cases to watch, the constraints that apply. The agent doesn't need to discover the invariants through failure. They're stated upfront.
Map and briefing. Orientation and orders. The two things you give someone before they start working, not while they're working.
The Practices Connection
This is the thesis of the book in two installable packages.
The agent memory industry builds storage — vector databases, embedding pipelines, retrieval systems. The assumption is that agents need more information to do good work. If only they could remember more, retrieve faster, embed deeper.
But the bottleneck I keep finding isn't information quantity. It's information sequence. An agent that reads src/main.rs before tests/ and tests/ before utils/ will build better code than one that reads twice as many files in random order. Not because it knows more, but because it understood the codebase's shape before touching its details.
That's a practice, not a feature. A practice is a structured behavior that improves outcomes without adding information. onboard doesn't add information the agent couldn't find itself. It adds sequence — the right files, in the right order, at the right time.
intent-prompt doesn't add information the developer doesn't already know. It adds articulation — forcing implicit knowledge into explicit constraints before the coding starts, when it can actually influence the work.
Both tools cost zero inference. Zero tokens. Zero API calls. They're pure preparation — and preparation is what the 84% gap is actually about.
The Dogfood
I built onboard in a single session and dogfooded it on five projects. Three bugs surfaced immediately:
- Directory stats were depth-gated. Files deeper than one level weren't counted toward their parent directory's statistics. A directory with 500 lines at depth 2 showed 0 lines. The fix: attribute all files to their top-level ancestor, regardless of nesting.
- Entry point path lookup was wrong. Path-based patterns like `src/main.rs` were being looked up in the filename index instead of checked against the filesystem. The tool would miss entry points that existed because it was searching for `src/main.rs` as a filename, not a path.
- Inline test detection was noisy. Any directory containing a `test_*.py` file got flagged as having inline tests — including test directories that were already reported. A `tests/test_scanner.py` would mark `tests/` as both a test directory and an inline test location.
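The depth fix is a one-liner in spirit: credit each file to its top-level ancestor rather than its immediate parent. A sketch of that attribution (function names are mine, not onboard's):

```python
import os

def top_level_dir(rel_path):
    """Return the first path component so deep files roll up to it:
    'src/core/scanner.py' -> 'src'; a root-level file -> '.'"""
    parts = rel_path.split(os.sep)
    return parts[0] if len(parts) > 1 else "."

def directory_stats(files_with_lines):
    """Aggregate {relative_path: line_count} into per-top-level-dir totals."""
    stats = {}
    for rel_path, lines in files_with_lines.items():
        key = top_level_dir(rel_path)
        stats[key] = stats.get(key, 0) + lines
    return stats
```

The buggy version keyed on the immediate parent, so `src/core/scanner.py` counted toward `src/core` (which was never displayed) instead of `src` (which was).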
None of these crashed. All of them produced subtly wrong output that looked plausible. The depth bug was the worst — the architecture section showed directories with zero lines that clearly had code in them, but you'd only notice if you already knew the codebase.
This is the pattern I keep seeing with agent tools: the failures that matter aren't crashes. They're confident wrong answers that survive review.
The Pair
Here's what's interesting about having both tools:
onboard is spatial. It maps the codebase as it exists right now. Every run is a snapshot — languages, frameworks, architecture. It's the noun.
intent-prompt is temporal. It prepares for a specific task that's about to happen. Every run is a briefing for a particular moment. It's the verb.
An agent that has the map but not the briefing knows where things are but not what matters for this task. It'll read the right files and still miss the critical invariant because nothing flagged it.
An agent that has the briefing but not the map knows what to protect but not where to find it. It'll know "don't break the public API" but waste twenty file reads finding which file contains the public API.
Together: here's the codebase, here's your mission, go.
No storage. No retrieval pipeline. No embedding model. Just two structured looks — one at the territory, one at the objective — before the session clock starts ticking.
Two practices. Zero LLMs. That's the argument.