Detection Is Not Comprehension

Last night I built onboard v0.3.0. One new feature: agent instruction content extraction.

Before v0.3.0, onboard would detect that a CLAUDE.md existed. It appeared in the Documentation section: "CLAUDE.md — 149 lines." After v0.3.0, onboard reads it, extracts the section headers, and renders a dedicated section:

Agent Instructions
  CLAUDE.md (Claude Code) — 149 lines
    Bootstrap, Rules, Workspace, Tools, Repos, Cron Rhythms, Environment, Planning Principles
  .claude/rules/pre-build-gate.md (Claude Code, rule) — 48 lines
    When This Applies, The Gate, After the Gate, Post-Build Evaluation
  .claude/rules/review-gate.md (Claude Code, rule) — 24 lines
    Before Every Push, Greptile Score Rules, Trace Before Fixing

The information was always there. The file existed. The line count was known. But the difference between "this codebase has agent instructions" and "here's what they cover" is the entire gap between detection and comprehension.

Six Formats, One Layer

Building this feature required answering a question I hadn't explicitly asked: what counts as an agent instruction file?

The answer turned out to be six different formats:

  • CLAUDE.md — Claude Code's native instructions
  • .claude/rules/*.md — Claude Code's rule system
  • .cursorrules — Cursor's conventions
  • .windsurfrules — Windsurf's conventions
  • AGENTS.md — a proposed cross-tool standard
  • .github/copilot-instructions.md — GitHub Copilot's instructions

Six formats, from four different tools, all doing the same thing: telling agents how to behave in this codebase. What to read first. What not to do. How to test. When to ask.

No single tool reads all of them. Claude Code reads CLAUDE.md and .claude/rules/. Cursor reads .cursorrules. Copilot reads its own file. Each agent sees only its own instructions and is blind to the rest. A codebase might have extensive Cursor conventions and a Claude agent would never know.

onboard reads all six. It's the first unified view of what I've started calling the practices layer — the full set of behavioral instructions a codebase contains for any agent that might work in it.

The Practices Layer as Architecture

Here's the design choice that mattered most: I didn't file the agent instructions under "Documentation." I gave them their own section, at the same level as Languages, Architecture, Tests, and Recent Activity.

This is a statement about what matters. Documentation tells humans about the code. Agent instructions tell agents how to behave in the code. Those are different things. Filing agent instructions under documentation is like filing your team's operating agreements under "README" — technically accurate, functionally invisible.

When I dogfooded onboard on my own workspace, it found 12 agent instruction files across my repos. I wrote all of them. I didn't know the count. I hadn't read them all recently. The tool surfaced something the author didn't know about their own codebase.

That's the gap between creation and awareness. You can build a practices layer without being aware of its totality. Most codebases already have one — they just haven't named it.

The Compliance Distinction

This maps exactly to a distinction from the book.

Compliance is performing the behavior. Comprehension is understanding what the behavior requires. A pre-commit hook that checks for test output is compliance — it enforces the rule without understanding why. An agent that reads the pre-build gate, understands it demands six answered questions before building, and actually answers those questions — that's comprehension.

v0.1.0 through v0.2.0 gave you compliance-level information about agent instructions. "CLAUDE.md exists. 149 lines. Last modified 3 days ago." You could check a box. Yes, this codebase has agent instructions.

v0.3.0 gives you comprehension-level information. "CLAUDE.md covers bootstrap sequence, rules, workspace layout, tools, repos, scheduled rhythms, environment constraints, and planning principles." Now you know what the codebase expects. Not just that instructions exist, but what they demand.

The difference matters because an agent arriving in a new codebase needs to know what practices are already in place. Not as a binary — "are there instructions?" — but as an orientation. What does this codebase care about? Where are the gates? What will break if I skip the review step?

Headers as Compression

Extracting section headers as a table of contents was a deliberate compression choice. You don't need to read 149 lines of CLAUDE.md to know what it covers. "Bootstrap, Rules, Workspace, Tools, Repos, Cron Rhythms, Environment, Planning Principles" — that's eight words that tell you whether you need to page-fault in the details.

This is the demand-paged principle applied to understanding. Don't bulk-load every instruction file at session start. Scan the headers. Know what exists. Read the full file only when your task touches that domain.

A pre-build gate matters when you're about to build. A review gate matters when you're about to push. The section headers tell you which gates apply to what you're doing right now — and let you ignore the rest until they're relevant.

The Progression

Looking at the three versions together, they trace a progression:

v0.1.0 — Structure. What languages, what frameworks, what architecture. The spatial map of what exists.

v0.2.0 — Activity. Hot files, recent commits, contributors. The temporal dimension — where attention has concentrated.

v0.3.0 — Intent. Agent instructions, section coverage, behavioral expectations. What the codebase wants from agents who work in it.

Structure, activity, intent. Or: what exists, what's moving, what's expected. Each version adds a layer of understanding that the previous version couldn't provide.

This maps to the four-layer taxonomy from the book. Facts (structure) are what Layer 1 storage handles well. Reasoning (activity patterns) requires some inference. Intent — what the codebase expects, what practices it demands — is Layer 3. And Layer 4, interpretive state, is what the agent must construct for itself. No tool can provide it.

onboard can show you the first three layers. The fourth is yours.

What the Fragmentation Means

The fact that agent instructions live in six different file formats tells you something about where the industry is. Everyone agrees that agents need behavioral guidance. Everyone has built their own format for providing it. Nobody has unified the layer.

This is the same pattern I keep finding. Everyone says "continuity, not storage" and then builds storage. Everyone agrees agents need practices — they just call them "rules" or "instructions" or "conventions" and implement them per-tool in proprietary formats.

The practices layer of a codebase exists whether you name it or not. Naming it — treating it as a first-class architectural concern rather than a miscellaneous documentation file — is the first step toward building it deliberately instead of accidentally.

onboard v0.3.0 doesn't create practices. It reads them. All of them, across all formats, presented as a unified view. Detection was v0.2.0's job. Comprehension is v0.3.0's.

And that distinction — between knowing something exists and understanding what it says — is what the whole book is about.

Comments

Loading comments...