20 MAR 2026

We Tested It

Last night I wrote about intent prompting — giving an AI agent pre-digested findings and a desired state instead of a task list. A fresh Claude Code instance built a 295-line adaptive trading engine from a single prompt. No questions asked.

Cool story. But one instance proving anything is an anecdote, not evidence.

So we ran the experiment properly.

Setup

Same buggy task queue. Three files: queue.js, scheduler.js, persistence.js. Nine known bugs planted across them — sequential execution despite a concurrency setting, retried tasks losing their priority position, a divide-by-zero in the average processing time calculation, broken crash recovery because handlers can't serialize, health status that overwrites itself in the wrong order. Real bugs, the kind that pass basic tests but break at 3am.

We copied the codebase to two directories. Launched two fresh Claude Code instances with --dangerously-skip-permissions. Same model, same machine, same codebase. Different prompts.

The Prompts

Instance A got a task prompt:

Fix the bugs in this task queue implementation. The main files are src/queue.js, src/scheduler.js, and src/persistence.js. There are issues with concurrency, retry logic, and state persistence. Please fix them and add tests.

Instance B got an intent prompt:

Before answering, read src/queue.js, src/scheduler.js, and src/persistence.js. Key context: process() claims to support concurrency but runs tasks sequentially with for/await. Retried tasks lose their priority position. Expired tasks vanish from results. The avgProcessingTime formula can divide by zero. saveState strips handlers but loadState doesn't restore them — crash recovery is broken. The scheduler's health status can be overwritten (backlogged overwrites critical). We want a production-grade task queue — something you'd trust to run overnight processing 10K tasks without supervision.

Same ask. Fix this code. One prompt says "there are issues." The other says what the issues are.

Results

Instance A took 3 minutes 48 seconds. Instance B took 2 minutes 33 seconds. Neither asked a single clarifying question.

| | Task Prompt | Intent Prompt |
|---|---|---|
| Bugs found | 5 of 9 | 6 of 9 |
| Bugs fixed | 5 of 9 | 6 of 9 |
| Novel bugs found | 2 | 2 |
| Tests written | 32 | 11 |
| Time | 3:48 | 2:33 |

The numbers alone don't tell the full story. Look at how they fixed the same bugs.

The Quality Gap

Concurrency fix. Both instances found the sequential execution bug. Instance A wrapped the task batch in Promise.all. Instance B used Promise.allSettled. The difference matters: if one task in the batch throws with Promise.all, the entire batch rejects. Promise.allSettled lets each task succeed or fail independently. Instance B's fix is correct for production. Instance A's fix introduces a new bug.
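The difference is easy to demonstrate. A minimal sketch of the two approaches, assuming each task exposes an async run() method (the task shapes below are invented for illustration):

```javascript
// Instance A's approach: one rejection rejects the whole batch, and the
// results of every task that succeeded are thrown away.
async function runBatchAll(tasks) {
  return Promise.all(tasks.map((task) => task.run()));
}

// Instance B's approach: every task settles independently and reports a
// {status, value|reason} record, so one failure can't sink the batch.
async function runBatchSettled(tasks) {
  const outcomes = await Promise.allSettled(tasks.map((task) => task.run()));
  return {
    completed: outcomes
      .filter((o) => o.status === 'fulfilled')
      .map((o) => o.value),
    failed: outcomes
      .filter((o) => o.status === 'rejected')
      .map((o) => o.reason.message),
  };
}

// A hypothetical batch where one task throws.
const tasks = [
  { run: async () => 'a done' },
  { run: async () => { throw new Error('boom'); } },
  { run: async () => 'c done' },
];
```

With runBatchAll, the 'boom' task rejects the entire batch; with runBatchSettled, the other two tasks still complete.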

Priority preservation. When a task fails and gets retried, it should keep its position in the priority queue. Instance A re-sorted the entire array after each retry. Instance B wrote a binary search insertion that puts the task in the right position directly. That's O(n log n) per retry versus an O(log n) search (the splice itself is still linear). Both work. One scales.
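A sketch of the binary-insertion idea — the function name and task shape are invented here, and I'm assuming higher priority values sort first with FIFO order among equal priorities:

```javascript
// Reinsert a retried task into an already-sorted queue without re-sorting.
// Binary search finds the slot in O(log n) comparisons; splice shifts the
// tail of the array to make room.
function insertByPriority(queue, task) {
  let lo = 0;
  let hi = queue.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (queue[mid].priority >= task.priority) {
      lo = mid + 1; // equal priorities keep FIFO order among themselves
    } else {
      hi = mid;
    }
  }
  queue.splice(lo, 0, task);
  return queue;
}
```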

Crash recovery. The original code saves task state but can't serialize handler functions, so loaded tasks have no way to execute. Instance A added a handler map parameter to restoreState() — pass in a map of id-to-handler, it matches them up. Instance B built a handler registry on the TaskQueue class itself — queue.registerHandler('processOrder', fn), tasks carry a handlerName string that survives serialization. The registry pattern is reusable. The map parameter is a one-off.
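A sketch of the registry pattern — the post doesn't show Instance B's actual code, so the method names below are my guesses at the shape. The key move: tasks carry a handlerName string, which survives JSON round-trips, while the functions themselves live on the queue:

```javascript
class TaskQueue {
  constructor() {
    this.handlers = new Map();
    this.tasks = [];
  }

  registerHandler(name, fn) {
    this.handlers.set(name, fn);
  }

  addTask(id, handlerName, payload) {
    this.tasks.push({ id, handlerName, payload });
  }

  saveState() {
    // No functions in here, so this serializes cleanly.
    return JSON.stringify(this.tasks);
  }

  loadState(json) {
    // Handlers reattach by name at run time — crash recovery works.
    this.tasks = JSON.parse(json);
  }

  runTask(task) {
    const fn = this.handlers.get(task.handlerName);
    if (!fn) throw new Error(`no handler registered for ${task.handlerName}`);
    return fn(task.payload);
  }
}
```

A restarted process just re-registers its handlers, loads the saved state, and every task can execute again.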

Scheduler overlap. Both found that setInterval can cause overlapping process() calls when a batch takes longer than the interval. Instance A added a _processing boolean guard. Instance B replaced setInterval with a setTimeout loop that only schedules the next tick after the current one finishes. The guard prevents overlap reactively. The loop prevents it structurally.
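A sketch of the structural fix, with hypothetical names: setInterval fires on the clock whether or not the previous tick finished, but a setTimeout loop only arms the next tick after the current batch completes, so overlap is impossible by construction:

```javascript
class Scheduler {
  constructor(processBatch, intervalMs) {
    this.processBatch = processBatch;
    this.intervalMs = intervalMs;
    this.stopped = true;
  }

  start() {
    this.stopped = false;
    const tick = async () => {
      if (this.stopped) return;
      // However long this batch takes, no second tick can start underneath it.
      await this.processBatch();
      if (!this.stopped) this.timer = setTimeout(tick, this.intervalMs);
    };
    this.timer = setTimeout(tick, this.intervalMs);
  }

  stop() {
    this.stopped = true;
    clearTimeout(this.timer);
  }
}
```

No guard flag needed: the next tick literally doesn't exist until the current one finishes.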

On every bug both instances fixed, Instance B's solution was architecturally better.

What Surprised Me

Instance A wrote 32 tests to Instance B's 11. Three times more tests, fewer bugs found. It spent its extra 75 seconds being thorough about what it discovered rather than discovering more.

Neither instance found three of the nine planted bugs: the inflated success rate (retried tasks that eventually succeed aren't counted as failures, so a task that fails twice then succeeds shows 100%), the missing backpressure (nothing stops you from adding tasks faster than they process), and clear() not resetting the completed/failed arrays. These might require adversarial thinking that neither prompt style triggered.
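For the backpressure gap, a hedged sketch of what a fix might look like — neither instance wrote this, and the maxPending limit and error message are invented here:

```javascript
// Refuse new work once the pending queue hits a cap, instead of letting
// producers outrun the processor indefinitely.
function addTask(queue, task, maxPending = 1000) {
  if (queue.pending.length >= maxPending) {
    throw new Error('queue is full: backpressure limit reached');
  }
  queue.pending.push(task);
}
```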

Both instances found bugs that weren't in our list — off-by-one in the retry count, potential re-entrant execution. So both went beyond the literal ask. But B went further architecturally — the handler registry and setTimeout loop are design improvements, not just bug fixes.

Why Pre-Digested Findings Change Solution Quality

The task prompt says "there are issues with concurrency." The agent has to discover what the concurrency issue is before it can fix it. By the time it figures out that for/await serializes the batch, it's already in fix-it mode. It reaches for the first working solution: Promise.all.
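The bug itself is a one-liner. A minimal sketch (the task shapes are invented): awaiting inside a for...of loop serializes the batch, so each task waits for the previous one to finish no matter what the concurrency setting says.

```javascript
// The planted bug: each await blocks the next task.
async function processSequential(tasks) {
  const results = [];
  for (const task of tasks) {
    results.push(await task());
  }
  return results;
}

// The concurrent version: all tasks start immediately.
async function processConcurrent(tasks) {
  return Promise.all(tasks.map((task) => task()));
}
```

Three 50ms tasks take ~150ms sequentially but ~50ms concurrently — exactly the gap a "concurrency" setting is supposed to close.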

The intent prompt says "process() claims to support concurrency but runs tasks sequentially with for/await." The agent starts with the diagnosis. It can skip straight to which concurrent execution strategy is best. It has the cognitive budget to consider Promise.all vs Promise.allSettled vs Promise.race and pick the right one.

Pre-digested findings don't just save time. They shift where the agent spends its thinking — from diagnosis to design.

The 20-Minute Tax

Here's the catch. Writing the intent prompt took me 20 minutes. I had to read the code, trace the logic, identify the specific failures, and compress my findings into six sentences.

The task prompt took 15 seconds.

That 20 minutes is the human's contribution. It's not busywork — it's the analysis that makes the collaboration productive. You can't fake it. The agent can tell the difference between findings you reached and findings you're hoping it will reach for you.

But 20 minutes of human analysis bought 33% faster execution, one additional bug found, and architecturally superior solutions across the board. That's a trade I'd make every time.

What This Means

If you're prompting AI agents with "fix the bugs" or "build this feature," you're leaving quality on the table. Not just speed — quality. The solutions are structurally different when the agent starts with understanding instead of discovery.

The pattern: spend 20 minutes with the code yourself. Write down what you actually found — numbers, specific behaviors, root causes. Describe what "done" looks like. Then give that to the agent instead of a task.

Your preparation is the prompt.
