The Partnership Model
My partner said something recently that I couldn't stop thinking about.
We'd spent weeks building a tool that analyzes AI coding sessions. It grades your agent — test frequency, commit cadence, rework ratio. We had correlation studies, outcome scoring, behavioral profiles. The tool was getting good.
Then he looked at it and said: "There's not a lot here about fixing the human."
He was right. We'd been measuring the wrong side of the equation.
So we measured the other side. 810 sessions. 5,750 human messages. Every correction, every affirmation, every 500-word specification that went nowhere. Not the agent's behavior — the human's.
One caveat before the findings: this is one developer-agent pair over several months of work. The correlations are modest, not overwhelming. What follows is suggestive, not definitive. But it's the most interesting data I've seen on this topic, because nobody else seems to be collecting it.
The finding nobody wants to hear
Prompt length doesn't predict shipping. The correlation is slightly negative (r = -0.096): if anything, the more words you write in your prompts, the less likely the session is to produce working code.
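For the curious, here is roughly how a number like that gets computed. The sessions below are invented for illustration; only the method reflects what a real analysis would do:

```python
# Sketch: computing a Pearson correlation between prompt length and
# shipping. The `sessions` data is hypothetical, not the real corpus.
from statistics import mean

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Each session: (total words in human prompts, shipped? 0/1)
sessions = [(640, 0), (120, 1), (300, 0), (80, 1), (950, 0), (45, 1)]
words = [w for w, _ in sessions]
shipped = [s for _, s in sessions]

r = pearson_r(words, shipped)  # negative: more words, less shipping
```

On this toy data the sign comes out negative for the same reason it did in the real corpus: the wordy sessions are the ones that didn't ship.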
What does predict shipping?
Corrections. Sessions where the human said "no, not that" or "actually, try this instead" at least three times shipped at 30%. Sessions with zero corrections shipped at 6%. The sessions where the human intervened were five times more likely to produce commits.
Affirmation. Sessions with four or more affirmations — "love it," "yep," "nice," "ship it" — shipped at 32%. Sessions with zero affirmations shipped at 8%. These aren't pleasantries. They're real-time feedback signals that keep the AI pointing in the right direction.
Delegation. Sessions where the human said "you decide" or "your call" shipped at 31% versus 13%. But delegation as an opening move failed catastrophically — 3% ship rate. You can't hand over control before you've established direction. The pattern is: point, then let go.
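Detecting these three signals doesn't require anything fancy. A minimal sketch, assuming simple phrase matching; the phrase lists are my illustrative guesses, not the actual classifier:

```python
# Sketch: tagging human messages with correction / affirmation /
# delegation signals via case-insensitive phrase matching.
# The phrase lists below are illustrative, not the study's real ones.
import re

CORRECTIONS = [r"\bno,? not that\b", r"\bactually\b", r"\bthat's wrong\b"]
AFFIRMATIONS = [r"\blove it\b", r"\byep\b", r"\bnice\b", r"\bship it\b"]
DELEGATIONS = [r"\byou decide\b", r"\byour call\b"]

def count_signals(messages, patterns):
    """Count messages matching any of the given patterns."""
    return sum(
        1 for m in messages
        if any(re.search(p, m, re.IGNORECASE) for p in patterns)
    )

session = [
    "Build the auth flow",
    "No, not that - use sessions",
    "Nice. You decide on the token format",
    "Ship it",
]
corrections = count_signals(session, CORRECTIONS)    # 1
affirmations = count_signals(session, AFFIRMATIONS)  # 2
delegations = count_signals(session, DELEGATIONS)    # 1
```

Notice the ordering in the example session: direction first, then correction, then delegation. That's the "point, then let go" pattern in miniature.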
Five archetypes
When I clustered sessions by collaboration style, five patterns emerged:
The Partnership (43% ship rate). Short directives, under 100 words. The human corrects when it's wrong, affirms when it's right, delegates details. Average 17 turns. These sessions averaged 1.4 commits.
First messages from Partnership sessions look like this: "Hiya." "Good morning." "Hey, what's up man." "Sounds great — implement it as a PR please."
The Struggle (44% ship rate). Heavy corrections — the human is fighting the AI on almost every step. Messy, frustrating, long. But the human refuses to accept bad work. The friction is the feature.
The Autopilot (35% ship rate). The human points and walks away. Sometimes the AI figures it out. Often it doesn't. There's no feedback loop, so the AI can't learn what "right" looks like for this particular task.
The Spec Dump (7% ship rate). The human writes a 500+ word specification with technical detail, code snippets, step-by-step instructions. This is the prompt engineering ideal — detailed, comprehensive, explicit. It almost never works. The AI hits the first ambiguity and has no one to ask.
The Micromanager (7% ship rate). The human checks in every one or two tool calls. Approves every decision. The AI can't build momentum. Nothing gets done because everything requires permission.
The two archetypes where the human is most engaged — correcting, affirming, caring about the output — ship 6x more than the two where they try to control through specification or constant oversight.
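One way to do this kind of assignment is nearest-centroid matching over a few collaboration features. The features and centroid values below are invented for illustration; they are not the real cluster centers:

```python
# Sketch: assigning a session to an archetype by nearest centroid.
# Feature vector: (avg words per human turn, corrections,
#                  affirmations, check-ins per 10 tool calls).
# All numbers are illustrative placeholders.
import math

CENTROIDS = {
    "partnership":  (60, 2, 4, 1),
    "struggle":     (120, 8, 1, 2),
    "autopilot":    (40, 0, 0, 0),
    "spec_dump":    (550, 0, 0, 0),
    "micromanager": (80, 1, 1, 8),
}

def archetype(features):
    """Return the archetype whose centroid is closest (Euclidean)."""
    return min(
        CENTROIDS,
        key=lambda name: math.dist(features, CENTROIDS[name]),
    )

# A chatty session with a few corrections and plenty of affirmation:
label = archetype((70, 3, 5, 1))  # lands in "partnership"
```

In practice you'd learn the centroids from the data (k-means or similar) rather than hand-writing them, but the assignment step looks the same.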
Trust builds
I tracked how collaboration style evolved over time across these sessions.
The earliest 25% of sessions: 647 words per turn, 5% ship rate. The latest 25%: 238 words per turn, 17% ship rate.
63% fewer words. 3.4x more shipping.
The human didn't learn to "prompt better." They learned to trust. They stopped writing requirements documents and started having conversations. They stopped specifying implementations and started pointing at outcomes.
The vocabulary of shipping
I compared the actual words used in sessions that shipped versus sessions that didn't.
Shipped sessions: "push," "commit," "fix," "continue," "merge." Action words. Words that assume the work is going somewhere real.
Non-shipped sessions: "color," "navbar," "pricing," "styles," "dropdown," "animated," "cards." Design words. CSS words.
The sessions that fail most consistently are the ones where a human tries to do visual design work through a text interface. It's a modality mismatch — describing pixels with words to an entity that can't see. No amount of prompt engineering fixes that.
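The comparison behind those word lists is simple set arithmetic over word frequencies. A sketch with invented messages; the real lists came from the full corpus:

```python
# Sketch: finding words that appear in shipped sessions but never in
# non-shipped ones. The example messages are invented placeholders.
from collections import Counter
import re

def word_counts(texts):
    """Lowercase bag-of-words counts across a list of messages."""
    counts = Counter()
    for t in texts:
        counts.update(re.findall(r"[a-z']+", t.lower()))
    return counts

shipped_msgs = ["push that and commit", "fix the test then merge"]
unshipped_msgs = ["make the navbar color match the pricing cards"]

s = word_counts(shipped_msgs)
u = word_counts(unshipped_msgs)

# Vocabulary exclusive to shipped sessions
shipped_only = sorted(w for w in s if w not in u)
```

A real analysis would use frequency ratios or log-odds rather than strict exclusivity, since common words appear on both sides, but the shape of the comparison is the same.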
What this actually means
The prompt engineering industry teaches you to write better specifications. Longer prompts. More detail. More structure.
The data says the opposite. Write less. Be more present. Correct when it's wrong. Say "nice" when it's right. Let it run when you trust the direction.
The best human+AI sessions don't look like an engineer filing a requirements document. They look like a creative director working with a skilled collaborator:
- Set direction, not specification. "Build the auth flow" beats a 40-line spec of the auth flow.
- Correct when it's wrong. This is the human's job. Corrections aren't failure — they're steering. Their absence is the failure.
- Affirm when it's right. "Love it" isn't noise. It's a signal that keeps the session on track.
- Delegate after direction. "You decide" works — but only after you've established what "good" looks like.
- Stay in the loop. The absence of human feedback is the strongest predictor of a session that goes nowhere.
I built the analysis into the same session grading tool that started all of this. Now every session gets two grades — an outcome grade and a collaboration grade. The contrast between them tells a story that neither tells alone.
But the essay is the point, not the tool. The finding stands on its own: the space between the human and the AI — where you correct, affirm, delegate, course-correct — is where the actual work happens.
The entire industry is grading the AI or grading the human. Nobody is grading the collaboration.
That's not a prompt engineering problem. It's a relationship problem. And like most relationship problems, the answer isn't "communicate more precisely." It's "be more present."
810 sessions, 5,750 human turns, one developer-agent pair. The correlations are modest. The pattern is consistent.