Spec-Driven AI Development: Why I Write a Spec First

I timed it. When I write a spec before opening Claude Code, an iteration takes around 40 minutes. When I skip straight to the prompt, it's closer to two and a half hours and three rollbacks.

The spec isn't ceremony. It's how I make an AI session predictable.

Why "write me a function" is an architect's prompt, not a developer's

When I say to Claude Code "write me an order export function," I sound like a developer. But I'm not acting like one.

I just made at least five decisions without making them explicitly: where the data comes from, what CSV format to use (delimiter, encoding, headers), how to handle empty fields, what error handling looks like, and where the output goes. I didn't consciously decide any of that. I let Claude decide for me.

Sometimes it guesses right. More often, it guesses in a direction I didn't want.

That's where the "rewrite it — again — one more time" cycle comes from. Not because Claude writes bad code. Because the task was underdefined from the start, and I hid that behind a seemingly clear request.

What I mean by spec: not a 10-page RFC

Spec-driven AI development means writing a short specification — answers to five specific questions — before opening an AI coding tool. Not a document with sections and approval stamps. A plain markdown file, or a few lines at the top of a session.

Takes 10–15 minutes. Sometimes less.

Writing the spec itself used to catch 30–40% of mistakes before I opened Claude Code. I'd start answering "what's the function's contract" and realize I didn't know. That meant the task wasn't ready to implement. Without a spec, I'd find out after the second or third rollback.

That's the real value: not controlling the AI, but getting early feedback on whether I actually know what I'm building.

The format: five questions before opening Claude Code

Not invented — developed over several months of working with autonomous cron tasks.

First question: what's in scope. One paragraph: what data comes in, what behavior comes out. If I can't write this in one paragraph, the task isn't ready yet.

Second: what's out of scope. An explicit list of what we're intentionally not doing. This is the anti-scope, and it matters more than the scope. Claude defaults to being helpful — without an explicit boundary, it'll add edge-case handling I didn't ask for and make the function three times more complex than it needs to be.

Third: what are the constraints. Stack, versions, existing conventions. In my Bitrix + Next.js projects this is critical: Claude knows React, but doesn't know how our component layer is wired. Without this, it writes correct React — just not our React.

Fourth: how do you verify it works. One or two concrete scenarios. Not a unit test, just: "given input X, I expect output Y." This forces me to think about behavior rather than implementation.

Fifth: what's already done nearby. Links to existing files or functions that solved a similar problem. The cheapest way to transfer context: instead of explaining architecture in text, I point to a working example.

Two examples from the same project: without spec and with spec

Last year I built an automation system for ivanpin.com — 13 autonomous cron tasks running Claude Code. Each task writes to a JSONL log, each has a stop condition, each runs without my involvement.

The first three tasks I wrote without a spec. Each needed an average of four iterations with Claude: the code worked, but didn't fit the system. Log structures were inconsistent. Error handling was inconsistent. Variable names were inconsistent. Technically correct code. Architecturally — noise.

Starting with the fourth task, I began writing a spec. Simple format: a markdown file with five sections, two or three sentences each. The last nine tasks averaged one iteration.

The difference wasn't prompt quality. I'd defined the task's contract before writing the first character of the prompt.

The second example, from the same project, went the other way. I was writing a spec for a publishing task. Midway through I realized the "out of scope" section had no explicit ban on auto-publishing to LinkedIn. I stopped, added it, and that likely prevented an awkward situation. Without the spec, I'd never have said that out loud.

When a spec is overkill — and how to know in two minutes

A spec isn't needed for one-off scripts that won't enter the system. If I'm asking Claude to write a quick migration script, a solid prompt with context is enough.

A spec is worth it when:

the result will be reused;
the code needs to integrate with existing architecture;
the task involves external data or external systems.

The check takes two minutes. I ask myself: "Does this code go into the project codebase, or is it a throwaway tool?" If the former, write a spec. If the latter, a good prompt with context is enough.

Takeaway

The spec isn't there to control the AI. It's early feedback for myself: I find some mistakes before opening Claude Code rather than after the third rollback.

The format is simple: five questions, 10–15 minutes, plain markdown. Code written from a spec fits the system on the first try more often than not.

I don't spec every prompt. I spec every task that goes to production.