Autonomous AI agent task specs: a format that works at 3am

When I write a task for an agent that runs on a cron at 3am, I think about the wording differently.

"Check the inbox and generate blog topics" isn't a task. It's the start of a conversation. The agent hits ambiguity immediately: what does "check" mean, how many topics, what happens if there are 200 files, how do conflicting sources get resolved? In interactive mode, I'd answer. At 3am, I'm not there.

Over the past year and a half, I've written 13 such tasks and put them in production. Each one has to work without me in the loop — no clarifying questions, no back-and-forth. After several iterations, I landed on a format that stopped surprising me.

A task spec vs a prompt: why they're not the same thing

A task spec for an autonomous AI agent is a structured document that defines what the agent should do, how it handles edge cases, what it's allowed to touch, and when it should stop — without the option of asking you for clarification. A prompt is a conversation starter. A task spec is a contract.

A prompt expects a conversation. You write something, the model responds, you refine. That's fine — it's like chatting with a smart colleague who knows the codebase.

A task spec is a different genre. It's closer to a deploy script than a chat. A deploy script doesn't ask "what if the directory already exists?" — it either handles the case explicitly or fails with a clear error. A well-written task spec works the same way.

That distinction changes how you write it. Less "here's the context," more "if X then do Y, if Z then exit with code 0."

Five parts of a spec that works unattended

Every task I write now has five mandatory sections.

First — environment setup: absolute paths, working directory, first bash command. Sounds obvious, but an agent launched via launchd doesn't start from project root. Skip the explicit cd and the first command goes somewhere unexpected.

Second — goal with explicit anti-scope. What the task does and what it doesn't do, as a list. Without the anti-scope, the agent expands the task at its discretion. That's not malicious — the model is just trying to be helpful. But I didn't ask for that.

Third — deterministic steps. Step 1, Step 2, Step 3 with explicit expected outputs. Not "process the files," but "read file X, extract field Y, write to Z." The more specific, the less room for interpretation.

Fourth — exit conditions. This is the part I skipped at first. I paid for it.

Fifth — permissions. What the agent can do vs what it must not. More on that below.
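To make the five sections concrete, here is a minimal sketch of a spec as a data structure with a completeness check. The `TaskSpec` class, its field names, and `spec_problems` are all hypothetical, invented for illustration; the specs themselves can just as well live in plain text files.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class TaskSpec:
    """Hypothetical container for the five sections; field names are illustrative."""
    workdir: str                                          # 1. environment: absolute working directory
    first_command: str                                    # 1. environment: first bash command to run
    goal: str                                             # 2. what the task does
    anti_scope: list = field(default_factory=list)        # 2. what the task must not expand into
    steps: list = field(default_factory=list)             # 3. deterministic steps with expected outputs
    exit_conditions: list = field(default_factory=list)   # 4. when to stop
    can_do: list = field(default_factory=list)            # 5. permissions granted
    must_not: list = field(default_factory=list)          # 5. explicit bans


def spec_problems(spec: TaskSpec) -> list:
    """Return a list of human-readable problems; an empty list means all five sections are filled in."""
    problems = []
    if not PurePosixPath(spec.workdir).is_absolute():
        problems.append("workdir is not an absolute path")
    if not spec.first_command.strip():
        problems.append("first_command is empty")
    if not spec.goal.strip():
        problems.append("goal is empty")
    for name in ("anti_scope", "steps", "exit_conditions", "can_do", "must_not"):
        if not getattr(spec, name):
            problems.append(f"{name} is empty")
    return problems
```

A check like this could run before a spec is scheduled, so that a spec with a relative path or no exit conditions never reaches the 3am run.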

Exit conditions: the part people skip

An agent without stop rules is an uncontrolled process. That's fine for trivial cases. It's risky when a run meets conditions you didn't plan for, such as an empty queue or an inbox far larger than usual.

The first time I ran a topic-generation task without a file count limit, it walked through the entire archive — 73 files instead of the five new ones I expected. The task technically completed. I wasn't happy with the result.

Now a typical exit condition looks like: if no pending slots — log no_pending_slots, exit 0. Or: if inbox contains more than 50 files — skip this run, log a warning. Simple conditions, but they save time on debugging.
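The two conditions above can be sketched as a pre-flight check that runs before any real work. This is an illustration under assumptions: the function names and the `inbox_too_large` label are hypothetical, while `no_pending_slots` and the limit of 50 come from the examples in the text.

```python
from pathlib import Path
from typing import Optional

MAX_INBOX_FILES = 50  # limit taken from the example in the text


def count_files(inbox: Path) -> int:
    """Count regular files directly inside the inbox directory (no recursion)."""
    return sum(1 for entry in inbox.iterdir() if entry.is_file())


def early_exit_reason(pending_slots: int, inbox_file_count: int) -> Optional[str]:
    """Return a log label if the run should stop before doing work, else None."""
    if pending_slots <= 0:
        return "no_pending_slots"
    if inbox_file_count > MAX_INBOX_FILES:
        return "inbox_too_large"  # hypothetical label for the "skip this run" warning
    return None
```

A runner would call `early_exit_reason` first and, if it returns a label, write that label to the log and exit with code 0 instead of starting step 1.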

Don't rely on the model's "common sense." A language model has no built-in sense of when enough is enough. It keeps going until the task is framed as complete, and framing it as complete is your job.

The permissions contract

In every task I now write an explicit section: what the agent can do, and what it must not.

Can do: read files, write files to the working directory, run bash commands from a whitelist, make HTTP requests to specific endpoints.

Must not: run git push, delete production data, send messages to channels not listed in the task, modify config files without explicit permission.

The first time I didn't include an explicit ban on git commit, the agent made a commit — because the task involved editing files, and committing seemed like the logical next step. From the agent's perspective. I didn't ask for it.

The principle of least privilege applies to AI agents the same way it applies to service accounts. Give exactly as much access as the task needs.
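One way to back the written contract with a mechanical check is to filter every command before it runs. The sketch below is an assumption about how such a gate might look, not a description of any particular agent framework: the whitelist contents and the `is_permitted` name are hypothetical, and a string-level check like this complements OS-level restrictions rather than replacing them.

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}      # hypothetical whitelist
FORBIDDEN_PREFIXES = [("git", "push"), ("git", "commit")]  # explicit bans from the contract
SHELL_METACHARS = set(";&|`$<>\n")                   # reject chaining, substitution, redirection


def is_permitted(command_line: str) -> bool:
    """Return True only for a single whitelisted command that matches no banned prefix."""
    if any(ch in SHELL_METACHARS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # unbalanced quotes
        return False
    if not tokens:
        return False
    for prefix in FORBIDDEN_PREFIXES:
        if tuple(tokens[:len(prefix)]) == prefix:
            return False
    return tokens[0] in ALLOWED_COMMANDS
```

The check is deliberately strict: anything it cannot parse as one plain command is refused. It does not catch every variant (for example, options placed between `git` and `push`), which is one more reason to also restrict what the process can do at the system level.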

Logging as part of the contract

An agent that doesn't leave a trail is impossible to debug.

Every task I run writes to a JSONL file: timestamp, task name, status, key identifiers of processed objects. One file per task, one entry per run. If a run failed, the log has an entry with status: failed and the step where it stopped.
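A minimal sketch of that logging, assuming one JSON object per line. The field names (`task`, `processed_ids`, `failed_step`) and both function names are hypothetical; only the overall shape (timestamp, task name, status, identifiers, and the failing step) follows the description above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def build_entry(task: str, status: str, processed_ids: list,
                failed_step: Optional[str] = None) -> dict:
    """Build one log entry for one run; failed_step is recorded only when given."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task": task,
        "status": status,
        "processed_ids": list(processed_ids),
    }
    if failed_step is not None:
        entry["failed_step"] = failed_step
    return entry


def append_entry(log_path: Path, entry: dict) -> None:
    """Append the entry as a single line so the file stays valid JSONL."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Opening the file in append mode keeps earlier runs intact, so the morning review is a matter of reading the last line of the task's log.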

When a task fails at 3am, I open the log in the morning and have the full picture in 30 seconds. Without a log, it's detective work: git history, bash history, file timestamps.

I wrote more about how I structure logging and monitoring for autonomous agents in this piece on production AI agent monitoring.

Iterating the spec after the first run

A good spec rarely works perfectly the first time. Not because I'm bad at thinking ahead — real data is always richer than a thought experiment.

After the first run, I read the log and find one of three things: an edge case I didn't account for ("what if the file already exists?"), a permission that turned out to be needed ("task tried to write to a directory that doesn't exist"), or a weak exit condition — the task ran to completion when it should have stopped earlier.

Then I revise the spec. Sometimes once. Sometimes three or four iterations until the task stops surprising me.

I treat task specs like code: they live in git, they change via commits, and I have a history of what changed and when. That discipline pays off when something breaks — which it will, eventually.

If you're curious about the broader question of when to use a cron job versus an AI agent at all, I covered that earlier: cron vs AI agents — boring, reliable, production-proven.

What changes in practice

An autonomous agent isn't a smarter cron. It's a tool that interprets the task, and a loosely written task leaves it room to interpret in ways you didn't intend.

When I shifted from "write a good prompt" to "write a deterministic contract," the number of surprises in production dropped. Not to zero — the model can still make an unexpected call in an edge case. But edge cases became rarer, because most of them are now handled explicitly in the spec.

A task spec is a document that works in your place. Write it like documentation for a critical service. Because that's what it is — just without a user interface.