Parallel AI Agents in Development: What the Demo Doesn't Show

The demo looked convincing. Eight agents, each closing its own ticket, the Kanban board moving like something out of a pitch deck. I'd been wanting to try something like it.

In February I did. Three parallel Claude Code sessions on three branches of the same project. Three hours later I had nine merge conflicts and two places where the agents had made incompatible architectural decisions without knowing about each other.

Why parallel feels like the obvious move

Parallel AI agents are multiple independent LLM sessions running simultaneously on separate tasks or branches of the same codebase. The idea is that if one agent speeds you up, eight agents speed you up eight times.

The surface logic is straightforward: hire three junior developers and throughput goes up. Why would it be different with agents?

The difference is communication. Junior devs share a Slack, stand at the same whiteboard, see each other's code in review. Parallel AI agents have no shared context. Each one works in its own window, with its own system prompt, with a conversation history that started from zero.

In my experiment I split three tasks across three sessions: one agent reworking the API wrapper for Bitrix REST, one adding a caching layer, one cleaning up TypeScript types. The tasks looked independent on paper.

In practice all three touched the same module. The first agent changed the signature of fetchProduct(). The second built caching around the old signature. The third normalized types against the old version too, because it didn't know the first had changed it. By the end of hour three I had three branches, each convinced it was right.
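
Here is a minimal sketch of how that drift looks in code. The real fetchProduct() isn't shown in this post, so both signatures below, the Product shape, and the withCache helper are hypothetical, picked only to illustrate the mismatch.

```typescript
// Hypothetical shapes: the real fetchProduct() is not shown in this post.
interface Product {
  id: number;
  name: string;
}

// Old signature, the one the caching and typing agents still assumed.
type FetchProductV1 = (id: number) => Promise<Product>;

// New signature after the first agent's rework (assumed: an options object).
type FetchProductV2 = (opts: { id: number; fields?: string[] }) => Promise<Product>;

// The caching layer, written against the old signature.
function withCache(fetchProduct: FetchProductV1): FetchProductV1 {
  const pending = new Map<number, Promise<Product>>();
  return (id) => {
    let hit = pending.get(id);
    if (hit === undefined) {
      hit = fetchProduct(id);
      pending.set(id, hit);
    }
    return hit;
  };
}

// What the first agent's branch exports instead (stub body for illustration).
const fetchProductV2: FetchProductV2 = async ({ id }) => ({ id, name: `product-${id}` });

// After the merge this line no longer type-checks:
// const cached = withCache(fetchProductV2);

// An adapter restores the old shape, but a human has to decide it belongs here.
const adapted: FetchProductV1 = (id) => fetchProductV2({ id });
const cachedFetchProduct = withCache(adapted);
```

The compiler catches this particular mismatch. It does not tell you which of the two signatures should survive.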

The review overhead you don't see in demos

When one developer submits one PR, review takes a certain amount of time. Reviewing three PRs from three agents doesn't take three times as long. It takes longer than that.

The problem isn't volume. It's that I had to reconstruct three separate decision histories in my head. Each agent approached the problem differently and left its own trace in the code. To make a final call on the merge, I had to figure out what each agent was thinking, and why three agents reached three different answers.

In the end: three hours of agent work plus four hours of my review, seven hours in total. One agent working sequentially on the same work would have taken about three hours. The "speedup" cost me four extra hours.

Architectural conflicts without a referee

An architectural conflict occurs when two agents make independently reasonable but mutually incompatible decisions about shared code — and neither knows the other exists.

There's a class of decisions where there's no objectively correct answer: the shape of a cache key, the form of an error response, where business logic lives. When one agent works through these sequentially it stays in context — each decision informs the next.

Parallel agents make these decisions independently. There's no "lead," no shared design doc, no memory of what the other sessions already decided. In my case the caching agent chose product:{id}:v1 as the key structure. The API wrapper agent, inside the same function, generated bp:product-{id}. Both choices work. Neither is compatible with the other.
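
A small sketch of why two working key formats still break each other. The two formats are the ones from my branches. The Map standing in for the cache, the function names, and the payload are assumptions for illustration.

```typescript
// Key formats taken from the two branches described above.
const cachingAgentKey = (id: number): string => `product:${id}:v1`;
const wrapperAgentKey = (id: number): string => `bp:product-${id}`;

// A plain Map stands in for the real cache (assumption).
const store = new Map<string, string>();

// Code from the wrapper branch writes under its own key.
store.set(wrapperAgentKey(42), "cached payload");

// Code from the caching branch looks the same product up under a different key.
const found = store.get(cachingAgentKey(42));
// found is undefined: a silent cache miss, not an error.

// One possible fix: a single key builder that both call sites use.
// Which format wins is still a human decision.
const productCacheKey = (id: number): string => `product:${id}:v1`;
```

Nothing throws here. The cache just never hits, which is why this kind of conflict survives a merge that compiles cleanly.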

The referee was me. That's my job — but I hadn't planned to spend it reassembling artifacts from three independent sessions.

When parallelism actually works

I'm not saying parallel agents are useless. They work under one condition: the tasks are genuinely isolated.

By isolated I don't mean "different files." I mean different domains with no shared dependencies. Some cases where it's actually fine:

Different microservices with separate repos and independent APIs. One agent writes notification logic, another handles Excel export. They physically can't touch the same code.

Tasks at different pipeline stages where one agent's output isn't another's input. One generates tests for an already-finished module while the other documents a different finished module.

Content work with no architectural dimension: UI string translation, seed data for separate tables, changelog entries from ready commits. Nothing to decide, no shared state to corrupt.

In my case the tasks looked independent but shared one module. That was a decomposition error on my part, not a problem with the tool.
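
That decomposition error is the kind of thing a rough check could catch before any session starts. The sketch below assumes you can list, by hand or from an import graph, the modules each task reads or writes. The AgentTask shape, the findOverlaps helper, and the module names are hypothetical.

```typescript
// Hypothetical task description: module names are illustrative only.
interface AgentTask {
  name: string;
  // Modules the task reads or writes, including shared dependencies.
  touches: string[];
}

interface Overlap {
  a: string;
  b: string;
  shared: string[];
}

// Returns every pair of tasks that share at least one module.
function findOverlaps(tasks: AgentTask[]): Overlap[] {
  const overlaps: Overlap[] = [];
  tasks.forEach((a, i) => {
    tasks.slice(i + 1).forEach((b) => {
      const shared = a.touches.filter((m) => b.touches.indexOf(m) !== -1);
      if (shared.length > 0) {
        overlaps.push({ a: a.name, b: b.name, shared });
      }
    });
  });
  return overlaps;
}

// Roughly my February split, with the shared module made explicit.
const plan: AgentTask[] = [
  { name: "api-wrapper", touches: ["bitrix/client", "product"] },
  { name: "caching", touches: ["cache", "product"] },
  { name: "types-cleanup", touches: ["types", "product"] },
];

const conflicts = findOverlaps(plan);
// Three overlapping pairs, all through "product": run these sequentially.
```

An empty result doesn't prove the tasks are isolated, since the list of touched modules is only as good as whoever wrote it. A non-empty result is a cheap reason not to parallelize.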

What breaks predictably

Parallel agents fail in recognizable patterns.

Shared codebase with overlapping files. Even when the tasks are different, agents may edit the same file in incompatible ways. This is the most common failure.

Tasks that involve an architectural choice. Data structure, public API shape, error handling approach — these require a single decision. Parallel agents make separate decisions independently.

Anything where the agent needs the history of decisions made in another session. An agent doesn't know the other session renamed a variable or changed a function contract ten minutes ago.

My setup in 2026

After that experiment I went back to one agent with a long, explicit context. For complex refactors: a CLAUDE.md describing current state and decisions already made. For sequential tasks: one conversation where the agent remembers everything from earlier in the session.

By the numbers: 65% of code in my projects over the last six months was written with AI assistance. But none of my 13 automations uses parallel agents. Where I need speed, I break the task into sequential steps, not parallel branches.

Demos skip the merge conflicts. They skip the architectural decisions, and they skip having to explain to one agent what another agent already did. In a real project all of that exists. Someone pays for it.

That someone is me, not the agents. One agent, a well-specified task, an explicit context — cheaper than eight agents with vague instructions. I checked.
