I let Claude write tests. I don't let it choose architecture.
Claude wrote 340 lines of tests for me in eight minutes. Good tests. I'd have spent two hours on them. But when I asked it to suggest a module structure for a new Bitrix project, it produced something symmetric, logical, and completely wrong for our context. That gap became my working definition: where Claude's job ends and mine begins.
This isn't about whether to use AI. That question's settled. It's about where I drew the line — and why.
Where AI actually saves time
In my experience, Claude handles unit tests, boilerplate generation, and rule-based refactoring roughly 4-10x faster than I do on routine tasks. This is the class of work I delegate without hesitation.
Tests. If a function takes defined input and returns a defined result, Claude covers it with unit tests faster than I can open the right file. Not because it's "smarter," but because it doesn't have to context-switch. I give it the function signature, the expected behavior, a couple of edge cases out loud — it returns a set of PHPUnit tests I just need to review.
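To make "defined input, defined result" concrete, here is a minimal sketch, written in Python for brevity rather than PHPUnit. The function apply_discount and its rounding rule are invented for illustration and come from no real project. The point is the shape: each test pins one behavior I stated up front, so the review is a quick read rather than a design discussion.

```python
# Hypothetical function under test; the name and rules are assumptions
# made up for this example.
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a whole-percent discount, rounded down to a cent."""
    if price_cents < 0:
        raise ValueError("price_cents must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


# The kind of tests that are cheap to delegate: one stated behavior per test.
def test_typical_discount():
    assert apply_discount(10_000, 15) == 8_500


def test_zero_and_full_discount():
    assert apply_discount(10_000, 0) == 10_000
    assert apply_discount(10_000, 100) == 0


def test_rounds_down_to_whole_cents():
    # 999 * 90 / 100 = 899.1, which floors to 899
    assert apply_discount(999, 10) == 899


def test_rejects_out_of_range_input():
    for bad_args in [(-1, 10), (100, -5), (100, 101)]:
        try:
            apply_discount(*bad_args)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad_args}")
```

The test functions follow the pytest naming convention, so a runner such as pytest would pick them up as they are. They can also be called directly.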
Boilerplate. A REST endpoint from a given schema, a migration file for a new table, webpack config with specific options. Anything where the task is unambiguous and the result is mechanically verifiable.
Rule-based refactoring. "Rename all $res variables to $result in this file." "Extract this logic into a separate method." "Replace deprecated legacy API calls with their D7 equivalents." Here Claude has a clear rule and a bounded scope. It doesn't decide whether refactoring is needed. It executes a specific transformation.
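The rename rule is mechanical enough to write down as code. This is a rough sketch in Python that treats the PHP file as plain text. The helper name rename_res_to_result is my own, and a real refactoring tool would work on the parsed syntax tree rather than on a regex. It still shows why the task is bounded: the rule fits in one pattern, and the one risky case (a $result that already exists) can be checked mechanically.

```python
import re

# \b stops the match at the end of the identifier, so $result, $response
# and $res_count are left alone.
RES_VAR = re.compile(r"\$res\b")
RESULT_VAR = re.compile(r"\$result\b")


def rename_res_to_result(php_source: str) -> str:
    """Text-level rename of $res to $result (hypothetical helper, not a PHP parser)."""
    if RESULT_VAR.search(php_source):
        # Renaming would silently merge two different variables.
        raise ValueError("$result already exists in this source")
    return RES_VAR.sub("$result", php_source)


before = """$res = $db->query($sql);
while ($row = $res->fetch()) {
    $response[] = $row;
}
"""

after = rename_res_to_result(before)
print(after)
```

Because it only sees text, this sketch would also rewrite a $res that appears inside a string literal or a comment. That limit is one more reason the result still gets a review.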
All of these are tasks where the rule already exists and I just don't want to type it out.
Where AI produces confident nonsense
Claude generates unreliable output when the correct answer depends on project-specific context it doesn't have: system history, business constraints, team decisions made years ago. I've seen this pattern repeat regardless of how the request is phrased.
Module structure proposals. Claude builds beautiful symmetric trees. The problem is that "beautiful symmetry" doesn't account for how load is actually distributed, where coupling has historically built up, which components are better left untouched. It hasn't seen your project for the last three years.
Caching decisions. "Add a Redis cache to this query" is the standard recommendation. But if there's already a caching layer in Bitrix for that query, adding another one gives you double-invalidation bugs that reproduce once a week. Claude doesn't know about the first layer — it hasn't seen all your code.
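The failure mode is easy to reproduce in miniature. The sketch below is plain Python with in-memory dictionaries. It uses no real Bitrix or Redis API, and every name in it is hypothetical. One cache wraps the data source (standing in for the layer that was already there), a second cache wraps the first (standing in for the newly added one), and the write path only knows about the original layer.

```python
class CachedReader:
    """Tiny read-through cache over any `key -> value` callable (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.source(key)
        return self.store[key]

    def invalidate(self, key):
        self.store.pop(key, None)


database = {"price:42": 100}

# Stands in for the cache layer that already existed in the project.
existing_layer = CachedReader(database.__getitem__)
# Stands in for the cache that was added on top without knowing about the first.
added_layer = CachedReader(existing_layer.get)


def update_price(key, value):
    # Legacy write path: written before the new layer existed,
    # so it invalidates only the cache it knows about.
    database[key] = value
    existing_layer.invalidate(key)


first_read = added_layer.get("price:42")     # fills both layers with 100
update_price("price:42", 120)
stale_read = added_layer.get("price:42")     # still 100: the new layer was never invalidated
fresh_read = existing_layer.get("price:42")  # 120: the old layer refetched from the database

print(first_read, stale_read, fresh_read)    # 100 100 120
```

Nothing in the new layer's own code is wrong, which is why the bug is hard to spot from the one file the model was shown.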
Choice-between-two-approaches advice. "ORM or raw SQL?" Claude's answer will be technically sound. It won't account for your 40,000 lines of existing raw SQL, the fact that nobody on the team has used an ORM, or the deadline next week. Its answer exists in a vacuum.
Bad output appears wherever the right answer depends on context Claude doesn't have.
Why architecture can't be delegated to a model
Software architecture decisions encode business constraints, past failures, and team-specific tradeoffs that aren't visible in the current codebase. A language model has access to none of that. An architectural decision carries history. Why there's a monolith here instead of services. Why this component is written this way. What came before, what broke, what had to change and why exactly like this.
Claude hasn't read your old pull requests. It doesn't know you tried a microservices approach three years ago and rolled back because of specific team issues. Doesn't know the business constraint that says "don't touch this until we renew the partner contract." And the compromises already made — it doesn't know those either.
When Claude proposes architecture, it's proposing a solution for an abstract project. Your project isn't abstract.
There's also inertia. Pick the wrong structure at the start, and you live with the consequences for a year and a half. A bug in a test costs one commit to fix. A mistake in module architecture costs a month of refactoring, or it becomes permanent technical debt.
My working protocol
The test I use before every Claude task: "Could I give this same task to a capable junior who's never seen our project?" If yes, Claude handles it. If no, I do it myself.
Go to Claude for:
- Writing or extending unit tests for a function with known behavior.
- Generating boilerplate from a template (endpoint, migration, config).
- Applying mechanical refactoring with a specific rule.
- Finding and explaining a specific error in code I've shown it in full.
- Writing a SQL query against a specific schema with specific conditions.
Don't go to Claude for:
- Designing the structure of a new module or service.
- Deciding whether refactoring is needed at all.
- Choosing between two approaches with non-obvious tradeoffs.
- Assessing how a change in one part of the system affects others.
- Deciding whether to add a new layer (cache, queue, index).
What this means for senior engineers in 2026
Less than the headlines suggest.
I spend less time writing code that executes an already-made decision. More time making sure the decision is right: understanding context, checking that Claude's output actually solves the problem rather than quietly creating a new one.
A senior engineer in 2026 is someone who can frame a task precisely, review the result, and knows where the tool works and where it doesn't. Same role as always. There are just more tools now.
The red line
Responsibility for architectural decisions, for the choice of approach, and for system-wide consequences stays with the engineer, not with the AI tool that helped write the code. Always. The most dangerous thing I've seen in the past year and a half isn't bad code from Claude. It's when "Claude wrote it" becomes the answer to "who's responsible."
Claude is a tool. Its output needs review, and you don't ship it to production without one.
If your team has developed an "AI wrote it, must be fine" reflex, that's a symptom of missing engineering culture, not a productivity win.
If the rule already exists — use Claude. If the rule still needs to be defined, that's the engineer's job, and there's nowhere to offload it.
I wrote about what a good daily working context for Claude actually looks like: Claude Code and Obsidian as living documentation.