Claude Code in PHP Development: An Honest One-Year Review

A year ago I wrote about what I don't delegate to AI. The list: architecture, cross-layer contracts, database schema decisions. That list still holds. But the year added something I didn't expect — a clearer picture of where Claude Code is genuinely useful.

It's not where you'd think.

The most valuable thing Claude Code does isn't writing code. It's exposing the tasks where my spec isn't clear enough to delegate to anyone.

What I handed off and don't regret

Let's start with what actually works.

Tests. Claude Code writes unit tests faster than I do, and I don't regret handing them off. Not because it's smarter, but because it doesn't get bored. Over the year, several hundred PHP and TypeScript tests passed through it. Roughly 15% needed logic corrections. That's a trade I'll take every time.

Boilerplate. A new Bitrix D7 module — include.php registration, EventManager wiring, a basic REST handler — takes me 40 minutes of copy-paste tedium. Claude Code does it in 3 minutes. I read the output, fix two things, move on. That's a real win.

Refactoring with a defined contract. If I have a class with a clear input/output and I want to split it in two, Claude Code handles it. The condition: I describe what can't break, what tests exist, and what "correct" looks like. Without that description, results are unpredictable.
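
Writing down "what can't break" can be as concrete as a characterization test: pin the outputs of the current code for a handful of inputs, then require the split version to reproduce them exactly. A minimal sketch in Python rather than PHP, to keep it self-contained; `legacy_price`, `base_price`, `apply_discount`, and the cases are hypothetical stand-ins for a real class and its contract.

```python
from decimal import Decimal

# Hypothetical "before": one function doing two jobs.
def legacy_price(amount, qty, discount_pct):
    total = amount * qty
    if discount_pct:
        total -= total * Decimal(discount_pct) / Decimal(100)
    return total.quantize(Decimal("0.01"))

# Hypothetical "after": the same behavior split into two steps.
def base_price(amount, qty):
    return amount * qty

def apply_discount(total, discount_pct):
    if discount_pct:
        total -= total * Decimal(discount_pct) / Decimal(100)
    return total.quantize(Decimal("0.01"))

def refactored_price(amount, qty, discount_pct):
    return apply_discount(base_price(amount, qty), discount_pct)

# The contract: inputs whose outputs must not change after the split.
CASES = [
    (Decimal("19.99"), 3, 0),
    (Decimal("19.99"), 3, 15),
    (Decimal("0.10"), 7, 33),
    (Decimal("1000.00"), 1, 100),
]

def contract_violations(old, new, cases):
    """Return the cases where the new code disagrees with the old code."""
    return [case for case in cases if old(*case) != new(*case)]

if __name__ == "__main__":
    broken = contract_violations(legacy_price, refactored_price, CASES)
    print("contract holds" if not broken else f"broken for: {broken}")
```

The list of cases is the part that belongs in the task spec; the split itself is what gets delegated.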

Also: logs. Claude Code reads 200 lines of Bitrix or Laravel error logs and surfaces three error patterns with a probable root cause in 30 seconds. Not a diagnosis — a first cut. I spend 5 minutes instead of 30.
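
The mechanical half of that first cut is easy to reproduce: mask the parts of each line that vary (timestamps, IDs, quoted values), then count the templates that remain. A rough Python sketch; the masking rules and the sample lines are invented for illustration and don't follow any specific Bitrix or Laravel log format. It only groups lines; guessing a root cause is still the part left to Claude Code or to me.

```python
import re
from collections import Counter

# Assumed masking rules: hide the parts of a line that change between
# occurrences so repeats of the same error collapse into one template.
# Order matters: hex IDs and quoted values are masked before bare digits.
MASKS = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"'[^']*'|\"[^\"]*\""), "<STR>"),
    (re.compile(r"\d+"), "<N>"),
]

def template(line):
    """Reduce a log line to its masked template."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()

def top_patterns(lines, n=3):
    """Return the n most frequent templates with their counts."""
    counts = Counter(template(line) for line in lines if line.strip())
    return counts.most_common(n)

# Invented sample lines, not real Bitrix or Laravel output.
SAMPLE = [
    "[2024-05-01 10:00:01] SQL error: Deadlock found on table 'orders' id=1042",
    "[2024-05-01 10:00:07] SQL error: Deadlock found on table 'orders' id=1043",
    "[2024-05-01 10:02:11] Cache miss for key 'catalog_123'",
    "[2024-05-01 10:03:15] SQL error: Deadlock found on table 'orders' id=1077",
    "[2024-05-01 10:04:00] Cache miss for key 'catalog_456'",
    '[2024-05-01 10:05:42] Warning: Undefined array key "PRICE" in /local/lib/Sync.php on line 88',
]

if __name__ == "__main__":
    for tpl, count in top_patterns(SAMPLE):
        print(f"{count:>4}  {tpl}")
```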

What I tried to hand off and took back

Now the honest part.

I tried delegating database migrations to Claude Code on a Bitrix project with 28,000 SKUs. The migration looked reasonable. I approved the PR. Three days later I found it had added an index on a field used in a compound query with ORDER BY — and MySQL started ignoring the existing composite index. TTFB on catalog pages jumped from 480ms to 1.1s. Diagnosis took two hours. Migrations don't go to Claude Code now without an explicit EXPLAIN in the task spec.
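
What "an explicit EXPLAIN in the task spec" could look like as an automated check: run EXPLAIN on the hot query before and after the migration, and flag the plan if it stops using the composite index the query depends on or starts sorting with a filesort. A minimal Python sketch, assuming the EXPLAIN rows have already been fetched as dicts keyed by MySQL's tabular EXPLAIN column names (`table`, `key`, `Extra`); the table name, index names, and both plans are invented.

```python
def explain_problems(rows, table, expected_key):
    """Compare MySQL EXPLAIN rows for one table with the index the query needs.

    rows: list of dicts using MySQL's tabular EXPLAIN column names.
    Returns a list of problems; an empty list means the plan looks as expected.
    """
    matching = [r for r in rows if r.get("table") == table]
    if not matching:
        return [f"no EXPLAIN row for table {table!r}"]
    problems = []
    for row in matching:
        key = row.get("key")
        if key != expected_key:
            problems.append(f"{table}: uses index {key!r}, expected {expected_key!r}")
        if "Using filesort" in (row.get("Extra") or ""):
            problems.append(f"{table}: ORDER BY falls back to a filesort")
    return problems


# Hypothetical plans for the same catalog query, before and after a migration.
PLAN_BEFORE = [{"table": "catalog_items", "key": "ix_section_active_sort", "Extra": "Using where"}]
PLAN_AFTER = [{"table": "catalog_items", "key": "ix_vendor_code", "Extra": "Using where; Using filesort"}]

if __name__ == "__main__":
    for name, plan in (("before", PLAN_BEFORE), ("after", PLAN_AFTER)):
        print(name, explain_problems(plan, "catalog_items", "ix_section_active_sort") or "OK")
```

The two facts the check needs (which index the query must keep, and that the sort must not degrade) are exactly what the task spec has to state up front.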

I also tried letting it modify PHP functions tied to Bitrix composite cache. The generated code worked fine in dev — and silently broke caching in production. Composite cache is a non-local dependency: a module's behavior depends on component configuration at the site level. Claude Code doesn't know that. I didn't write it in the task. Whose fault? Mine.

Third: I asked Claude Code to write a stock sync script between 1C and Elasticsearch. The script ran. But it didn't account for our three warehouses with different availability logic — "in stock" for warehouse A meant "unavailable" for regional orders. Not an AI failure. A task-without-domain-context failure.
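
The missing domain context is small enough to write down as data. A minimal Python sketch, where the warehouse codes, the two order channels, and the rule table are hypothetical, not our actual 1C or Elasticsearch schema; the point is that "in stock" becomes a function of warehouse and channel rather than a single number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StockRow:
    warehouse: str
    quantity: int

# Hypothetical rules: which order channels each warehouse may fulfil.
# Stock at warehouse "A" must never count as available for regional orders.
SERVES = {
    "A": {"local"},
    "B": {"local", "regional"},
    "C": {"regional"},
}

def available(rows, channel):
    """Quantity that can be promised to an order from the given channel."""
    return sum(
        r.quantity
        for r in rows
        if r.quantity > 0 and channel in SERVES.get(r.warehouse, set())
    )

ROWS = [StockRow("A", 10), StockRow("B", 0), StockRow("C", 2)]

if __name__ == "__main__":
    print("naive total:", sum(r.quantity for r in ROWS))
    print("local:", available(ROWS, "local"))
    print("regional:", available(ROWS, "regional"))
```

Warehouses missing from the rule table count as unavailable, so a new warehouse has to be added to the rules explicitly before its stock shows up anywhere.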

One pattern covers all three: if a task requires knowledge of non-local system state that isn't written down explicitly, Claude Code will produce something plausible and wrong.

The boundary: writing code vs making decisions

The useful question isn't "what can AI do?" It's "what have I formalized well enough to delegate?"

If I can write a task spec that a good junior developer could execute without calling me, Claude Code will do it. If I can't, Claude Code will either do it wrong or, in the better case, ask a question that reveals I didn't know what I wanted.

That second outcome is the most valuable one. When Claude Code asks "what should happen to the order if the warehouse is deleted?", that's not an AI limitation. That's me catching a spec gap before the code exists, not after.

How context is organized

Our projects run on three layers of context.

CLAUDE.md in the project root holds the persistent, project-level rules that tell Claude Code what it can and can't do in this codebase: what not to touch, which tests matter, which code zones have non-local dependencies. It's currently 14 lines long, and each line was added after a specific incident.

TASK.md for anything longer than an hour of work: a mini-spec covering what we're doing, what we're not doing, and which test shows "done." Without it, I don't let Claude Code run autonomously.

Hooks in .claude/settings.json — automatic checks after code generation: linter, types, specific tests. If a hook fails, Claude Code doesn't continue. This has caught several silent regressions.
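
A sketch of what a post-edit check script could look like, in Python. It assumes the script is registered under `PostToolUse` in .claude/settings.json with a matcher such as `Edit|Write`, that Claude Code passes the hook a JSON payload on stdin containing `tool_input.file_path`, and that exit code 2 sends the script's stderr back to Claude Code as feedback; check the hooks documentation for the version you run. The file-to-check mapping, the PHPStan invocation, and the `--hook` flag are hypothetical choices for this example.

```python
import json
import subprocess
import sys

def checks_for(path):
    """Hypothetical mapping from an edited file to the commands that must pass."""
    if path.endswith(".php"):
        return [
            ["php", "-l", path],
            ["vendor/bin/phpstan", "analyse", "--no-progress", path],
        ]
    if path.endswith((".ts", ".tsx")):
        return [["npx", "tsc", "--noEmit"]]
    return []

def main(raw):
    """Run the checks for the edited file and return the hook's exit code."""
    payload = json.loads(raw) if raw.strip() else {}
    path = payload.get("tool_input", {}).get("file_path", "")
    for cmd in checks_for(path):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print(f"check not available: {cmd[0]}", file=sys.stderr)
            return 2
        if result.returncode != 0:
            # Report on stderr and exit with 2 so the failure goes back to Claude Code.
            print(f"{' '.join(cmd)} failed:\n{result.stdout}{result.stderr}", file=sys.stderr)
            return 2
    return 0

# Only act as a hook when invoked with --hook, e.g. "python3 <script> --hook".
if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(main(sys.stdin.read()))
```

Files with no mapped checks pass through untouched, so the hook stays cheap on edits to docs or config.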

In six months on this setup, I haven't had a serious production incident from AI-generated code. Not because Claude Code got better. Because I got better at writing tasks.

Three things I still can't get

Honest gaps.

First — Claude Code has no long-term project memory. Every new session starts from scratch. CLAUDE.md helps, but it's manual maintenance. I want it to remember: "last time this approach didn't survive production."

Second — it's weak on Bitrix-specific patterns: compound events, ORM D7, composite cache. These aren't documented well enough anywhere to have trained the model usefully. Any task touching Bitrix internals still needs a detailed context dump from me, every time.

Third — there's no way to measure whether AI assistance is net positive on a given task type. I know roughly 60% of AI-generated code ships without serious edits. But "roughly" isn't a metric. I can't tell where to invest in better specs versus where AI is just costing me review time.
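
One low-tech way to replace "roughly" with a number is a per-task log kept by hand and summarized by task type. A minimal Python sketch; the record fields, the comparison of total AI-assisted minutes (prompting, review, fixes) against a by-hand estimate, and the sample entries are all made up for illustration, not data collected on these projects.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task_type: str         # e.g. "tests", "migration", "boilerplate"
    shipped_clean: bool    # shipped without serious edits
    ai_minutes: int        # prompting + review + fixes
    manual_estimate: int   # rough estimate of doing it by hand

def summarize(records):
    """Per task type: count, clean-ship rate, net minutes saved (negative = AI cost time)."""
    groups = defaultdict(list)
    for r in records:
        groups[r.task_type].append(r)
    summary = {}
    for task_type, rs in groups.items():
        clean = sum(r.shipped_clean for r in rs)
        saved = sum(r.manual_estimate - r.ai_minutes for r in rs)
        summary[task_type] = {"n": len(rs), "clean_rate": clean / len(rs), "net_minutes_saved": saved}
    return summary

# Made-up entries, only to show the shape of the log.
LOG = [
    TaskRecord("boilerplate", True, 10, 40),
    TaskRecord("boilerplate", True, 12, 40),
    TaskRecord("migration", False, 150, 60),
    TaskRecord("tests", True, 20, 45),
    TaskRecord("tests", False, 35, 45),
]

if __name__ == "__main__":
    for task_type, row in sorted(summarize(LOG).items()):
        print(f"{task_type:<12} n={row['n']} clean={row['clean_rate']:.0%} saved={row['net_minutes_saved']} min")
```

A task type with negative net minutes is the first candidate for better specs, or for not delegating at all.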

What I actually got

A year ago I thought Claude Code would speed up code writing. It does. But the shift that mattered more: it changed what I think about when I sit down to work.

I think less about code. I think more about contract — what the task actually means, what the system state is, what "done" looks like in production conditions.

That's worth more than faster code generation. And it's why I'm still using it.

*Internal links: non-vibecoder-ai-workflow-php-studio, ai-delegation-framework-what-not-to-give-ai, ai-written-bug-one-week-in-prod*