Blog Automation with Claude Code: A Production Retrospective

I set up Claude Code to write and publish posts on my blog. Every 20 minutes. No approval. Fully autonomous.

Then I read one of the published articles and realized it was technically correct, SEO-sound, and not my voice at all. The slot was marked published. The system did exactly what I told it to.

That's the thing. You can't automate judgment about content. You can only automate the production of it.

What I was trying to do

In May 2025, I had an 82-article backlog for a blog archive going back to 2022. One to two posts per month, all based on real topics from my vault and project history. Writing 82 articles by hand would take months. So I built a pipeline.

The goal was specific: fill the blog archive without fabricating facts. Real topics, backdated publication dates, fully autonomous publication.

At one article per 20-minute tick, 82 articles put the theoretical wall-clock at about 27 hours (82 × 20 minutes), assuming no failures. The actual run took about three days.

The architecture: two agents, one pipeline

Blog automation with Claude Code means two launchd jobs running every 20 minutes: one that finds topics, one that writes and publishes articles. Neither requires human input during a normal cycle.
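
A minimal sketch of what one of those job definitions could look like, generated with Python's plistlib. StartInterval is the launchd key for "run every N seconds"; the label and script path are hypothetical placeholders, not the real ones from this setup.

```python
import plistlib


def build_launchd_job(label: str, program_args: list[str], interval_s: int = 20 * 60) -> bytes:
    """Render a launchd job definition that fires every `interval_s` seconds."""
    job = {
        "Label": label,                    # hypothetical job label
        "ProgramArguments": program_args,  # hypothetical wrapper script
        "StartInterval": interval_s,       # launchd: start the job every N seconds
        "RunAtLoad": False,
    }
    return plistlib.dumps(job)


writer_plist = build_launchd_job(
    "com.example.blog.t15-writer",
    ["/bin/sh", "/path/to/t15.sh"],
)
print(writer_plist.decode())
```

The researcher job would be a second plist of the same shape with its own label and script.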

T14 is the researcher. It checks the topic inbox. If there are fewer than 50 candidates, it pulls 3-5 new ones from the topic library, HN front page, Habr, and vault experience. If the inbox is full, it skips the tick. A circuit breaker against garbage accumulation.
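
The inbox check can be sketched in a few lines. This is an illustration under assumptions, not the actual T14 code: researcher_tick and fetch_candidates are hypothetical names, and the inbox is modelled as a plain list.

```python
INBOX_LIMIT = 50  # threshold from the text


def researcher_tick(inbox: list[str], fetch_candidates, limit: int = INBOX_LIMIT) -> dict:
    """Add a small batch of topics unless the inbox is already full."""
    if len(inbox) >= limit:
        # Circuit breaker: do nothing this tick rather than pile up candidates.
        return {"status": "skipped", "reason": "inbox_full", "added": 0}
    batch = fetch_candidates()[:5]  # at most 5 new topics per tick
    inbox.extend(batch)
    return {"status": "ok", "reason": None, "added": len(batch)}


inbox = [f"topic-{i}" for i in range(49)]
print(researcher_tick(inbox, lambda: ["a", "b", "c"]))  # one below the limit: batch is added
print(researcher_tick(inbox, lambda: ["d"]))            # now over the limit: tick is skipped
```

Checking the limit before fetching means a full inbox costs nothing: no sources are queried on a skipped tick.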

T15 is the writer and publisher. It picks the first pending slot from a state file, selects a topic, runs it through a skill chain (brief, draft, humanizer, seo-page, seo-geo), and publishes RU + EN to production via the Bitrix API. It verifies both URLs return 200. Then it logs to the publishing log and sends a Telegram notification.

A slot only gets marked published when all four conditions are met. On any failure the slot stays pending and the next tick picks it up. My post "How I write task specs for autonomous agents" covers the exit-conditions and permissions model behind this.
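
The all-or-nothing rule is easy to express in code. In this sketch the four field names are an assumed reading of the steps described above (publish, URL check, log entry, notification); the real checks may be split differently.

```python
from dataclasses import dataclass


@dataclass
class CycleResult:
    # Assumed breakdown of the four conditions; names are illustrative.
    published_ru_en: bool
    urls_return_200: bool
    log_written: bool
    notification_sent: bool


def next_slot_status(result: CycleResult) -> str:
    """A slot flips to 'published' only if every check passed."""
    checks = (
        result.published_ru_en,
        result.urls_return_200,
        result.log_written,
        result.notification_sent,
    )
    return "published" if all(checks) else "pending"


print(next_slot_status(CycleResult(True, True, True, True)))
print(next_slot_status(CycleResult(True, False, True, True)))
```

Because "pending" is the default outcome, a crash halfway through a cycle needs no special handling: the slot was never flipped.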

Both jobs share a lib: STOP mechanism, file locks, JSONL logging. Same setup as my 13 other automations.
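
Two pieces of such a shared lib can be sketched briefly: a lock so two ticks never overlap, and an append-only JSONL log. This is a simplified illustration with hypothetical names (tick_lock, log_event). It uses an exclusive lock file and does not handle stale locks left behind by a killed process.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def tick_lock(lock_path: Path):
    """Yield True if this tick got the lock, False if another tick holds it."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        os.write(fd, str(os.getpid()).encode())
        yield True
    finally:
        os.close(fd)
        os.unlink(lock_path)


def log_event(log_path: Path, **fields) -> None:
    """Append one JSON object per line."""
    record = {"ts": time.time(), **fields}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


workdir = Path(tempfile.mkdtemp())
with tick_lock(workdir / "t15.lock") as acquired:
    log_event(workdir / "t15.jsonl", job="T15", status="ok" if acquired else "skipped")
print((workdir / "t15.jsonl").read_text())
```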

What worked, and what surprised me

SEO quality was the first thing that surprised me. The pipeline runs every article through keyword distribution, an on-page checklist, and passage-level citability checks for English. I'd do this slower by hand, and without a system, I'd skip some step. Every time.

Resumability held too. If a tick died at Step 7 (Bitrix publish), the next one reads meta.json.step and picks up from there. No lost work on failure. All three patterns from "Three production trust patterns for AI agents" apply here directly.
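
A sketch of the resume logic, assuming meta.json stores the index of the next step to run. The real semantics of the step field may differ, and run_pipeline plus the demo steps are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(meta_path: Path, steps: list) -> None:
    """Run steps in order, persisting progress after each one."""
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {"step": 0}
    for index in range(meta["step"], len(steps)):
        steps[index]()  # may raise; meta.json then still points at this step
        meta["step"] = index + 1
        meta_path.write_text(json.dumps(meta))


calls = []
attempts = {"publish": 0}


def flaky_publish():
    attempts["publish"] += 1
    if attempts["publish"] == 1:
        raise TimeoutError("simulated API timeout")
    calls.append("publish")


steps = [lambda: calls.append("draft"), flaky_publish, lambda: calls.append("verify")]
meta_path = Path(tempfile.mkdtemp()) / "meta.json"

try:
    run_pipeline(meta_path, steps)  # first tick dies at the publish step
except TimeoutError:
    pass
run_pipeline(meta_path, steps)      # next tick resumes there; the draft is not redone
print(calls)
```

Writing progress only after a step succeeds means a step can run twice but is never skipped, so each step has to be safe to repeat.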

The humanizer pass was more consistent than me writing fast. It removes em-dash density, AI triples, "leverage" and "ensure." Not flawless, but it doesn't have bad days.

For the first several days, the system published dozens of articles with no manual intervention. Most were usable.

What I deliberately didn't automate

Cover images. The frontmatter contains cover: TBD, and I pick them manually. That's an aesthetic judgment: which photo sets the right tone for a specific piece? An algorithm doesn't know.

LinkedIn cross-posting for the backfill. There were no original posts in 2022, so backdating would be artificial. For new publications going forward, that's a separate routine I haven't built yet.

Manual copy-edit when the voice is off. These are deliberate approval gates, not laziness. Some judgments don't transfer.

The STOP mechanism: touch cron/STOP halts everything. I've used it twice. Once when I saw a cluster of weak articles and paused to review the prompt. Once when a specific topic candidate had the wrong tone for the slot's pillar.
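
The check itself can be tiny. A sketch, assuming every job tests for the STOP file before doing any work; guarded_tick is a hypothetical name and the demo uses a temporary directory in place of the real cron/ folder.

```python
import tempfile
from pathlib import Path


def guarded_tick(stop_file: Path, work) -> str:
    """Run `work` unless the STOP file exists."""
    if stop_file.exists():
        return "stopped"
    work()
    return "ran"


cron_dir = Path(tempfile.mkdtemp())
stop_file = cron_dir / "STOP"
done = []

print(guarded_tick(stop_file, lambda: done.append("tick 1")))
stop_file.touch()   # same effect as `touch cron/STOP`
print(guarded_tick(stop_file, lambda: done.append("tick 2")))
stop_file.unlink()  # removing the file resumes the schedule
print(guarded_tick(stop_file, lambda: done.append("tick 3")))
```

A file on disk works as a kill switch because it needs no running process to receive it: every job sees it on its next tick.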

What broke in the first week

Inbox overflow. T14 generated candidates faster than expected. The inbox hit 50+ overnight and T14 went into inbox_full mode. Correct behavior, but I hadn't expected it that fast. I raised the threshold to 70.

Double-publish attempt. Once, T15 created a Bitrix element but timed out waiting for the API response. The next tick saw a pending slot and tried to create it again. The upsert endpoint handled this correctly and returned the existing ID, but the log showed data.action: updated instead of created. It took a few minutes to work out why.
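
Why the retry was harmless is easier to see with a toy model. The class below is an in-memory stand-in, not the Bitrix API: FakeCms, upsert, and the external_key parameter are hypothetical. It shows how a keyed upsert turns a repeated create into an update of the same element.

```python
class FakeCms:
    """In-memory stand-in for a CMS with an idempotent upsert endpoint."""

    def __init__(self):
        self._elements = {}
        self._next_id = 1

    def upsert(self, external_key: str, fields: dict) -> dict:
        if external_key in self._elements:
            # Retry of an earlier request: reuse the element, report 'updated'.
            element = self._elements[external_key]
            element["fields"] = fields
            return {"id": element["id"], "action": "updated"}
        element = {"id": self._next_id, "fields": fields}
        self._elements[external_key] = element
        self._next_id += 1
        return {"id": element["id"], "action": "created"}


cms = FakeCms()
first = cms.upsert("slot-2022-03-a", {"title": "Draft"})   # response lost to a timeout
second = cms.upsert("slot-2022-03-a", {"title": "Draft"})  # next tick retries
print(first, second)
```

The second call returns the same ID with action "updated", which matches the log entry described above.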

Monitoring caught both: the JSONL entries with status: degraded showed up immediately in the Telegram notifications.
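
A sketch of the filtering side of that monitoring, assuming each log line is a JSON object with a status field. The job and reason field names and the sample records are hypothetical, and sending the message to Telegram is left out.

```python
import json


def degraded_alerts(jsonl_text: str) -> list[str]:
    """Turn every degraded log record into a one-line alert message."""
    alerts = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("status") == "degraded":
            job = record.get("job", "?")
            reason = record.get("reason", "no reason given")
            alerts.append(f"[{job}] degraded: {reason}")
    return alerts


sample_log = "\n".join([
    json.dumps({"job": "T15", "status": "ok"}),
    json.dumps({"job": "T14", "status": "degraded", "reason": "inbox_full"}),
    json.dumps({"job": "T15", "status": "degraded", "reason": "publish timeout"}),
])
for alert in degraded_alerts(sample_log):
    print(alert)
```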

A specific incident

The article from the opening covered a real topic, had the right keywords, and followed the brief structure. The voice wasn't mine. Too even, too "article-y." The humanizer caught the obvious patterns but missed the main one: the piece opened with a statement instead of a story. I don't do that.

I read it, understood, ran delete.php by hand. The slot stayed published in the state file. Right call — it shouldn't re-publish. I added a manual-delete note to the publishing log.

This isn't a system failure. It's a system boundary. Production: yes. Judgment: no.

What this actually means

Autonomous blog automation with Claude Code can publish content at scale: the pipeline cleared the 82-article backlog in three days.

But "set it and forget it" is the wrong frame. You need monitoring (JSONL logs, Telegram alerts), failure handling, and approval gates on the things the algorithm can't evaluate. And you need to be comfortable pressing DELETE.

Content automation is a production system. It has the same requirements as any other: observability, graceful degradation, human oversight where it counts. Not an experiment.

Execution delegates. Judgment doesn't.