PHP production bugs in Bitrix: 4 language-level gotchas

PHP has four language-level behaviors that cause production performance problems in Bitrix: session file locking, OPcache stale entries on deploy, array copy-on-write memory doubling, and non-deterministic destructor order. These aren't bugs — they're documented behavior — but they only become visible under production load.

We spent three days chasing why catalog pages were slow under concurrent requests. The profiler showed time disappearing into waiting. Nginx was clean. MySQL was idle. FPM workers were available. Turned out session_start() was putting a file lock on the entire session — so all parallel requests from the same user were queuing up. PHP was doing exactly what the documentation says. That page of documentation just isn't one anyone reads before going to production.

This is a different category of problem from misconfigured opcache.memory_consumption or wrong pm.max_children. Infrastructure-level Bitrix issues show up in your monitoring. PHP language-level bugs in Bitrix production don't. They appear under load, look like anything else, and (stale OPcache aside) don't go away with a PHP-FPM restart.

Here are four behaviors I've run into across multiple Bitrix projects.

Session locking: PHP serializes requests for the same user

Session locking is PHP's default behavior with file-based sessions: session_start() takes an exclusive flock() on the session file and holds it until the script ends or the session is closed. Every other request with the same session ID, meaning the same user, waits inside session_start() until that lock is released.

In a normal app it's invisible. Requests are sequential, users don't open four tabs simultaneously. In a Bitrix catalog under load, it's different. Product page, Ajax stock check, facet filter — three parallel requests, one session ID. The first one grabs the lock. The other two wait.

For us this showed up as TTFB spikes on a subset of requests: 600–1400ms instead of the usual 180ms. Running strace on FPM workers showed them alive but blocked on flock(). Not MySQL, not OPcache — a session file.

One approach: Redis sessions. I covered the full switch in a separate post — it eliminates the locking and the failure risk at peak load. You need Redis for it.

The simpler fix: session_write_close() right after you've read the session data. If the request doesn't write to the session, there's no reason to hold the lock. Add it after the auth check and the lock releases immediately. No Redis required.
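A minimal sketch of the read-then-release pattern. The helper name and the USER_ID key are hypothetical, chosen for illustration; they are not Bitrix API, and in a real Bitrix project the session is usually started by the kernel before your code runs.

```php
<?php
// Hypothetical helper: copy what the request needs out of the session,
// then release the lock so parallel requests with the same session ID can run.
function readSessionAndRelease(array $keys): array
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start(); // takes the exclusive lock on the session file
    }

    $snapshot = [];
    foreach ($keys as $key) {
        $snapshot[$key] = $_SESSION[$key] ?? null;
    }

    // Saves the session and releases the lock. $_SESSION stays readable,
    // but later changes to it are NOT saved unless the session is reopened.
    session_write_close();

    return $snapshot;
}

// Usage in a read-only route (the key name is an assumption):
// $session = readSessionAndRelease(['USER_ID']);
// ... build the stock / facet response from $session['USER_ID'] ...
```

If you control the session_start() call yourself, session_start(['read_and_close' => true]) does the same thing in one step. Either way, the route really has to be read-only: anything written to $_SESSION after the close is silently dropped.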

What we measured: after adding session_write_close() to read-only catalog routes, average wait time for parallel Ajax requests dropped from 820ms to 40ms on the same hardware.

OPcache on deploy: stale opcodes stay longer than you'd expect

OPcache compiles PHP files into bytecode and caches them in shared memory. When a file changes on disk, OPcache doesn't know about it immediately — it checks timestamps on a schedule (opcache.revalidate_freq) or not at all if opcache.validate_timestamps = 0.

The basic configuration side is covered in a separate post. The production issue worth understanding separately is the race condition during deploy.

Bitrix deployed via FTP means updating files one at a time. While half the files are updated and half aren't, OPcache can hold compiled versions from both the old and the new release in the same shared memory. A request served during that window gets mixed state: new classes and old methods in the same execution.

Symptoms: intermittent PHP Fatal errors right after a deploy, errors that don't reproduce consistently and clear up on their own after a few minutes. Developers usually blame "something with Bitrix cache" — it's OPcache.

The right approach for production Bitrix: atomic directory swap (symlink approach) plus opcache_reset() immediately after switching. Or opcache.validate_timestamps = 1 with revalidate_freq = 2 during the deploy window, switched back to 0 after.

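If you're deploying via FTP without atomicity, at minimum trigger opcache_reset() at the end of your deploy script. Trigger it over HTTP, through the same PHP-FPM pool that serves the site: the CLI has its own separate OPcache, so calling opcache_reset() from a CLI script does not touch the cache FPM is using. Without the reset, the race condition window lasts until revalidate_freq seconds have passed, or until the next FPM restart if validate_timestamps is 0.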
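A sketch of what such an HTTP-triggered reset hook could look like. The file name, the DEPLOY_TOKEN environment variable, and the token query parameter are all assumptions made for this example; adapt the access control to whatever your deploy tooling already uses.

```php
<?php
// opcache-reset.php: hypothetical deploy hook. The deploy script requests it
// over HTTP so that it runs inside the PHP-FPM pool that serves the site.

// Constant-time comparison of the supplied token against the expected one.
function isAuthorized($given, ?string $expected): bool
{
    return is_string($given)
        && is_string($expected)
        && $expected !== ''
        && hash_equals($expected, $given);
}

function resetOpcache(): string
{
    if (!function_exists('opcache_reset')) {
        return 'opcache extension not loaded';
    }
    // opcache_reset() returns false when OPcache is disabled for this SAPI.
    return opcache_reset() ? 'reset requested' : 'opcache disabled';
}

if (PHP_SAPI !== 'cli') {
    // Assumption: the secret is shared with the deploy script via an env variable.
    $expected = getenv('DEPLOY_TOKEN') ?: null;
    if (!isAuthorized($_GET['token'] ?? null, $expected)) {
        http_response_code(403);
        exit('forbidden');
    }
    echo resetOpcache();
}
```

The deploy script would then end with a single HTTP request to this file. Running the same file with the php CLI binary does nothing useful, which is the point of the SAPI check.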

Copy-on-write with large arrays: memory doubles when you don't expect it

PHP uses copy-on-write: passing an array into a function by value is cheap — the copy is only created on first write. But when the write happens, it's a full copy. For a large array, that's a sudden doubling of memory usage.

In a Bitrix catalog with 28K SKUs, this came up in a product handler. A large associative array ($items) was passed into several nested functions for formatting, price substitution, and facet rendering. Every function read the array without modifying it — until we added a normalization step. After that, one function started modifying $items, and PHP created a full copy on every catalog request.

A ~14MB array turned into ~28MB of peak usage per request. With 30 concurrent requests, that's an extra 14MB × 30 = 420MB of peak memory. FPM started killing workers.

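Diagnosis: log memory_get_peak_usage() before and after the suspicious function. If the peak grows by roughly the size of the array during the call, the function created a copy. The peak only moves when the call sets a new high for the request, so measure early in the request or, on PHP 8.2+, call memory_reset_peak_usage() first.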
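A small, self-contained way to see the effect outside Bitrix. The helper peakGrowth() and the two sample functions are hypothetical, and the array is a stand-in for the real catalog data; the exact byte counts depend on the PHP version and build.

```php
<?php
// Hypothetical helper: how far did the peak move while $fn ran?
function peakGrowth(callable $fn): int
{
    // PHP 8.2+ can reset the peak first. On older versions the delta is only
    // meaningful while the request has not already hit a higher peak.
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage();
    }
    $before = memory_get_peak_usage();
    $fn();
    return memory_get_peak_usage() - $before;
}

// Stand-in for the catalog array (size is arbitrary).
$items = [];
for ($i = 0; $i < 20000; $i++) {
    $items[] = ['id' => $i, 'price' => $i * 1.5];
}

// Reads only: the array stays shared with the caller, nothing is copied.
function sumPrices(array $items): float
{
    $sum = 0.0;
    foreach ($items as $item) {
        $sum += $item['price'];
    }
    return $sum;
}

// Writes to its by-value parameter: PHP has to separate the array first.
function normalizeCopy(array $items): int
{
    foreach (array_keys($items) as $key) {
        $items[$key]['price'] = round($items[$key]['price'], 2);
    }
    return count($items);
}

$readGrowth  = peakGrowth(fn () => sumPrices($items));
$writeGrowth = peakGrowth(fn () => normalizeCopy($items));

printf("read-only: %d bytes, writing: %d bytes\n", $readGrowth, $writeGrowth);
```

The read-only call should leave the peak close to where it was, while the writing call should push it up by something in the range of the array's own size. Note that only the parts that are actually written get duplicated: nested arrays stay shared until a write touches them, which is why a normalization pass over every item is the worst case.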

Pass by reference (&$items) for functions that modify. Keep write operations isolated — read-only formatters stay cheap.

For our Bitrix catalog we introduced a simple rule: formatting functions always work with the passed value (cheap as long as they don't write), and mutation functions take $items explicitly by reference, so any write to the big array is visible in the signature.
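One way the formatter/mutator split can look in code, assuming mutators take the array by reference as suggested above. Function names and item fields are made up for the example.

```php
<?php
// Read-only formatter: takes the value and never writes to it,
// so the big array is shared with the caller and not copied.
function formatTitles(array $items): array
{
    $titles = [];
    foreach ($items as $item) {
        $titles[] = strtoupper($item['name']);
    }
    return $titles;
}

// Mutator: the & in the signature makes the write explicit and in place.
function normalizePricesInPlace(array &$items): void
{
    // Iterate by reference too. A by-value foreach keeps its own handle on
    // the array, and the first write inside the loop would trigger a copy.
    foreach ($items as &$item) {
        $item['price'] = round((float) $item['price'], 2);
    }
    unset($item); // drop the dangling reference to the last element
}

$items = [
    ['name' => 'chair', 'price' => '10.456'],
    ['name' => 'table', 'price' => '99.999'],
];

normalizePricesInPlace($items);   // the caller's array is updated directly
$titles = formatTitles($items);   // $items is not modified here
```

The detail that is easy to miss is the loop inside the mutator: taking &$items in the signature is not enough if the body then iterates by value and writes back through the key.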

Destructors and register_shutdown_function: the order that surprised us

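PHP calls object __destruct() methods and functions registered via register_shutdown_function() when a script finishes. The order is not something to depend on. Shutdown functions run in registration order, and they run before the destructors of objects that are still alive at shutdown. But an object whose last reference disappears earlier is destructed at that moment, and for whatever is left the manual only promises destruction "in any order during the shutdown sequence". In our experience the effective interleaving also shifted between PHP versions.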

This bit us when we moved a project from PHP 7.4 to 8.1. We had a logger class with a __destruct() that wrote a final entry to a file. And a shutdown function that closed the database connection. On PHP 7.4, the logger destructor ran before the connection was closed. On PHP 8.1, in some conditions the shutdown function closed the connection first — and the logger died with PDO: connection already closed.

The problem reproduced inconsistently, only under load (when PHP was terminating scripts via memory limit). Never in dev.

The rule we put in place: don't rely on teardown order at script exit. Anything critical — log flushes, final state writes — should happen in an explicit place, before exiting index.php, not in a destructor.
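A sketch of that rule with a buffered logger. RequestLogger is a hypothetical class written for this example, not the logger from the project: flush() is the real exit path, and the destructor is kept only as a best-effort fallback.

```php
<?php
// Hypothetical buffered logger. Call flush() explicitly before the script ends;
// __destruct() is only a safety net and must not assume anything else is alive.
final class RequestLogger
{
    /** @var string[] */
    private array $lines = [];
    private string $file;

    public function __construct(string $file)
    {
        $this->file = $file;
    }

    public function add(string $line): void
    {
        $this->lines[] = $line;
    }

    public function flush(): void
    {
        if ($this->lines === []) {
            return; // nothing buffered, nothing to write
        }
        file_put_contents(
            $this->file,
            implode(PHP_EOL, $this->lines) . PHP_EOL,
            FILE_APPEND | LOCK_EX
        );
        $this->lines = [];
    }

    public function __destruct()
    {
        try {
            $this->flush();
        } catch (\Throwable $e) {
            // At shutdown there is nothing safe left to do with the error.
        }
    }
}

$logFile = tempnam(sys_get_temp_dir(), 'reqlog');
$logger  = new RequestLogger($logFile);
$logger->add('catalog rendered');

// ... application work ...

// Explicit teardown in a fixed order, before the script ends:
$logger->flush();       // 1. write final entries while everything is still alive
// $connection = null;  // 2. then release the (hypothetical) DB connection
```

Because flush() empties the buffer, the destructor has nothing left to do on the normal path, so it no longer matters when PHP decides to run it.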

How to find these

Infrastructure metrics won't show you language-level issues directly, or they'll show the symptom while hiding the cause.

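strace -T -p <pid> -e trace=flock on an FPM worker under load. Session locking shows up immediately as a series of flock() calls with long waits; -T prints the time spent in each call. Note that flock() takes a file descriptor, so it is not part of strace's "file" class and a trace=file filter will not show it.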

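opcache_get_status() after a deploy, called through FPM rather than the CLI. Compare hits and misses in the opcache_statistics section and check last_restart_time. A sharp drop in the hit rate means the cache was reset; an unchanged restart time after a deploy with validate_timestamps = 0 suggests it is still serving the old files.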

memory_get_peak_usage(true) in Bitrix debug mode on specific routes. A large difference between routes that should be similar is the first sign of array copy-on-write.

set_exception_handler() + set_error_handler() with full stack traces logged. In production, destructor failures often die silently.
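For the OPcache check in that list, a small summary helper might look like this. The function name is hypothetical; the array keys are the ones opcache_get_status() reports under opcache_statistics. It has to be served through FPM to describe the cache the site is actually using.

```php
<?php
// Hypothetical helper: condense opcache_get_status() into the few numbers
// worth comparing before and after a deploy. Returns null if OPcache is
// unavailable or disabled for the current SAPI.
function opcacheSummary(): ?array
{
    if (!function_exists('opcache_get_status')) {
        return null;
    }

    // false: skip the per-script list, which can be large on a Bitrix install.
    $status = @opcache_get_status(false);
    if (!is_array($status) || !isset($status['opcache_statistics'])) {
        return null;
    }

    $stats = $status['opcache_statistics'];

    return [
        'cached_scripts'    => $stats['num_cached_scripts'] ?? null,
        'hits'              => $stats['hits'] ?? null,
        'misses'            => $stats['misses'] ?? null,
        'hit_rate'          => $stats['opcache_hit_rate'] ?? null,
        'last_restart_time' => $stats['last_restart_time'] ?? null,
        'manual_restarts'   => $stats['manual_restarts'] ?? null,
    ];
}

// Usage, e.g. behind the same access control as the reset hook:
// header('Content-Type: application/json');
// echo json_encode(opcacheSummary());
```

Logging this once before and once after the deploy is usually enough to tell whether the reset actually happened.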

Four questions before production deploy

Before any Bitrix production deploy I check:

Is Redis or session_write_close() in place for read-only routes?
Does the deploy script call opcache_reset() after updating files?
Is memory_get_peak_usage() logged on catalog routes?
Are any critical operations in destructors that depend on open connections?

Ten minutes before the deploy. Saves three days after.

All four behaviors are in the PHP manual. None of them is a bug: it's the language working exactly as documented. Production Bitrix under load just makes the documentation visible.