Bitrix Dedicated Server: Production Metrics Before and After

A client message: "We followed all your recommendations. OPcache is on, Redis sessions are running. We even fixed the SQL queries. But the site still goes down during evening traffic spikes."

I pull up the metrics. OPcache hit rate: 97%. MySQL slow log: empty. PHP-FPM workers: all alive. But in the logs there are 2–4 second gaps between 18:30 and 21:00 — in places where there shouldn't be any delay at all.

I open the hosting dashboard. CPU steal: 28–42%.

There's the answer.

When code-level tuning stops helping

I've seen this pattern on several projects. A developer inherits a slow Bitrix site, works through the checklist — OPcache, session handler, query optimization, PHP-FPM pool size — and the site genuinely improves. Then a promo email goes out, or it's a Tuesday evening, and everything falls over again.

The easy move is to blame Bitrix. Or to decide it's "time to go headless."

Sometimes the ceiling isn't in the code at all.

CPU steal: the metric your Bitrix monitoring never shows

CPU steal is the percentage of time a virtual machine spends waiting for processor time from the hypervisor. In a shared hosting environment, your PHP workers might be ready to run, but the hypervisor is busy with another VM on the same physical host.

Bitrix has no CPU steal dashboard. ISPmanager doesn't show it in its standard monitoring view. MySQL's slow query log doesn't capture it, because the queries themselves might be fast. The only place you'll find it is in system metrics: vmstat (the st column), iostat (the %steal column of the avg-cpu report), or your cloud provider's panel if they expose it.
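
If you'd rather read the number directly, here is a minimal Python sketch that computes it from two samples. It assumes a Linux guest where the first line of /proc/stat carries the cumulative counters in the usual order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). The function names are mine, not part of any tool.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def read_cpu_times(path="/proc/stat"):
    """Read the current cumulative CPU counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def steal_percent(before, after):
    """Percentage of CPU time between two samples that was stolen."""
    deltas = [b - a for a, b in zip(before, after)]
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight counters are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 7 is the steal counter (assumes a kernel that reports it).
    return 100.0 * deltas[7] / total
```

Call read_cpu_times() twice, a second or so apart, and pass both results to steal_percent(). The counters are cumulative since boot, so a single reading on its own says nothing about the current load.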

CPU steal consistently above 10–15% is a problem. At 28–42% during peak hours, your server spends roughly a third of its CPU time waiting for a neighbor to release a physical core.

In our case (an e-commerce store with 28k SKUs, Bitrix on a shared VPS) this was invisible during normal load. At 20–30 concurrent requests, everything was fine. At 80–100 requests during evening peaks, the waiting started.

The connection to code-level tuning isn't obvious until you see it. Check it yourself: run vmstat 1 10 and look at the st column under the cpu section. That's CPU steal, updated every second.
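
To log that column instead of eyeballing it, a small Python sketch can pull the st values out of vmstat's text output. It finds the column by its header name rather than by position, because some vmstat versions print extra columns. The SAMPLE text is invented for illustration, not taken from our server, and steal_column is a hypothetical helper name.

```python
# Invented sample of `vmstat 1 3` output, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812344  10240 934512    0    0     3    11  210  480 12  4 80  1  3
 9  0      0 801120  10240 934600    0    0     0    24 1840 3920 41 12  9  0 38
11  0      0 799876  10240 934644    0    0     0    16 1902 4011 43 11 10  0 36
"""


def steal_column(vmstat_output, skip_first=True):
    """Return the 'st' values from vmstat output as a list of ints."""
    header = None
    values = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "st" in fields and "us" in fields:
            # Column header row; it may repeat in long runs.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue
        try:
            values.append(int(fields[header.index("st")]))
        except ValueError:
            continue
    # vmstat's first report is an average since boot, not a live sample.
    return values[1:] if skip_first else values


print(steal_column(SAMPLE))
```

In practice the input would come from running vmstat 1 10 during the evening peak, for example through subprocess.run with capture_output=True and text=True.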

How shared CPU invalidates PHP-FPM and OPcache tuning

In a shared hosting environment, PHP process execution time includes not just actual computation but also CPU scheduling delay — the time a process spends waiting for a physical core to become available. OPcache eliminates file parsing overhead; PHP-FPM reduces process spawn overhead. Neither eliminates scheduling delay caused by CPU contention.

A PHP-FPM worker receives a request and starts executing PHP code. Without OPcache it parses and compiles the file on every request. With OPcache it reads compiled bytecode from shared memory. That is fast, but only if the CPU scheduler gives the process time to run.

When the hypervisor isn't scheduling your vCPU, a fast memory read doesn't help. The worker isn't waiting on disk I/O. It isn't waiting on MySQL. It's waiting for CPU.

The numbers from our case:

PHP script execution time in isolation: 180ms
Same request under peak load (80+ concurrent): 1.8–3.2s
MySQL queries in slow log: 0 (all under 50ms)
OPcache hit rate: 97%
CPU steal at that moment: 35–40%

Code optimization has a ceiling. If the physical CPU is busy with other VMs, no amount of PHP tuning gets you past that ceiling. We'd reached it.

One more thing: we'd already raised PHP-FPM pm.max_children from 5 to 38. More workers didn't help under high CPU steal. Each worker just spent longer waiting in the scheduler queue.

What the production graphs showed after migrating to dedicated

We moved to a dedicated server from the same cloud provider. Same RAM. Same PHP configuration. No code changes.

Results 72 hours after migration:

CPU steal: 35–40% at peak → 0–1% (dedicated cores, no neighbors)
TTFB p95 during evening peak: 3.1s → 0.85s
TTFB p50 during normal load: 0.41s → 0.38s (almost unchanged — daytime was fine before, too)
Timeouts and 504 errors at peak: 180–220 per hour → 0
PHP-FPM: workers stopped piling up in CPU-wait state

The p50 barely moved. That's expected. During the day, the shared CPU was coping. All the pain was in p95 under load — and that's exactly where the migration made the difference.
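
To make the p50/p95 distinction concrete, here is a minimal sketch of a nearest-rank percentile over TTFB samples. The sample values are invented for illustration; real values would come from your access log or monitoring. The nearest-rank definition is one common choice, and your monitoring tool may interpolate differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. p is an integer from 1 to 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented TTFB samples in seconds: mostly fast, with a slow tail.
ttfb = [0.40] * 18 + [3.0, 3.2]
print("p50:", percentile(ttfb, 50))
print("p95:", percentile(ttfb, 95))
```

With a distribution like this, the median stays low while the tail is what users feel at peak. That is why a migration can leave p50 almost untouched and still remove the timeouts.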

One thing that surprised me: after migrating, we found that some of our earlier OPcache tuning had been partially throttled by CPU steal on the old host. On dedicated, the same settings performed fully. Code-level work didn't go to waste — it was just blocked.

Three signs the bottleneck is the hosting, not your code

Before jumping to a dedicated server, check whether it's actually the infrastructure causing the problem.

First: non-linear degradation under load. If the site works at 20 concurrent requests and breaks at 80, but MySQL is fine during that time, look at CPU steal. Linear degradation usually points to memory or disk.

Second: PHP profiler time doesn't explain TTFB. If xhprof shows 200ms but the browser sees 2 seconds, and this only happens under load, you're likely seeing scheduling delay, not slow code.

Third: MySQL slow log is empty but users complain about slow pages. This is the clearest signal. The slow log catches slow queries, but it doesn't see the 800ms your PHP process spent waiting for CPU before it even started executing that query.

To confirm: vmstat 1 10. The st column. More than 10–15% consistently is a conversation worth having with your provider — or a reason to start pricing dedicated options.
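
"Consistently" is the key word there: a single spike doesn't count. One rough way to encode that is sketched below. The 15% threshold comes from the rule of thumb above; the 80% share of samples is my own assumption, not an established standard, so adjust it to taste.

```python
def sustained_steal(samples, threshold=15, min_share=0.8):
    """Return True if at least min_share of the st samples exceed threshold.

    samples:   st percentages, e.g. one per second from vmstat
    threshold: steal percentage treated as a problem (15, per the text)
    min_share: assumed fraction of samples that must exceed it (0.8 here)
    """
    if not samples:
        return False
    above = sum(1 for s in samples if s > threshold)
    return above / len(samples) >= min_share


# Invented samples: a peak-hour window and a quiet window with one spike.
print(sustained_steal([38, 36, 40, 35, 12]))
print(sustained_steal([3, 2, 38, 1, 2]))
```

Run it over samples collected during the actual peak window. Daytime samples would dilute the result and hide the problem.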

When a dedicated server is actually worth it

A dedicated server gives your workload exclusive access to a fixed number of physical CPU cores, which removes nearly all CPU steal (0–1% at peak in our case). This matters for latency-sensitive PHP workloads where scheduling delays compound under concurrent load.

Moving to dedicated CPU isn't free. In our case it added about $60/month. It's worth it when:

You have clear peak load patterns (evenings, promo campaigns, newsletters)
The code is already tuned: PHP-FPM pool, OPcache, session handler, SQL
CPU steal is consistently above 15% during those peaks

It's not worth it when:

The slow log shows queries taking 2–5 seconds — that's the code, not the host
OPcache isn't configured or opcache.validate_timestamps=1 is still on in production
PHP-FPM pm.max_children is still at the ISPmanager default (5–10)

The sequence matters: close the basic PHP optimizations first. If peaks still kill the site after that — check CPU steal. A dedicated server isn't the first step. It's what you do when everything else is already done.
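
That order of checks can be written down as a small decision function. This is only a sketch of the checklist above: the SiteState fields and the returned strings are hypothetical names, and the cutoffs are taken from the lists in this section (slow queries of 2 seconds or more, a pool of 10 workers or fewer, steal above 15% at peak).

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    slowest_query_s: float     # slowest query seen in the MySQL slow log
    opcache_enabled: bool
    validate_timestamps: bool  # opcache.validate_timestamps in production
    max_children: int          # PHP-FPM pm.max_children
    peak_steal_pct: float      # CPU steal during peak load


def next_step(s):
    """Return the first unfinished item, in the order the text recommends."""
    if s.slowest_query_s >= 2.0:
        return "fix slow SQL"
    if not s.opcache_enabled or s.validate_timestamps:
        return "configure OPcache"
    if s.max_children <= 10:
        return "size the PHP-FPM pool"
    if s.peak_steal_pct > 15:
        return "price dedicated CPU"
    return "keep profiling"


# Roughly our situation before the migration.
print(next_step(SiteState(0.05, True, False, 38, 38.0)))
```

The function only reaches the hosting question after every code-level check has passed, which is the whole argument in one place.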

For our project, everything lined up: the code was right, the infrastructure wasn't. Three months of PHP tuning, and we'd hit a ceiling that had nothing to do with PHP.

CPU steal is now one of the first things I check when a client says "we did everything right, but it's still slow." Sometimes the answer is in the code. Sometimes a physical core is busy with someone else's traffic.