Back to blog

How MySQL Slow Query Log Found the Bottleneck We'd Been Chasing for Three Months

For three months we knew something was slow. LCP on the product catalog: 4.1 seconds. Where exactly — no idea.

We ran a PHP profiler. It showed CIBlockElement::GetList taking 280ms per page. Not alarming. We rewrote a few heavy functions. No change. We checked opcache — everything enabled. Added CDN for static assets. LCP dropped to 3.8s.

One line in my.cnf fixed what three months of PHP profiling couldn't show us.

Why the PHP profiler missed it

This is the thing I should have understood earlier.

A PHP profiler (xDebug, Blackfire, Tideways) measures time inside the PHP process. It sees that CIBlockElement::GetList takes 280ms. What it doesn't show is that 240ms of those 280ms is waiting for MySQL to respond. One aggregate number. No breakdown by layer.

If the bottleneck is in the database — not in PHP — the profiler shows you the symptom. We spent three months treating the symptom.

slow_query_log: one config line

MySQL's slow_query_log records every query that exceeds a configurable time threshold, along with execution time, rows examined, and whether indexes were used. It's built into MySQL and requires no additional tools.

In my.cnf (or /etc/mysql/mysql.conf.d/mysqld.cnf depending on your distro):

slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
log_queries_not_using_indexes = 1

long_query_time = 1 logs everything over 1 second. On a busy production server you can set it to 2 to reduce disk I/O. log_queries_not_using_indexes = 1 is useful separately — it catches fast queries that don't use indexes before they become slow queries under load.

No restart required. SET GLOBAL slow_query_log = 1 works live. We enabled it overnight and read the log in the morning.

After 8 hours on a catalog with ~2,000 concurrent hourly users: 340MB log file. Normal.

What we found

Raw slow log is hard to read at scale. I used pt-query-digest from Percona Toolkit:

pt-query-digest /var/log/mysql/slow.log > digest.txt

It groups similar queries, calculates total execution time over the period, and ranks them. Our top 14 queries accounted for 94% of total slow query time.

The top entry: 3.2 seconds average execution time, 4,800 runs overnight. That's 4.3 hours MySQL spent on a single query pattern.

The query looked like this:

SELECT BE.ID, BE.NAME, ...
FROM b_iblock_element BE
LEFT JOIN b_iblock_element_property EP1 ON EP1.IBLOCK_ELEMENT_ID = BE.ID AND EP1.IBLOCK_PROPERTY_ID = 113
LEFT JOIN b_iblock_element_property EP2 ON EP2.IBLOCK_ELEMENT_ID = BE.ID AND EP2.IBLOCK_PROPERTY_ID = 117
LEFT JOIN b_iblock_element_property EP3 ON EP3.IBLOCK_ELEMENT_ID = BE.ID AND EP3.IBLOCK_PROPERTY_ID = 122
WHERE EP1.VALUE = 'Y' AND EP2.VALUE = '48' AND EP3.VALUE LIKE '%red%'
ORDER BY BE.SORT
LIMIT 40 OFFSET 0

Three JOINs to b_iblock_element_property — one per filter property. On a 28k SKU catalog with thousands of users per hour, this doesn't scale.

How Bitrix D7 ORM generates this

The PHP code that produced this query was completely ordinary:

$result = \Bitrix\Iblock\ElementTable::getList([
    'filter' => [
        '=PROPERTY_IN_STOCK.VALUE' => 'Y',
        '=PROPERTY_SIZE.VALUE' => '48',
        '%=PROPERTY_COLOR.VALUE' => 'red',
    ],
    'select' => ['ID', 'NAME', 'IBLOCK_SECTION_ID'],
    'order' => ['SORT' => 'ASC'],
    'limit' => 40,
]);

Three properties in the filter → three JOINs. The LIKE '%red%' with a leading wildcard makes the index useless — MySQL reads the full joined table.

This isn't a Bitrix bug. It's how EAV storage works. The slow log shows it immediately. The PHP profiler doesn't.

Patterns that are always slow

After a week working through the digest file, a few patterns kept appearing.

LIKE '%string%' on PROPERTY_* fields is the most common one. A leading wildcard kills the index, and MySQL scans everything. For text-based filtering, we moved to Elasticsearch — it was already there for search. For exact-value properties like in_stock or size, we migrated them to a separate high-load properties table.

ORDER BY SORT without a composite index was another one. The query sorted all 28k rows, then returned 40. The fix was a migration adding (IBLOCK_ID, SORT, ID) — no code changes. The query time went from 1.8s to 0.09s.

N+1 hidden in ORM is harder to spot. getList returns 40 elements, then a loop calls GetProperty for each one separately. In the slow log this looks like 40+ identical short queries at 50-80ms each. That's 2-3 seconds. The fix: add the needed properties to the select of the main query.

Results

Three days of changes.

The PROPERTY JOIN query: moved in_stock and size to high-load properties, routed color filtering through Elasticsearch. Query time: 3.2s → 0.18s.

Sorting: composite index migration, no code changes. Time: 1.8s → 0.09s.

The N+1 on the product page took longer — refactoring one section of the product page. Was 47 queries, became 8. PHP time: 680ms → 190ms.

Catalog LCP a week later: 4.1s → 1.9s. Not perfect — CDN and image optimization is a separate chapter.

What didn't work well

pt-query-digest groups similar queries well, but when the same underlying problem appears in queries with different property sets, it creates separate groups. We had to look beyond the top 14 to find them all.

log_queries_not_using_indexes = 1 also produces a lot of noise from Bitrix internal queries. You'll want to filter by database and exclude b_agent, b_option, b_session. The --filter flag in pt-query-digest handles this cleanly.

Bottom line

Three months, a PHP profiler, four hypotheses. Then one day with slow_query_log.

I'm not saying always enable it. On a small project it's noise. But if LCP isn't improving, the profiler says everything's fine, and users keep complaining — the problem is probably not in PHP.

If you want to work through PHP-level issues first, I covered OPcache configuration for Bitrix in production here. The broader patterns that make Bitrix slow regardless of database — here.