When Elasticsearch Lies: Keeping Your Search Index Fresh When the Catalog Changes Daily
A customer searched, found what they needed, clicked — and got "out of stock." They left. Later, a support ticket: "your search is broken." I opened the logs. The search wasn't broken. It returned exactly what was in the index an hour ago.
That's the problem.
On a catalog with 28,000 SKUs, prices update every night from 1C. Stock levels change several times a day — sometimes hourly during sales. We spent three months tuning morphology, synonyms, and facets. Then we discovered that 4% of search sessions ended in frustration not because results were irrelevant, but because they were outdated.
This isn't an edge case. It's the last thing people think to fix.
Why the index goes stale even after "proper" setup
In a standard Bitrix architecture, catalog data lives in iblocks. Prices, stock levels, and product attributes are separate entities with separate lifecycles. Stock gets updated through 1C exchange — a file is uploaded, Bitrix parses it, writes to the database. Prices might arrive through a different pipeline. Descriptions and images get edited manually by a product manager.
Elasticsearch doesn't know any of this. It only knows what you sent it — and when.
If you run a full reindex once a night, by noon the next day search is serving last night's stock. Two 1C exchanges have already happened in Bitrix. The search says "in stock." The product page says "unavailable."
That gap — between the source of truth and what the customer sees — is a trust problem, not a configuration problem.
Freshness tolerance by data type
Not all index fields are equally critical. This distinction matters, because trying to update everything in near-real-time creates complexity without proportional value.
Stock (in_stock, quantity) — critical. A 15–30 minute lag during sales events causes real damage: orders placed on unavailable items, support calls, refunds.
Price — important. A 1–2 hour lag is acceptable for most catalogs, but not for stores with dynamic pricing or countdown promotions. A wrong price in search results is either lost revenue or a legal problem.
Descriptions, specs, images — not critical. A product manager corrects a typo — the index can catch up within the day, nothing breaks. Scheduled updates are fine here.
That framing lets you design the update strategy deliberately, field by field, instead of defaulting to "update everything as fast as possible."
Three indexing strategies
Full reindex
The simplest approach. A cron job runs every hour (or more often) and pushes the entire catalog to Elasticsearch. On a 28,000 SKU catalog, a full reindex takes 8–12 minutes under normal load.
The problem: during reindexing, search either runs on old data or is unavailable if you delete and recreate the index. The standard fix is alias switching — keep two indexes, flip the alias only after the new one is fully built. That works, but it's added complexity.
The second problem: resource contention. A full reindex every hour is a constant background process competing with live queries and 1C exchange jobs.
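The alias switch described above can be expressed as one request to Elasticsearch's _aliases endpoint, which applies all of its actions atomically. Below is a minimal sketch in Python (the production code in this article is PHP) that only builds the request body. The names catalog, catalog_v1 and catalog_v2 are made up for illustration, and sending the body as POST /_aliases is left to whatever client the project uses.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves `alias` from old to new.

    Both actions travel in one request, so searches never hit a moment
    where the alias points at nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: the storefront always queries "catalog".
    body = build_alias_swap("catalog", old_index="catalog_v1", new_index="catalog_v2")
    print(json.dumps(body, indent=2))
```

The old index can be dropped once the swap succeeds, or kept for one cycle as a quick rollback target.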
Scheduled batch (timestamp-based delta)
Smarter. Instead of the full catalog, only ship changes from the last N minutes. Bitrix stores TIMESTAMP_X (last modified) on iblock elements. A cron job every 5 minutes fetches items where TIMESTAMP_X > last_run_time and sends them to Elasticsearch.
This is what we use on most projects. Stable, predictable load, typical lag of 5–7 minutes.
One catch: when 1C updates stock levels, Bitrix doesn't always update TIMESTAMP_X on the catalog element. Updates may write directly to price or stock tables, bypassing the main element record. For those cases, you need a separate change tracker on b_catalog_price and b_catalog_store_product.
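A rough sketch of that selection logic, in Python for brevity. Two in-memory dictionaries stand in for the real tables: element_ts for TIMESTAMP_X on iblock elements, and tracker_ts for a hypothetical change tracker on the price and stock tables. Both names and the function itself are assumptions made for illustration, not Bitrix APIs.

```python
from datetime import datetime


def changed_since(element_ts: dict, tracker_ts: dict, last_run: datetime):
    """Return (sorted ids to reindex, new watermark).

    element_ts: product id -> TIMESTAMP_X of the iblock element.
    tracker_ts: product id -> last price/stock change seen by a separate
                tracker (hypothetical; covers writes that skip TIMESTAMP_X).
    """
    changed = set()
    watermark = last_run
    for source in (element_ts, tracker_ts):
        for product_id, ts in source.items():
            if ts > last_run:
                changed.add(product_id)
                # Advance the watermark only as far as data we actually saw,
                # so rows written while this run is in progress are not skipped.
                watermark = max(watermark, ts)
    return sorted(changed), watermark


if __name__ == "__main__":
    last_run = datetime(2024, 5, 1, 12, 0)
    element_ts = {101: datetime(2024, 5, 1, 11, 50), 102: datetime(2024, 5, 1, 12, 3)}
    tracker_ts = {101: datetime(2024, 5, 1, 12, 4)}  # stock changed, element did not
    ids, new_mark = changed_since(element_ts, tracker_ts, last_run)
    print(ids, new_mark)
```

Product 101 is picked up only because of the tracker, which is exactly the case the element timestamp misses. With second-resolution timestamps, a stricter variant would compare with >= and rely on indexing being idempotent.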
Event-driven delta
The freshest option. Bitrix event handlers (OnAfterIBlockElementUpdate, 1C exchange hooks) push tasks into a queue — Redis or a simple MySQL table — and a worker processes them with seconds of latency.
Higher implementation complexity. You need to handle duplicate events (one product might fire 5 events in 10 seconds during batch import), race conditions (two events in the queue for the same product — which state is newer?), and worker reliability (what happens if it crashes mid-batch?).
Worth it for critical data: stock levels and prices in competitive markets where a few minutes matters.
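Duplicate events and "which state is newer?" mostly go away if the queue stores only product ids, keeps one entry per product, and the worker reads the current state when it processes the entry. Here is a minimal in-memory sketch of that idea in Python. CoalescingQueue, fetch_current and index_doc are illustrative names, not part of Bitrix or Elasticsearch.

```python
import time
from typing import Callable, List, Optional


class CoalescingQueue:
    """Pending-work queue that keeps at most one entry per product id."""

    def __init__(self) -> None:
        self._pending = {}  # product id -> time of the first unprocessed event

    def enqueue(self, product_id: int, now: Optional[float] = None) -> None:
        # Repeated events for the same product collapse into one entry.
        # The earliest timestamp is kept so lag is measured from the first change.
        self._pending.setdefault(product_id, time.time() if now is None else now)

    def take_batch(self, size: int) -> List[int]:
        # Oldest entries first; taken entries are removed from the queue.
        ids = sorted(self._pending, key=self._pending.get)[:size]
        for product_id in ids:
            del self._pending[product_id]
        return ids


def process_batch(queue: CoalescingQueue,
                  fetch_current: Callable[[int], dict],
                  index_doc: Callable[[int, dict], None],
                  batch_size: int = 50) -> None:
    for product_id in queue.take_batch(batch_size):
        # Read the state at processing time instead of trusting the event
        # payload, so the last write always wins.
        index_doc(product_id, fetch_current(product_id))


if __name__ == "__main__":
    q = CoalescingQueue()
    for t in (1.0, 2.0, 3.0):      # three events for product 7 during an import
        q.enqueue(7, now=t)
    q.enqueue(8, now=1.5)
    print(q.take_batch(50))        # one entry per product
```

An event that arrives after its product was taken from the queue simply creates a new entry, so nothing is lost. With a MySQL queue table, a unique key on the element id plus INSERT ... ON DUPLICATE KEY UPDATE is one way to get the same collapsing behavior.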
Delta indexing for Bitrix in practice
On the 28,000 SKU catalog, we run a hybrid: scheduled batch every 5 minutes for descriptions and attributes, event-driven for stock and prices.
For scheduled batch: a cron task fetches products from Bitrix where TIMESTAMP_X > :last_run, builds Elasticsearch documents, and sends them via the bulk API using the PHP Elasticsearch client. A batch of 200 changed items takes about 3–4 seconds.
For event-driven: we intercept OnAfterIBlockElementSetPropertyValues (where quantities are written) and OnAfterIBlockElementUpdate. The handlers write iblock_element_id to a queue table with a timestamp. A separate PHP script running under PM2 (a simple worker, not a LangGraph pipeline) picks up batches of 50, fetches current data from the Bitrix API, and writes to Elasticsearch.
Why a MySQL table and not Redis? Redis wasn't in the infrastructure, and adding it for one worker wasn't worth the overhead.
Failure modes to expect
Race condition during batch import. 1C uploads files sequentially, and a single product can receive three updates in 10 seconds. Three tasks in the queue. The worker processes the first, fetches data from Bitrix — and gets the third (final) state. That's fine, actually. The key is to always fetch current data at processing time, not cache the event payload.
Lost tasks on worker restart. If the worker crashes mid-batch, tasks may get stuck in "in progress" and never return to the queue. Simple fix: add TTL to the in-progress status — if a task isn't closed within 2 minutes, return it to the queue.
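The TTL fix can be sketched as a lease on each claimed task. The example below is Python with an in-memory SQLite table standing in for the MySQL queue table. The schema, the status values and the function names are assumptions made for illustration, and it assumes a single worker.

```python
import sqlite3

LEASE_SECONDS = 120  # the 2-minute TTL from the text


def make_queue() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE queue ("
        " element_id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " claimed_at REAL)"
    )
    return db


def release_expired(db: sqlite3.Connection, now: float) -> int:
    """Return tasks stuck in 'in_progress' past the lease to 'pending'."""
    cur = db.execute(
        "UPDATE queue SET status = 'pending', claimed_at = NULL"
        " WHERE status = 'in_progress' AND claimed_at < ?",
        (now - LEASE_SECONDS,),
    )
    return cur.rowcount


def claim_batch(db: sqlite3.Connection, now: float, size: int = 50) -> list:
    """Mark up to `size` pending tasks as in progress and return their ids."""
    release_expired(db, now)
    rows = db.execute(
        "SELECT element_id FROM queue WHERE status = 'pending'"
        " ORDER BY element_id LIMIT ?",
        (size,),
    ).fetchall()
    ids = [row[0] for row in rows]
    db.executemany(
        "UPDATE queue SET status = 'in_progress', claimed_at = ? WHERE element_id = ?",
        [(now, element_id) for element_id in ids],
    )
    return ids


def complete(db: sqlite3.Connection, element_id: int) -> None:
    # A finished task is removed; a crashed worker never gets this far.
    db.execute("DELETE FROM queue WHERE element_id = ?", (element_id,))


if __name__ == "__main__":
    db = make_queue()
    db.executemany("INSERT INTO queue (element_id) VALUES (?)", [(1,), (2,), (3,)])
    print(claim_batch(db, now=1000.0))  # worker claims everything
    complete(db, 1)                     # ...finishes one task, then crashes
    print(claim_batch(db, now=1060.0))  # lease still valid, nothing to claim
    print(claim_batch(db, now=1121.0))  # lease expired, stuck tasks come back
```

With more than one worker, the select-then-update step would need to be made atomic (for example with row locking), which this sketch does not attempt.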
Partial updates clobbering data. An Elasticsearch document contains description, price, and stock together. If you update only stock via event-driven and descriptions via scheduled batch, make sure each path sends a partial update (the update API with a doc body containing only the fields it owns) rather than re-indexing the entire document. Otherwise a fresh description update will disappear the next time a stock event fires.
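To make the difference concrete, here is a small Python sketch that builds a bulk request body out of update actions. It only produces the NDJSON payload for the _bulk endpoint. The index name catalog and the field names are illustrative, and the payload would be sent by whatever client the project uses.

```python
import json


def bulk_partial_updates(index: str, changes: dict) -> str:
    """Build an NDJSON body for the _bulk endpoint using `update` actions.

    changes: product id -> dict holding only the fields that changed.
    An `update` action with `doc` merges those fields into the stored
    document; an `index` action would replace the whole document.
    """
    lines = []
    for product_id, fields in changes.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": str(product_id)}}))
        lines.append(json.dumps({"doc": fields}))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A stock event touches stock fields only; description and price stay as indexed.
    print(bulk_partial_updates("catalog", {101: {"in_stock": False, "quantity": 0}}), end="")
```

A plain update fails for a product that is not in the index yet. Adding doc_as_upsert would avoid that error, but it would also create a document that holds nothing except stock fields, so leaving new products to the scheduled batch may be the safer default.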
Detecting a lying index
The most reliable method: compare timestamps. Each Elasticsearch document stores an indexed_at field, the last time it was updated in ES. Every 10 minutes, a cron task samples 100 random products and compares indexed_at against TIMESTAMP_X in Bitrix. If a product changed in Bitrix after it was last indexed and that change has been waiting longer than a threshold (30 minutes for descriptions, 15 minutes for prices, 5 minutes for stock), it fires a Telegram alert.
It's not a perfect metric, but it's cheap to run. In six months it caught three incidents before customers noticed.
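The comparison itself is only a few lines. Below is a hedged Python sketch of the check, with the thresholds taken from the text. The sample tuples, the kind labels and the stale_products function are assumptions. Fetching the sample from Bitrix and Elasticsearch and sending the Telegram alert are left out.

```python
from datetime import datetime, timedelta

# Thresholds from the text, per data type.
THRESHOLDS = {
    "description": timedelta(minutes=30),
    "price": timedelta(minutes=15),
    "stock": timedelta(minutes=5),
}


def stale_products(sample, now: datetime) -> list:
    """Return (product_id, kind, lag) for sampled products whose index is behind.

    sample: iterable of (product_id, kind, modified_at, indexed_at), where
    modified_at comes from the source database and indexed_at from Elasticsearch.
    """
    stale = []
    for product_id, kind, modified_at, indexed_at in sample:
        if indexed_at >= modified_at:
            continue  # the index already reflects the latest change
        lag = now - modified_at  # how long the change has been waiting
        if lag > THRESHOLDS[kind]:
            stale.append((product_id, kind, lag))
    return stale


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0)
    sample = [
        (1, "stock", datetime(2024, 5, 1, 11, 50), datetime(2024, 5, 1, 11, 40)),
        (2, "price", datetime(2024, 5, 1, 11, 50), datetime(2024, 5, 1, 11, 40)),
        (3, "description", datetime(2024, 5, 1, 11, 0), datetime(2024, 5, 1, 11, 5)),
    ]
    for product_id, kind, lag in stale_products(sample, now):
        print(f"product {product_id}: {kind} is {lag} behind")
```

Products 1 and 2 have the same 10-minute lag, but only the stock change is reported, because the price threshold is 15 minutes.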
Keeping the index fresh is a second-order problem. The first-order problem is knowing what people search for and fail to find. If you haven't looked at your zero-results log yet, start there: it's faster and often more impactful.