Elasticsearch zero downtime reindex: alias swap in production

We needed to add a Russian morphology analyzer to an index with 80,000 products. I had no plan for how to do it live.

Elasticsearch doesn't let you change the mapping of an existing field in place. A field you created as keyword stays keyword. Want a different analyzer on it? Create a new index. This is documented and not negotiable.

We could have scheduled downtime — middle of the night, minimal traffic. But our 1C import ran from 23:00 to 02:00, pushing 30,000 stock updates. Taking down search during that window meant corrupted data or lost changes.

We settled on an alias swap. Here's how it actually works, including the part the docs skip over.

Why you can't change an Elasticsearch mapping in place

Mapping is the index schema. Once a field exists in the mapping, its definition is locked. You can add new fields, but you can't change the type of an existing field or reconfigure its index-time analyzer.

The reason is architectural: Lucene (the engine under Elasticsearch) builds an inverted index and term vectors tuned to a specific analyzer at index time. Change the analyzer, and your 80k existing documents contain "old" tokens. New documents get analyzed differently than old ones, and search results become inconsistent across your catalog.

That's what happened to us. We added the russian analyzer to name and description. Products from that morning's import were findable. Anything indexed weeks earlier wasn't coming up on Russian queries. One customer couldn't find "телевизор" for items we definitely had in stock.

There's only one real fix: reindex everything. The question is how to do it while the store keeps running.

Aliases: the index abstraction that makes zero-downtime possible

An alias is a named pointer to one or more indices. Instead of your application talking directly to catalog_v1, it talks to catalog. You can then atomically switch that alias to point at catalog_v2 with a single API call.

That's the entire foundation of a zero-downtime migration. Your application doesn't know which physical index it's reading from — it just knows the alias name.

If you haven't set up aliases yet, you'll need one code change before starting. In our case the Bitrix REST adapter had the index name hardcoded. We switched it to the alias before the migration. That was the only application change required.

Four steps: new index → reindex → alias swap → verify

Step 1. Create a new index with the updated mapping.

Name it with a version suffix: catalog_v2. Include the full mapping — all fields, the new analyzers, and settings for shards and replicas. Verify tokenization works as expected on a test document before proceeding.
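As a rough sketch of Step 1, the request bodies could be built like this in Python. The built-in russian analyzer and the name and description fields come from the story above; the shard and replica counts and the sample word are placeholder assumptions, and a real mapping would list every field in the catalog, not just these two. Sending the bodies (PUT /catalog_v2, then POST /catalog_v2/_analyze) is left to whatever HTTP client you already use.

```python
import json


def build_index_body(text_fields, analyzer="russian", shards=1, replicas=1):
    """Body for PUT /<index>: index settings plus one text mapping per field."""
    return {
        "settings": {
            "number_of_shards": shards,      # assumption: size this for your cluster
            "number_of_replicas": replicas,  # assumption
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer}
                for field in text_fields
            }
        },
    }


def build_analyze_body(field, text):
    """Body for POST /<index>/_analyze: tokenize `text` the way `field` would."""
    return {"field": field, "text": text}


index_body = build_index_body(["name", "description"])
analyze_body = build_analyze_body("name", "телевизоры")

print(json.dumps(index_body, ensure_ascii=False, indent=2))
print(json.dumps(analyze_body, ensure_ascii=False))
```

The _analyze response lists the tokens the field would store, which is the quickest way to confirm the analyzer behaves as expected before copying 80k documents into the index.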

Step 2. Run the _reindex API.

POST /_reindex
{
  "source": { "index": "catalog_v1" },
  "dest": { "index": "catalog_v2" }
}

By default the call is synchronous: it blocks until the reindex finishes. For 80k documents it took us 4 minutes 12 seconds on a single node (4 vCPU, 16 GB RAM). The store and search stayed up throughout, because they kept reading from catalog_v1 via the alias.

For larger indices, use ?wait_for_completion=false and track progress with the returned task_id.

Step 3. Atomic alias swap.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "catalog_v1", "alias": "catalog" } },
    { "add":    { "index": "catalog_v2", "alias": "catalog" } }
  ]
}

Elasticsearch executes both actions atomically. There's no moment where the alias points at nothing, so users mid-session see no gap: each search resolves the alias to either catalog_v1 or catalog_v2, never to neither.
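If you script the cutover, the actions body is easy to generate. This is a minimal sketch; the helper name swap_alias_actions is ours for illustration, not part of any Elasticsearch client.

```python
def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request, so the alias is switched atomically.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


forward = swap_alias_actions("catalog", "catalog_v1", "catalog_v2")
rollback = swap_alias_actions("catalog", "catalog_v2", "catalog_v1")
```

Generating the rollback body at the same time as the forward one means it is ready to send if the new index misbehaves.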

Step 4. Verify.

Run a few test searches on known terms. Check that _cat/indices?v shows both indices and that _cat/aliases?v shows catalog pointing to catalog_v2. Watch your error logs for a few minutes.
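The alias check can be automated. Alias membership is reported by GET /_cat/aliases, and with ?format=json each row carries an alias and an index key. The sketch below only inspects already-fetched rows; the sample data is illustrative, not output from our cluster.

```python
def alias_targets(cat_aliases_rows, alias):
    """Return the set of indices that `alias` currently points at."""
    return {row["index"] for row in cat_aliases_rows if row["alias"] == alias}


def swap_succeeded(cat_aliases_rows, alias, expected_index):
    """True only if the alias points at the expected index and nothing else."""
    return alias_targets(cat_aliases_rows, alias) == {expected_index}


# Rows shaped like the JSON from GET /_cat/aliases?format=json (made-up sample).
rows = [{"alias": "catalog", "index": "catalog_v2"}]
print(swap_succeeded(rows, "catalog", "catalog_v2"))  # True
```

Comparing against a one-element set also catches the case where the alias ends up attached to both indices at once.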

The write-during-reindex problem

This is the part most tutorials skip.

While the reindex is running, your catalog keeps getting updates. In our case, 1C pushes stock updates every 15 minutes. If you only write to catalog_v1 during the reindex, catalog_v2 gets a snapshot from when the reindex started — you lose every update that happened in those 4 minutes.

The fix: write to both indices during migration.

In our Bitrix adapter we added a dual_write mode:

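// During migration, send every write to both the old and the new index
// so catalog_v2 does not miss updates made while the reindex runs.
if ($this->isDualWriteMode()) {
    $this->indexDocument($doc, 'catalog_v1');
    $this->indexDocument($doc, 'catalog_v2');
}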

We enabled the flag manually before starting the reindex and turned it off about 10 minutes after the alias swap — once we confirmed catalog_v1 was no longer receiving reads.

Writing twice cut indexing throughput by about 18%. Acceptable for a 15-minute window.

After disabling dual_write, we kept catalog_v1 alive for a week before deleting it — giving ourselves room to roll back if something came up.
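One interaction to watch when combining dual write with _reindex: by default, _reindex overwrites a destination document that has the same ID, so a stale copy from the source snapshot can replace a fresher dual-written one. A possible guard, sketched here as an assumption and not as what we ran, is to tell _reindex to create only missing documents ("op_type": "create") and to continue past the resulting version conflicts ("conflicts": "proceed").

```python
def build_reindex_body(source_index, dest_index, create_only=True):
    """Body for POST /_reindex.

    With create_only=True, documents already present in the destination
    (for example, ones written by dual write) are skipped, not overwritten.
    """
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if create_only:
        body["conflicts"] = "proceed"       # count conflicts, do not abort
        body["dest"]["op_type"] = "create"  # never overwrite existing IDs
    return body


safe_body = build_reindex_body("catalog_v1", "catalog_v2")
plain_body = build_reindex_body("catalog_v1", "catalog_v2", create_only=False)
```

Skipped documents show up in the version_conflicts counter of the response. This guard does not cover deletes issued during the reindex; those still need handling in the write path.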

Estimating reindex time and monitoring progress

Synchronous reindex doesn't display progress. You can poll document count in catalog_v2 directly:

curl -s localhost:9200/catalog_v2/_count | jq '.count'

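Rough time formula: document_count × analyzer_complexity_factor / baseline_throughput, where baseline_throughput is the rate a simple mapping achieves and the factor is how many times slower the heavy analysis runs.

On our hardware: simple mapping ran at ~2,000 docs/s. With heavy analysis (russian morphology + normalizer + synonyms), throughput dropped to ~320 docs/s, a factor of about 6.25. At 80k documents that's about 250 seconds, which matched the 4 minutes 12 seconds we saw.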
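The arithmetic with our numbers, as a small sketch. The function names are ours; the throughput figures are the ones quoted above and will differ on other hardware and analyzers.

```python
def estimate_reindex_seconds(doc_count, docs_per_second):
    """Estimated wall-clock seconds to reindex doc_count documents."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second


def complexity_factor(baseline_docs_per_second, measured_docs_per_second):
    """How many times slower the heavy analysis is than the simple mapping."""
    return baseline_docs_per_second / measured_docs_per_second


factor = complexity_factor(2_000, 320)
print(factor)                                         # 6.25
print(estimate_reindex_seconds(80_000, 320))          # 250.0
print(estimate_reindex_seconds(80_000 * factor, 2_000))  # 250.0
```

The last two lines are the same estimate written two ways: dividing by the measured throughput directly, or scaling the document count by the factor and dividing by the baseline.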

For async reindex (wait_for_completion=false), track via the Tasks API:

GET /_tasks/<task_id>
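For a reindex task, the Tasks API response includes a completed flag and a task.status object with total, created, updated and deleted counters; progress can be estimated by comparing the sum of the last three against total. A minimal sketch that reads an already-fetched response (the sample numbers are made up):

```python
def reindex_progress(task_response):
    """Return (fraction_done, completed) from a GET /_tasks/<task_id> response."""
    status = task_response["task"]["status"]
    processed = status["created"] + status["updated"] + status["deleted"]
    total = status["total"]
    fraction = processed / total if total else 0.0
    return fraction, task_response["completed"]


# Illustrative response fragment, not real output from our cluster.
sample = {
    "completed": False,
    "task": {"status": {"total": 80000, "created": 20000, "updated": 0, "deleted": 0}},
}
fraction, done = reindex_progress(sample)
print(f"{fraction:.0%} done, completed={done}")  # 25% done, completed=False
```

Poll this every few seconds and stop once completed is true, then check the response for failures before swapping the alias.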

Rollback in 30 seconds

If something goes wrong after the alias swap, the fix is one API call:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "catalog_v2", "alias": "catalog" } },
    { "add":    { "index": "catalog_v1", "alias": "catalog" } }
  ]
}

We used this 20 minutes after our first migration attempt — not because search broke, but because we found some documents missing from catalog_v2 due to a reindex timeout on one shard. We swapped the alias back, fixed the shard issue, and ran the migration again three hours later.

Don't delete the old index immediately. Give it a few days.

One thing we didn't account for

After the alias swap, search started finding more relevant results. The morphology worked. But users who had an active session at the moment of the switch saw inconsistent results: cached responses still held results from the old index while fresh queries against the new index returned different rankings.

We hadn't cleared the Redis search result cache during the alias swap. In practice, one user sent a support chat message: "just searched and found nothing, now I found it, what's going on?" One ticket in 20 minutes. Not a disaster, but now search cache invalidation is a required step in our alias swap procedure.
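A sketch of what that invalidation step could look like. The search:* key pattern is hypothetical (our real key layout isn't shown here), and the function expects any client exposing scan_iter(match=...) and delete(...), which is the shape of the redis-py client. A small in-memory stand-in keeps the example runnable without a Redis server.

```python
def invalidate_search_cache(client, pattern="search:*"):
    """Delete every cached key matching `pattern`; return how many were removed."""
    deleted = 0
    for key in client.scan_iter(match=pattern):
        deleted += client.delete(key)
    return deleted


class FakeRedis:
    """In-memory stand-in for a Redis client (prefix patterns only)."""

    def __init__(self, data):
        self.data = dict(data)

    def scan_iter(self, match=None):
        prefix = match.rstrip("*")
        return [key for key in list(self.data) if key.startswith(prefix)]

    def delete(self, *names):
        return sum(1 for name in names if self.data.pop(name, None) is not None)


cache = FakeRedis({"search:tv": "...", "search:socks": "...", "session:1": "..."})
removed = invalidate_search_cache(cache)
print(removed, sorted(cache.data))
```

Run it immediately after the POST /_aliases call so cached result pages from the old index are dropped before they can be served again.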

Alias swap is the only reliable way to update Elasticsearch mapping without stopping your store. The API is simple. The difficulty is keeping track of every write path and knowing exactly when to cut over.

If you have continuous writes from 1C or any other source, enable dual write before you start the reindex. That's the only place where you can lose data — and it's not inside Elasticsearch, it's in your application logic.

Previous part of this series — on designing your mapping so you need this procedure less often: Elasticsearch Mapping Is Architecture, Not Configuration.

What drove the need for morphology in the first place: "Socks" and "knee-highs" are the same product. Elasticsearch didn't know.