"Socks" and "knee-highs" are the same product. Elasticsearch didn't know.
Three weeks of Elasticsearch setup (shards, replicas, mappings, custom tokenizers) culminated in fuzziness: AUTO, which cut the zero-results rate from 22% to 11%. A real improvement, but 11% of searchers were still leaving empty-handed. I pulled the logs and found the problem wasn't typos. It was synonyms.
"USB-C cable" didn't match "charger." "Mouse" didn't match "pointing device." "Buckwheat" and "buckwheat groats" were the same physical product, but two separate universes in the index. Fuzzy search doesn't fix that. Morphology doesn't fix that. You need a dictionary.
Four synonym failure classes in Russian e-commerce
We run a 28,000-SKU catalog on Bitrix. I pulled the 340 most frequent zero-results queries from a single week and categorized them.
First: brand names in phonetic Russian spelling. "Ayfon" instead of iPhone, "Samsunk" instead of Samsung, "Ksyaomi" instead of Xiaomi. Russian speakers write how they talk, and Cyrillic phonetics don't always align with the official brand name.
Second: abbreviations and technical variants. "USB-C," "type-c," "BT" for Bluetooth, "SD" for memory card. One physical connector, a dozen spellings.
Third: functional synonyms. "Charger" and "charging cable," "headphones" and "headset," "printer" and "MFP." Not exactly the same thing, but the customer is usually looking for one of the two.
Fourth: professional vs. colloquial. "TV stand" vs. "television cabinet," "office chair" vs. "computer chair," "socks" vs. "knee-highs." That last pair is where I got the title for this post.
From those 340 queries we built 180 synonym pairs: about four hours of work, using the log data as a starting point.
Where the dictionary comes from
The biggest mistake is trying to build a synonym dictionary from memory. No developer knows how a customer phrases a query at 2am on their phone.
The right source is your own search log: logged _msearch requests, or a custom PHP logging wrapper that records each query together with its hit count. Specifically, the requests where hits.total.value == 0.
```
GET /logs-search-*/_search
{
  "size": 0,
  "query": {
    "term": { "hits_count": 0 }
  },
  "aggs": {
    "top_queries": {
      "terms": { "field": "query.keyword", "size": 500 }
    }
  }
}
```
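The query above assumes every search was logged as a document with a query field and a numeric hits_count. A minimal sketch of what such a logger could record is below, written in Python rather than PHP; the in-memory LOG_SINK stands in for indexing into logs-search-*, and the function name log_search is hypothetical. It reads hits.total the way Elasticsearch 7+ returns it (an object with value and relation), with a fallback for the bare integer older versions return.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical sink: in production this would be an index call into logs-search-*;
# here it is a plain list so the sketch runs without a cluster.
LOG_SINK: list[dict[str, Any]] = []


def log_search(query: str, response: dict[str, Any]) -> dict[str, Any]:
    """Record one search together with its hit count, in the shape the aggregation expects."""
    total = response["hits"]["total"]
    # ES 7+ returns {"value": N, "relation": "eq" | "gte"}; older versions return a bare int.
    hits_count = total["value"] if isinstance(total, dict) else int(total)
    record = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        # Design choice (assumption): normalize so "Socks " and "socks" aggregate together.
        "query": query.strip().lower(),
        "hits_count": hits_count,
    }
    LOG_SINK.append(record)
    return record
```

Normalizing in the logger keeps trivially different spellings in one terms bucket; if you want to study capitalization or spacing habits too, log the raw string in a second field.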
Export to CSV, sort by frequency, manually assign synonyms. You don't need 10,000 rows. Our 180 pairs had measurable impact within the first week.
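The export step can be sketched as a small function that takes the aggregation response and writes a worksheet with an empty column for the synonym you assign by hand. The buckets, key and doc_count fields are the standard terms aggregation response shape; the column names are hypothetical.

```python
import csv
import io


def buckets_to_csv(agg_response: dict) -> str:
    """Turn the top_queries terms aggregation into a CSV worksheet for manual synonym work."""
    buckets = agg_response["aggregations"]["top_queries"]["buckets"]
    # terms aggregations already order by doc_count descending; sorting again keeps it explicit
    rows = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["query", "zero_result_count", "synonym_of"])  # last column is filled in by hand
    for b in rows:
        writer.writerow([b["key"], b["doc_count"], ""])
    return out.getvalue()
```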
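I keep the dictionary in a file on the server, not in code and not in a database. The file has to exist on every node in the cluster. Elasticsearch can pick up changes without reindexing: close the index, update its settings, and reopen it (/_close, update_settings, /_open; the index is unavailable while closed), or, if the synonym filter is marked "updateable": true and used only in a search analyzer, call POST /<index>/_reload_search_analyzers without closing anything.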
Expand vs. contraction: which strategy fits
This is a choice most teams skip because the documentation describes it technically without business context.
With expansion (socks, knee-highs, stockings => socks, knee-highs, stockings, which is what a plain comma-separated line does under the default expand: true), every term expands to all the others. A search for "socks" also returns "knee-highs" and "stockings." Recall goes up, precision goes down.
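With contraction (socks, knee-highs, stockings => hosiery), every variant is rewritten to one canonical term. Recall is lower, but relevance is higher: the customer won't see products they didn't intend to find. Since the filter runs only at query time (see below), the canonical term must be a word your product data actually contains; otherwise the rewritten query matches nothing.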
We use a hybrid: expand for brand names and technical variants ("usb-c, type-c, type c"), contraction for functional synonyms where we want to control what shows up.
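To make the two strategies concrete, here is a toy parser for the Solr rule format (Elasticsearch's default synonym format) that shows what each input term is rewritten to. It is a simplification: it keys on whole phrases, while the real filter matches token sequences and keeps alternatives at the same position. The function name parse_rules is hypothetical.

```python
def parse_rules(lines: list[str], expand: bool = True) -> dict[str, set[str]]:
    """Map each input term to the set of terms a query containing it is rewritten to."""
    mapping: dict[str, set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # blank lines and comment lines are ignored
            continue
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by the right-hand ones.
            # With a single canonical term on the right, this is contraction; expand is ignored.
            left, right = line.split("=>", 1)
            targets = {t.strip() for t in right.split(",") if t.strip()}
            for term in (t.strip() for t in left.split(",")):
                if term:
                    mapping.setdefault(term, set()).update(targets)
        else:
            # Equivalent synonyms: with expand, each term maps to all of them;
            # without expand, all of them collapse to the first term on the line.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            targets = set(terms) if expand else {terms[0]}
            for term in terms:
                mapping.setdefault(term, set()).update(targets)
    return mapping


rules = parse_rules([
    "# technical variants: expand",
    "usb-c, type-c, type c",
    "# functional group: contraction",
    "socks, knee-highs, stockings => hosiery",
])
```

In this toy, rules["type c"] contains all three connector spellings, while rules["knee-highs"] is just {"hosiery"}, which is the recall-versus-precision trade-off described above.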
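One hard constraint: synonym_graph is meant for the search-time analyzer (search_analyzer) only. The index can't store the token graph it produces, so at index time multi-word synonyms lose their positions and you get unpredictable matches. If you really need index-time synonyms, that is what the plain synonym filter is for.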
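A minimal sketch of an index body that follows this rule, built as a Python dict so it can be printed as JSON or handed to a client. Assumptions: the field name, the analyzer names ru_index and ru_search, and the synonyms file path are hypothetical, and russian_morphology requires the morphology plugin to be installed. The synonym filter appears only in the search analyzer, which is also where Elasticsearch allows updateable filters.

```python
import json


def build_index_body(synonyms_path: str = "analysis/synonyms_ru.txt") -> dict:
    """Index settings + mapping: morphology at both index and search time, synonyms only at search time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "synonym_graph_filter": {
                        "type": "synonym_graph",
                        # Relative to the Elasticsearch config directory; must exist on every node.
                        "synonyms_path": synonyms_path,
                        # Allows POST /<index>/_reload_search_analyzers; search analyzers only.
                        "updateable": True,
                    }
                },
                "analyzer": {
                    "ru_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "russian_morphology"],
                    },
                    "ru_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "russian_morphology", "synonym_graph_filter"],
                    },
                },
            }
        },
        "mappings": {
            "properties": {
                # Hypothetical product-title field.
                "name": {"type": "text", "analyzer": "ru_index", "search_analyzer": "ru_search"}
            }
        },
    }


print(json.dumps(build_index_body(), indent=2))
```

The printed JSON is the body for creating the index. After editing the synonyms file on all nodes, the reload call picks up the new rules for ru_search without touching indexed data.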
Analyzer chain order matters
This is where I lost half a day to debugging. Morphology and synonyms both operate on tokens, but their interaction isn't obvious.
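For Russian e-commerce, this is the correct chain in search_analyzer (russian_morphology comes from a third-party morphology analysis plugin; synonym_graph_filter is the name of our own synonym_graph filter definition):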
```
"filter": [
  "lowercase",
  "russian_morphology",
  "synonym_graph_filter"
]
```
Morphology goes before synonyms, because morphology normalizes word forms to their base form. Flip the order and the dictionary stops working for declined or plural queries. "Buckwheat groats" in the genitive case won't match the synonym: it reaches the synonym filter as an inflected token that matches no rule, and by the time morphology normalizes it, the synonym filter has already run.
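The ordering effect can be reproduced with a toy pipeline. This is only an illustration: the lookup table LEMMAS is a hypothetical stand-in for russian_morphology, the rules are written in base form, and the synonym step flattens alternatives into a list where the real filter would keep them at one position.

```python
# Hypothetical stand-in for russian_morphology: a lookup table, not real morphology.
LEMMAS = {"socks": "sock", "knee-highs": "knee-high", "stockings": "stocking"}
# Synonym rules written in base form, the form morphology produces.
SYNONYMS = {"sock": ["sock", "knee-high"], "knee-high": ["knee-high", "sock"]}


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def lemmatize(tokens: list[str]) -> list[str]:
    return [LEMMAS.get(t, t) for t in tokens]


def synonyms(tokens: list[str]) -> list[str]:
    out: list[str] = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))  # flattened; the real filter emits a token graph
    return out


def analyze(query: str, chain) -> list[str]:
    tokens = query.split()
    for step in chain:
        tokens = step(tokens)
    return tokens


good = analyze("Socks", [lowercase, lemmatize, synonyms])  # plural normalized, then expanded
bad = analyze("Socks", [lowercase, synonyms, lemmatize])   # "socks" matches no rule, never expanded
```

With morphology first, the plural query picks up "knee-high"; with the order flipped, it only ever becomes "sock".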
One more thing: use synonym_graph, not synonym, in search_analyzer. synonym_graph handles multi-word phrases correctly, so "USB type C" is matched as a phrase, not as three separate tokens.
What to measure
Three metrics that give you signal fast.
Zero-results rate: the share of search sessions that ended with no results. We were at 11% after the fuzzy fix; synonyms brought it to 6%.
Search conversion: share of search sessions that ended in a purchase. We track the chain: search → product_view → order. Went from 8.3% to 9.4%.
Post-search exit rate: share of sessions where the customer left immediately after an empty result set. It should drop.
Run the metrics for two weeks before and two weeks after. Catalogs change, results fluctuate — you need the window.
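A minimal sketch of the before/after comparison, assuming you can reduce each search session to a date and three flags. The SearchSession shape is hypothetical, and all three rates use every search session in the window as the denominator, which is one reasonable reading of the definitions above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SearchSession:
    day: date
    ended_empty: bool         # the session's searches ended with no results
    ordered: bool             # the search -> product_view -> order chain completed
    exited_after_empty: bool  # the customer left right after an empty result set


def rates(sessions: list[SearchSession]) -> dict[str, float]:
    n = len(sessions)
    if n == 0:
        return {"zero_results_rate": 0.0, "search_conversion": 0.0, "post_search_exit_rate": 0.0}
    return {
        "zero_results_rate": sum(s.ended_empty for s in sessions) / n,
        "search_conversion": sum(s.ordered for s in sessions) / n,
        "post_search_exit_rate": sum(s.exited_after_empty for s in sessions) / n,
    }


def before_after(sessions: list[SearchSession], launch: date, days: int = 14):
    """Rates for equal-length windows immediately before and after the dictionary launch."""
    start, end = launch - timedelta(days=days), launch + timedelta(days=days)
    before = [s for s in sessions if start <= s.day < launch]
    after = [s for s in sessions if launch <= s.day < end]
    return rates(before), rates(after)
```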
What not to do
The most common mistake after an early win is building the dictionary too aggressively.
"Buckwheat" and "rice" are both grains but they aren't synonyms. We added a few pairs like that when enthusiasm was high, and got complaints: "I searched for buckwheat, you're showing me rice." Customers know the difference.
A simple check before adding any pair: "Can the customer buy X instead of Y in 90% of cases?" If not, it's a related category, not a synonym. Handle that with facets, not with the synonym dictionary.
Also: don't map competing brands as synonyms. "Lenovo" and "HP" aren't synonyms, even if a customer occasionally confuses them. That's a liability path and it frustrates users who know exactly what they want.
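One way to keep competing brands out of the dictionary is a lint that runs before each reload. A minimal sketch, assuming a hand-maintained alias table (BRAND_OF here is hypothetical and would come from your own catalog): any rule whose terms resolve to more than one distinct brand gets flagged.

```python
# Hypothetical alias table: spelling variant -> canonical brand. Build it from your own catalog.
BRAND_OF = {
    "samsung": "samsung", "samsunk": "samsung",
    "xiaomi": "xiaomi", "ksyaomi": "xiaomi",
    "lenovo": "lenovo", "hp": "hp",
}


def mixed_brand_rules(lines: list[str]) -> list[str]:
    """Return synonym rules whose terms resolve to more than one distinct brand."""
    flagged = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Treat both sides of "=>" alike: any brand on either side counts.
        terms = [t.strip().lower() for t in line.replace("=>", ",").split(",") if t.strip()]
        brands = {BRAND_OF[t] for t in terms if t in BRAND_OF}
        if len(brands) > 1:
            flagged.append(line)
    return flagged
```

Phonetic variants of the same brand ("samsung, samsunk") pass, while a line like "lenovo, hp" is reported for review.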
Wrap-up
Elasticsearch searches well for what you've shown it. The problem is that customers don't speak catalog. 180 CSV rows from zero-results logs, built in a few hours, moved our search conversion more than three weeks of shard tuning did. The dictionary is a translation layer between how your customers talk and how your index thinks.
Start with the zero-results log, pull the top 300 queries from the last week, and manually work through the first 100. It takes less than a day. You'll see the numbers move in two weeks.