Elasticsearch Fuzzy Search for Ecommerce: Typo Tolerance Setup

The sale that never happened

One of the first queries after we launched new search on a 28,000 SKU catalog: "samsunk phone."

Not "samsung." Not "samsung phone." Just "samsunk" — one finger slipped from "g" to "k." The search returned zero results. The user saw nothing. They left. The client never found out.

I found it nine days later in the zero-results log. Before anyone complained. That's the thing about typos in search: nobody reports them. A frustrated user doesn't write to support saying "your search doesn't handle typos." They just close the tab. Silent exit. In analytics, it shows up as a session without a purchase. No red flag. No ticket.

This is why you have to hunt for typos yourself. Don't wait for complaints.

Reading zero-results logs as typo diagnostics

If you're running Elasticsearch, logging zero-results queries is non-negotiable. I wrote about using those logs as a product backlog signal. This is the same log, different angle: reading it specifically for typo patterns.

After we enabled detailed query logging, we had roughly 1,800 zero-results queries in the first two weeks. I went through the first 200. Three patterns stood out:

Single-character typos: "lapyop" instead of "laptop", "samsunk" instead of "samsung", "телевзор" instead of "телевизор" (Russian for "TV", with one letter dropped).
Wrong keyboard layout: typing a Russian word while the keyboard is still set to the Latin layout (or the reverse). This is huge in Russian-language stores; more on this below.
Merged words and missing letters: "wirelessheadphones" (one word), "headphone" instead of "headphones".

About 40% of zero-results queries in that sample were typo-related. That's not theoretical lost revenue. That's real sessions we didn't convert. For a store with typical search-to-cart conversion of 8–12%, that compounds quickly.
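A minimal Python sketch of this kind of triage, assuming you can export the zero-results queries as plain strings and have a list of terms that do exist in the catalog. CATALOG_TERMS and likely_typos are hypothetical names, and difflib's similarity ratio is only a stand-in for edit distance, so treat the output as a shortlist to review by hand.

```python
from difflib import get_close_matches

# Hypothetical vocabulary: terms known to exist in the catalog index.
CATALOG_TERMS = ["samsung", "laptop", "phone", "headphones", "wireless"]


def likely_typos(zero_result_queries, vocabulary, cutoff=0.8):
    """Map each unknown query token to its closest catalog term, if one is near enough."""
    suspects = {}
    for query in zero_result_queries:
        for token in query.lower().split():
            if token in vocabulary:
                continue  # spelled correctly, not the reason for zero results
            match = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
            if match:
                suspects[token] = match[0]
    return suspects


suspects = likely_typos(
    ["samsunk phone", "lapyop", "wirelessheadphones"], CATALOG_TERMS
)
print(suspects)  # {'samsunk': 'samsung', 'lapyop': 'laptop'}
```

Merged words like "wirelessheadphones" fall below the cutoff here, which is a reminder that they need their own handling rather than typo matching.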

How fuzziness works in Elasticsearch

Elasticsearch handles typo tolerance through the fuzziness parameter. It's measured in edit distance: the number of single-character operations (insertion, deletion, substitution, or transposition of two adjacent characters) needed to turn one string into another.
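To make the measure concrete, here is a small Python sketch of edit distance with adjacent transpositions (the "optimal string alignment" variant). It illustrates the metric only; it is not how Elasticsearch computes matches internally, and edit_distance is a name chosen for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Count insertions, deletions, substitutions and adjacent swaps between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("samsunk", "samsung"))  # 1 (substitution)
print(edit_distance("lapyop", "laptop"))    # 1 (substitution)
print(edit_distance("pohne", "phone"))      # 1 (transposition)
```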

Two ways to set it:

Fixed edit distance. Set fuzziness: 1 or fuzziness: 2. Predictable, but it applies the same tolerance to short and long terms alike. With fuzziness: 1 on "ok", you'll match "ok", "ox", "ob", "of", which produces noise.

AUTO (what we use in production). fuzziness: AUTO picks the edit distance separately for each term in the query, based on that term's length:

1–2 characters: exact match only
3–5 characters: 1 edit allowed
6+ characters: 2 edits allowed

For an ecommerce catalog, AUTO works well out of the box. Short queries stay precise. Longer product names get the flexibility they need.
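The AUTO thresholds above can be written out as a few lines of Python. This is only an illustration of the rule (the defaults correspond to AUTO:3,6), with auto_max_edits as a made-up helper name; Elasticsearch applies the rule itself.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed for a single term under fuzziness AUTO (same as AUTO:3,6)."""
    if len(term) < low:
        return 0  # 1-2 characters: exact match only
    if len(term) < high:
        return 1  # 3-5 characters: one edit
    return 2      # 6+ characters: two edits


# The rule is applied per term, not to the whole query string.
print([auto_max_edits(t) for t in "samsunk phone".split()])  # [2, 1]
print(auto_max_edits("ok"))  # 0
```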

One more parameter: prefix_length. This forces the first N characters to match exactly before fuzziness kicks in. We set prefix_length: 2 — meaning the query and the match must share the same two-character prefix. Reduces false positives and speeds up queries slightly.

Our simplified query config:

{
  "multi_match": {
    "query": "samsunk phone",
    "fields": ["name^3", "brand^2", "description"],
    "fuzziness": "AUTO",
    "prefix_length": 2
  }
}

With this, "samsunk" matches "samsung" via one substitution. Results appear.

The Russian keyboard problem

This one is specific to Russian-language ecommerce, and almost nobody documents it in the Elasticsearch literature.

Russian users frequently type with the wrong keyboard layout active. A user searching for "принтер" (printer) with the keyboard accidentally left on the Latin layout types "ghbynth": the same keys, but an entirely different string. Fuzziness doesn't help here. This isn't a one-letter typo. Every character is wrong.

We handled this in a preprocessing layer before the Elasticsearch query. The PHP script checks whether the input looks like text typed in the wrong layout. If yes, it converts the input key by key to the other layout. If the conversion produces a plausible word, we send both variants in an OR query.

This isn't an Elasticsearch feature. It's a PHP layer upstream of the search. But without it, fuzzy matching alone fixes maybe 60% of the typo problem, not 90.
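A rough sketch of the idea, written in Python rather than the PHP of the production layer. Everything here is an assumption for illustration: the names switch_layout and build_query, the plausibility check (a converted token appears in a set of known catalog terms), and the Latin-to-Cyrillic direction only. The field list is borrowed from the query config earlier in the article.

```python
# QWERTY keys and the ЙЦУКЕН letters printed on the same physical keys.
LATIN = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC = "йцукенгшщзхъфывапролджэячсмитьбю"
TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)

FIELDS = ["name^3", "brand^2", "description"]


def switch_layout(text: str) -> str:
    """Re-read Latin keystrokes as if the Russian layout had been active."""
    return text.lower().translate(TO_CYRILLIC)


def fuzzy_clause(text: str) -> dict:
    return {
        "multi_match": {
            "query": text,
            "fields": FIELDS,
            "fuzziness": "AUTO",
            "prefix_length": 2,
        }
    }


def build_query(user_input: str, known_terms: set) -> dict:
    """Search the input as typed, plus its layout-switched form when plausible."""
    converted = switch_layout(user_input)
    # Plausibility check (an assumption): a converted token is a known catalog term.
    plausible = converted != user_input.lower() and any(
        token in known_terms for token in converted.split()
    )
    if not plausible:
        return fuzzy_clause(user_input)
    return {
        "bool": {
            "should": [fuzzy_clause(user_input), fuzzy_clause(converted)],
            "minimum_should_match": 1,
        }
    }


print(switch_layout("ghbynth"))  # принтер
```

A correctly typed query such as "samsung" converts to a string that matches no known term, so it goes out as a single clause. The reverse direction (Cyrillic keystrokes meant as Latin) would need the inverse table.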

What changed after we turned it on

We enabled fuzziness: AUTO + prefix_length: 2 + keyboard layout preprocessing in stages, to separate the effects.

After fuzzy matching:

Zero-results rate dropped from 22% to 14%.
Search conversion (click on result → add to cart) went from 6.1% to 7.8%.

After keyboard layout preprocessing:

Zero-results rate: 11%.
Search conversion: 8.3%.

Four weeks total. Zero-results cut in half. Search conversion up 36% from baseline. On a store with a few thousand orders a month, that's real money without touching a single product page.
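For anyone checking the arithmetic, the relative changes follow directly from the figures above. A tiny Python sketch (relative_change is just a helper name for this example):

```python
def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, e.g. 0.25 means +25%."""
    return (after - before) / before


zero_results = relative_change(22.0, 11.0)  # zero-results rate, baseline to final
fuzzy_only = relative_change(6.1, 7.8)      # conversion after fuzzy matching alone
conversion = relative_change(6.1, 8.3)      # conversion, baseline to final

print(f"{zero_results:+.0%}")  # -50%
print(f"{fuzzy_only:+.0%}")    # +28%
print(f"{conversion:+.0%}")    # +36%
```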

We didn't wait for a complaint. We fixed it before the client noticed.

Where fuzzy matching breaks

It's not a general solution. A few places we got burned.

Too aggressive edit distance. With fuzziness: 2 and no prefix_length, a query for "juice" starts matching "voice", "slice", "guide", and whatever else is within two edits. The result is noise. prefix_length: 2 helps, but for very short queries (under 4 characters) you're better off disabling fuzziness entirely.

SKUs and article numbers. A user searching "AB-12345" wants exactly that product. Fuzzy matching turns this into a disaster — it returns products with similar-looking codes. We disabled fuzziness on sku and article fields completely.
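One way these two rules could be wired into query construction, sketched in Python. The sku and article field names come from the paragraph above and the text fields from the earlier config; the function name, the bool/should layout, and applying the 4-character cutoff to the longest token are assumptions made for this example.

```python
TEXT_FIELDS = ["name^3", "brand^2", "description"]
EXACT_FIELDS = ["sku", "article"]  # fields where fuzziness stays off
MIN_FUZZY_LENGTH = 4               # shorter queries are matched exactly


def build_search_query(user_input: str) -> dict:
    """Fuzzy match on text fields, exact match on SKU-like fields."""
    text = user_input.strip()
    longest = max((len(token) for token in text.split()), default=0)
    # Very short queries produce too much noise with fuzziness on.
    fuzziness = "AUTO" if longest >= MIN_FUZZY_LENGTH else 0
    return {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": text,
                        "fields": TEXT_FIELDS,
                        "fuzziness": fuzziness,
                        "prefix_length": 2,
                    }
                },
                # Codes like "AB-12345" must never match look-alike codes.
                {
                    "multi_match": {
                        "query": text,
                        "fields": EXACT_FIELDS,
                        "fuzziness": 0,
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }
```

With this shape, "tv" is searched exactly everywhere, while "samsunk phone" gets AUTO on the text fields only.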

Performance. Fuzzy queries are more expensive than exact queries. On our index, the difference was roughly 40ms vs. 12ms at the median. Acceptable for 28,000 SKUs. For a million-item catalog, you'd want a different approach — the suggest API, a separate spelling correction pass, or a dedicated search cluster.
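As a sketch of the suggest-API route: the term suggester can propose corrections for words missing from the index, and the corrected text can then go through a cheaper exact query. The Python below only builds the request body and reads a response of the documented shape; the suggester name "spelling", the field "name", and both helper names are assumptions, and this is not something measured on the catalog described here.

```python
def build_suggest_body(text: str, field: str = "name") -> dict:
    """Request body for _search that asks the term suggester for corrections."""
    return {
        "size": 0,  # only suggestions are needed, no hits
        "suggest": {
            "spelling": {
                "text": text,
                "term": {
                    "field": field,
                    "suggest_mode": "missing",  # only for terms absent from the index
                    "prefix_length": 2,
                },
            }
        },
    }


def corrected_text(response: dict, suggester: str = "spelling") -> str:
    """Rebuild the query, swapping each word for its top suggestion if there is one."""
    words = []
    for entry in response["suggest"][suggester]:
        options = entry["options"]
        words.append(options[0]["text"] if options else entry["text"])
    return " ".join(words)
```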

Takeaway

Elasticsearch fuzzy search isn't about making search "smarter." It's about the money that leaves silently. Typos don't generate support tickets. They generate closed tabs.

The fix isn't complex: fuzziness: AUTO, prefix_length: 2, a preprocessing step for keyboard layout. But it starts with auditing your zero-results log regularly. Without that, you don't know what you're losing.

More on Elasticsearch as a UX tool — the broader framing behind these decisions.