Ecommerce Search Relevance: Why Too Many Results Kill Conversion

Zero results is a red flag in the dashboard. Obvious. Gets filed as a bug.

Three thousand results is a green flag. "Search is working."

And that's what kills conversion.

I worked on a catalog with 28,000 SKUs. A search for "men's jacket" returned 1,400 results. The filters were there. Pagination was there. Conversion from search was lower than from direct category navigation. Users typed a query, got a wall of products, and left without buying. Because nobody taught the search to answer a question. They only taught it not to fail.

3,000 results isn't a win. It's a different kind of failure.

The zero-results metric became standard before "too many results" was even considered a problem. Understandable — an empty screen is obviously broken.

Three thousand results is a silent failure. The user found *something*. Just not what they came for.

In behavioral data, it looks like this: the user searches, sees a long list, clicks something on the first screen, doesn't find what they want, and leaves. With good relevance, CTR on the first 10 results runs around 60–70%. With poor relevance, it drops to 20–30%. The other 70–80% of searchers never click a first-screen result; for them, the result count exists only as a number.

The conversion gap between "found it on the first screen" and "didn't" is roughly 3–5x. Not theory — logs.

Why default sort by "date added" destroys your search intent signal

Most CMS platforms and PHP ORMs sort results by creation date by default. In Bitrix, that's ORDER BY id DESC. For a 200-product store, it's invisible. For a 28,000-SKU catalog, it means a search for "boots" returns 640 results where the first 20 are whatever was added last. The newest addition isn't necessarily what the user wants.

Diagnosing it is simple: open your search, run any broad query, look at the top 10 results. If those aren't your best-selling products for that query, the sort is wrong.

The first step before touching any scoring is to change the default sort from chronological to something meaningful: orders in the last 30 days, in-stock items first, or margin score if you have it. One hour of work. In some catalogs, that alone lifts search conversion by 10–15%.
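As a rough illustration of that swap, here is a minimal Python sketch of a non-chronological default sort: in-stock items first, then 30-day orders descending. The Product class and its field names are hypothetical stand-ins for whatever your catalog exposes. In a real store this would usually live in the SQL ORDER BY or the search engine's sort clause rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical fields; map them to your own catalog schema.
    sku: str
    in_stock: bool
    orders_30d: int


def default_sort(products: list[Product]) -> list[Product]:
    """Order results by availability, then recent sales, instead of by creation date."""
    # False sorts before True, so `not in_stock` puts in-stock items first.
    # Negating orders_30d gives descending order; sku is a deterministic tiebreak.
    return sorted(products, key=lambda p: (not p.in_stock, -p.orders_30d, p.sku))


catalog = [
    Product("A", in_stock=True, orders_30d=3),
    Product("B", in_stock=False, orders_30d=50),
    Product("C", in_stock=True, orders_30d=40),
]
ranked_skus = [p.sku for p in default_sort(catalog)]
```

Here the out-of-stock bestseller "B" drops below both in-stock items. Whether that is the right trade-off depends on how your store handles backorders.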

Elasticsearch scoring on a real catalog: what actually moves conversion

Elasticsearch uses BM25 by default: a relevance algorithm that scores documents by term frequency and inverse document frequency, normalized for field length. For text search, it works well. For product catalogs, it's missing half the picture.

BM25 doesn't know a product is in stock. It doesn't know it outsells similar items 10 to 1. It only knows text.

For catalogs, you need function_score. It's a query that wraps your text query and lets you blend numeric signals into the BM25 relevance score.

Here's what we applied to the 28k SKU catalog:

stock_quantity > 0 — multiplier 1.5. In-stock items surface first.
orders_30d — field_value_factor, weight 0.3. Popular items get a boost.
margin_score — field_value_factor, weight 0.2. Higher-margin items rank slightly higher.
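One way those three signals could be expressed as an Elasticsearch request body, built here as a Python dict. Treat this as a sketch under assumptions, not the exact query from that project. The text fields (title, description), the log1p modifier, the missing defaults, and the choice of score_mode and boost_mode are all mine. The nesting is also an assumption: the inner function_score adds the popularity and margin boosts to the BM25 score, and the outer one multiplies the result by 1.5 for in-stock items.

```python
def build_product_query(user_query: str) -> dict:
    """Build a function_score request body that blends catalog signals into BM25."""
    # Plain text relevance (BM25). Field names here are hypothetical.
    text_query = {
        "multi_match": {"query": user_query, "fields": ["title^2", "description"]}
    }

    # Inner layer: BM25 score + 0.3 * log1p(orders_30d) + 0.2 * margin_score.
    popularity_blend = {
        "function_score": {
            "query": text_query,
            "functions": [
                {
                    "field_value_factor": {
                        "field": "orders_30d",
                        "modifier": "log1p",  # assumption: dampens runaway bestsellers
                        "missing": 0,
                    },
                    "weight": 0.3,
                },
                {
                    "field_value_factor": {"field": "margin_score", "missing": 0},
                    "weight": 0.2,
                },
            ],
            "score_mode": "sum",   # add the two function values together
            "boost_mode": "sum",   # then add them to the text score
        }
    }

    # Outer layer: multiply by 1.5 only when the in-stock filter matches.
    return {
        "query": {
            "function_score": {
                "query": popularity_blend,
                "functions": [
                    {
                        "filter": {"range": {"stock_quantity": {"gt": 0}}},
                        "weight": 1.5,
                    }
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = build_product_query("men's jacket")
```

The returned dict is meant to be sent as the body of a search request. The weights are the ones listed above; expect to tune them against your own click and order logs.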

Before: "men's jacket," 1,400 results, top 10 filled with new arrivals with no sales. After: still 1,400 results, but the top 10 were bestsellers in stock. First-screen CTR moved from 28% to 51%.

The result set didn't shrink. The order changed. Scoring doesn't remove results, it reorders them. Users still see the same count, but what leads is far more likely to be what they want.

Pre-selecting facets from query signals: cutting the result set before the user does

Scoring surfaces the right items. But sometimes you need fewer results, not just better-ordered ones.

For wide categories — apparel, electronics, home goods — 1,000+ results per query isn't unusual even with good ranking. This is where pre-selection helps: automatically activating one or two facets based on signals in the query.

Example: user searches "men's winter boots." Signals present — gender: male, season: winter. If the search can read those, it can apply the gender:male and season:winter filters automatically, without requiring the user to click anything.

The implementation is a dictionary: keyword → facet value. "Men's" → gender:male. "Winter" → season:winter. On match, the facet activates, and the result set drops to 80–200 items instead of 640.
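A minimal sketch of that dictionary lookup in Python. The entries and the preselect_facets name are illustrative, not the project's actual table. The conflict rule (drop a facet when the query names two different values for it) is my assumption about a sensible default.

```python
import re

# Hypothetical keyword -> (facet, value) table; a real one would have 50-100 entries.
FACET_DICTIONARY = {
    "men's": ("gender", "male"),
    "mens": ("gender", "male"),
    "women's": ("gender", "female"),
    "womens": ("gender", "female"),
    "winter": ("season", "winter"),
    "summer": ("season", "summer"),
}


def preselect_facets(query: str) -> dict[str, str]:
    """Return the facets to activate automatically for a search query."""
    # Normalize curly apostrophes, then match whole tokens so that
    # "women's" never triggers the "men's" entry by substring.
    normalized = query.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z0-9']+", normalized)

    facets: dict[str, str] = {}
    conflicted: set[str] = set()
    for token in tokens:
        hit = FACET_DICTIONARY.get(token)
        if hit is None:
            continue
        facet, value = hit
        if facets.get(facet, value) != value:
            conflicted.add(facet)  # two different values for one facet
        facets[facet] = value

    # Assumption: an ambiguous facet is left for the user to pick.
    return {f: v for f, v in facets.items() if f not in conflicted}


selected = preselect_facets("men's winter boots")
```

For "men's winter boots" this yields gender: male and season: winter. The remaining term, "boots", still goes through normal text search.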

In that project, auto-facet pre-selection on queries with explicit signals raised add-to-cart from search by 8–14%. Users didn't have to work through filtering before finding something relevant.

It's not ML or AI. It's a 50–100-entry dictionary, built in two days. The companion piece on reordering facets by usage data covers the UI side of this.

What we measured to prove 200 results beats 2,000

The most common mistake when evaluating search is tracking "found / didn't find" as a binary. It hides quality.

Four metrics showed what was actually happening:

First-screen CTR: what share of users clicked one of the first 10 results. With good ranking, that's 55–65%. Below 35%, scoring needs attention.

Time-to-click: seconds from results appearing to first click. If it rises while CTR stays low, users aren't browsing — they don't know what to pick.

Search-to-add-to-cart: the share of sessions with a search query that ended in an add-to-cart. Our case: 4.1% before, 6.8% after scoring and pre-selection. That's the one number worth watching.

Abandonment: searched and left without clicking anything. When this grows alongside result count, the list is too long to be useful.
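The four metrics can be computed from per-search session records along these lines. The SearchSession shape is a hypothetical log format, and using the median for time-to-click is my choice, not something the project prescribed.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class SearchSession:
    # Hypothetical log record for one search.
    clicked_positions: tuple[int, ...]        # 1-based ranks of clicked results
    seconds_to_first_click: Optional[float]   # None when nothing was clicked
    added_to_cart: bool


def search_metrics(sessions: Sequence[SearchSession], first_screen_size: int = 10) -> dict:
    """Compute first-screen CTR, time-to-click, search-to-cart, and abandonment."""
    total = len(sessions)
    if total == 0:
        raise ValueError("need at least one search session")

    # Sessions with at least one click inside the first screen of results.
    first_screen_clicks = sum(
        1 for s in sessions
        if any(pos <= first_screen_size for pos in s.clicked_positions)
    )
    click_times = [
        s.seconds_to_first_click for s in sessions
        if s.seconds_to_first_click is not None
    ]
    abandoned = sum(1 for s in sessions if not s.clicked_positions)
    carts = sum(1 for s in sessions if s.added_to_cart)

    return {
        "first_screen_ctr": first_screen_clicks / total,
        "median_time_to_click": median(click_times) if click_times else None,
        "search_to_cart": carts / total,
        "abandonment": abandoned / total,
    }


sample = [
    SearchSession((2,), 4.0, True),
    SearchSession((15,), 12.0, False),
    SearchSession((), None, False),
    SearchSession((1, 30), 6.0, False),
]
metrics = search_metrics(sample)
```

Segmenting the same numbers by result-count bucket (for example, under 200 versus over 1,000 results) is what lets you compare short lists against long ones directly.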

Zero results is its own product signal; we covered that process in detail separately. Different problem, different tooling.

The short version

Three thousand results means the search didn't make a decision. The user came for an answer, got a task, and left.

You can start with the easiest change: swap ORDER BY id DESC for ORDER BY orders_30d DESC. An hour of work. Relevance scoring and query-signal facet pre-selection come after.

If you're still thinking about Elasticsearch as a speed tool rather than a UX tool, that framing is worth revisiting first.