
Relevance isn't revenue: teaching Elasticsearch to rank by business metrics

We ran an A/B test. The control group ranked search results by text relevance. The test group used a scoring function that factored in margin, stock levels, and sell-through rate. Click-through was nearly identical. Revenue per search session was 12% higher in the business-ranked variant.

Elasticsearch supports this out of the box. Most teams just don't configure it this way.

Not because it's hard. Because you have to understand the business logic before you touch a query.

The problem: relevant search isn't profitable search

Standard search optimizes for one thing: how closely a document matches a query. BM25, TF-IDF: it's all about words. A user searches "Nike sneakers" and gets the products whose name and description match those words best, where more occurrences (with diminishing returns), rarer terms, and shorter fields all push a document up.
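To see what "all about words" means in practice, here is a textbook BM25 score for a single query term, sketched in Python. k1 = 1.2 and b = 0.75 are the Elasticsearch defaults and the IDF follows Lucene's form. Lucene itself drops the constant (k1 + 1) factor, which doesn't change the ranking. The document counts are made up.

```python
import math


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    n_docs: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one query term to one document's score."""
    # IDF: terms that appear in fewer documents count for more.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Length normalization: fields longer than average are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturation: each extra occurrence adds less than the previous one.
    return idf * tf * (k1 + 1) / (tf + norm)


# Hypothetical numbers: 28,000 products, 900 of them mention "sneakers",
# and the description is exactly average length (20 words).
for tf in (1, 2, 10):
    print(tf, round(bm25_term_score(tf, 20, 20.0, 28_000, 900), 3))
```

Ten mentions score less than twice as much as one, so repeating a keyword quickly stops paying off.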

But the business wants something different. It wants to surface products that:

  • are actually in stock (not "order on request," which most shoppers won't wait for);
  • have higher margin (not the ones that arrived as clearance with minimal markup);
  • sell well (high velocity means less capital tied up in inventory).

Text relevance doesn't know any of this. It shouldn't — that's not its job.

The engineer's job is connecting the two.

What function_score does

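function_score is a wrapper around a standard Elasticsearch query. It takes relevance as a base, computes a numeric boost from document fields or scripts, and combines the two using a formula you control: score_mode merges the individual functions into one boost, and boost_mode merges that boost with relevance.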

Basic structure:

{
  "query": {
    "function_score": {
      "query": { "match": { "name": "nike sneakers" } },
      "functions": [
        {
          "field_value_factor": {
            "field": "margin_pct",
            "factor": 0.3,
            "modifier": "sqrt",
            "missing": 0
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

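boost_mode: multiply means final score = text relevance × boost. This matters: relevance scales the boost, so a weak text match stays weak however high its margin, and a product that doesn't match the inner query isn't scored at all. The flip side: a boost of 0 zeroes the score, so with "missing": 0 and sqrt, a relevant product with no margin_pct sinks to the bottom.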
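A simplified model of how the pieces combine, in Python. It is a sketch, not Elasticsearch's implementation: it assumes score_mode sum, covers three of the boost_mode values, and ignores weight, max_boost and min_score.

```python
def combine(relevance: float, function_scores: list[float],
            boost_mode: str = "multiply") -> float:
    """Simplified function_score: score_mode "sum" first, then boost_mode."""
    boost = sum(function_scores)            # score_mode: "sum"
    if boost_mode == "multiply":
        return relevance * boost            # relevance scales the boost
    if boost_mode == "replace":
        return boost                        # relevance is discarded
    if boost_mode == "sum":
        return relevance + boost
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


weak_match, strong_match = 0.4, 6.0
print(combine(weak_match, [3.0]))               # weak match, high margin boost
print(combine(strong_match, [1.0]))             # strong match, modest boost
print(combine(strong_match, [0.0]))             # a boost of 0 zeroes the score
print(combine(weak_match, [3.0], "replace"))    # relevance ignored entirely
```

Under multiply, the strong match with a modest boost still outranks the weak match with a large one; under replace, the order would be decided by the boost alone.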

Three business signals worth using

On a 28k-SKU catalog, we tested several combinations. Three signals produced consistent gains without wrecking UX.

First — margin. A numeric field margin_pct (margin percentage). We populate it during indexing from the ERP. We used field_value_factor with modifier: log1p — this smooths the gap between a 5% and 50% margin product, avoiding extreme boosts on outliers.
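A quick way to see the smoothing: the sketch below mirrors field_value_factor's modifier(factor * value) formula for three of its modifiers, using factor 0.2 from our config and assuming margin_pct is stored as a percentage (5 for 5%). Elasticsearch's log1p uses the base-10 logarithm, so the sketch does too.

```python
import math


def field_value_factor(value: float, factor: float, modifier: str = "none") -> float:
    """Mirror of field_value_factor: modifier(factor * value)."""
    x = factor * value
    modifiers = {
        "none": lambda v: v,
        "log1p": lambda v: math.log10(v + 1),   # base-10, as in Elasticsearch
        "sqrt": math.sqrt,
    }
    return modifiers[modifier](x)


low, high = 5, 50  # margin_pct values, assuming percentages are stored as numbers
raw_ratio = field_value_factor(high, 0.2) / field_value_factor(low, 0.2)
smooth_ratio = (field_value_factor(high, 0.2, "log1p")
                / field_value_factor(low, 0.2, "log1p"))
print(f"raw: {raw_ratio:.1f}x, log1p: {smooth_ratio:.2f}x")
```

Under these assumptions the 50% product gets about 3.5 times the boost of the 5% product instead of 10 times.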

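Second, stock level. Field stock_qty. We didn't use it as a linear signal. Instead, we applied gauss decay: zero-stock items get a penalty, 1-3 units get a small boost, and between roughly 10 and 30 units the boost is at or near the maximum. Items with no stock aren't dropped entirely, just ranked lower. One caveat: decay functions are symmetric around origin, so with the origin 20 and offset 5 we used, the boost also shrinks above 25 units, falls back to the zero-stock level at 40 and is close to nothing by 100. If some SKUs carry deep stock, cap the value you index for ranking at the origin, or shift origin and offset together so the plateau covers every realistic quantity.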
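Decay parameters are easy to misread, so it can pay to evaluate the curve before shipping it. This sketch implements the gauss formula from the Elasticsearch decay-function documentation (sigma^2 = -scale^2 / (2 * ln(decay)), with no decay inside origin ± offset) and evaluates it with the parameters of our stock_qty function: origin 20, scale 15, offset 5, decay 0.5.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gauss decay: 1.0 within origin ± offset, exactly `decay` at `scale` beyond it."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


# The stock_qty parameters from our functions block.
for qty in (0, 1, 3, 10, 20, 40, 100):
    print(qty, round(gauss_decay(qty, origin=20, scale=15, offset=5), 3))
```

The output shows 0.5 at zero stock, a full 1.0 between 15 and 25 units and, because the curve is symmetric around origin, the same 0.5 at 40 units and almost nothing at 100.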

Third — sell-through rate. Field orders_30d — the number of orders in the past 30 days. Fast-moving inventory means less capital locked in the warehouse. It also acts as a proxy signal for product quality and listing quality.
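How orders_30d is computed isn't shown here. As an illustration only, here is a sketch that counts orders per SKU in a trailing 30-day window from a hypothetical list of (sku, placed_at) records.

```python
from collections import Counter
from datetime import datetime, timedelta


def orders_last_30d(orders: list[tuple[str, datetime]], now: datetime) -> Counter:
    """Count orders per SKU placed within the 30 days up to `now`."""
    cutoff = now - timedelta(days=30)
    return Counter(sku for sku, placed_at in orders if cutoff < placed_at <= now)


now = datetime(2024, 10, 31)
orders = [
    ("JACKET-01", datetime(2024, 10, 30)),
    ("JACKET-01", datetime(2024, 10, 2)),
    ("SANDAL-07", datetime(2024, 7, 15)),   # outside the window
]
counts = orders_last_30d(orders, now)
print(counts["JACKET-01"], counts["SANDAL-07"])  # a Counter returns 0 for unseen keys
```

When writing documents, look each SKU up in the Counter rather than iterating over it, so products without recent orders are indexed with 0 instead of being skipped.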

The functions block we ended up with:

"functions": [
  {
    "field_value_factor": {
      "field": "margin_pct",
      "factor": 0.2,
      "modifier": "log1p",
      "missing": 0
    }
  },
  {
    "gauss": {
      "stock_qty": {
        "origin": 20,
        "scale": 15,
        "offset": 5,
        "decay": 0.5
      }
    }
  },
  {
    "field_value_factor": {
      "field": "orders_30d",
      "factor": 0.3,
      "modifier": "sqrt",
      "missing": 0
    }
  }
],
"score_mode": "sum"

We adjusted factor weights through A/B testing: started equal, then reduced the margin weight after noticing that expensive, slow-selling items were rising too high.

How to get business data into the index

This is an architecture decision, and it's worth making deliberately.

Option one: denormalize at index time. When a product is indexed, your script pulls margin, stock, and order counts from the ERP and writes them directly into the document. Simple sync, predictable schema. The downside — data goes stale. Stock changes with every sale, and a full reindex of 28k SKUs takes time.
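Option one usually means writing documents through the _bulk API. A minimal sketch, assuming hypothetical product and ERP records keyed by product id: it builds the newline-delimited body, where each action line is followed by the full document with the ERP fields merged in.

```python
import json


def bulk_index_body(index: str, products: list[dict], erp: dict[str, dict]) -> str:
    """Build a _bulk body that denormalizes ERP fields into each product document."""
    lines = []
    for product in products:
        doc = dict(product)
        # Merge margin, stock and order counts from the (hypothetical) ERP export.
        doc.update(erp.get(product["id"], {}))
        lines.append(json.dumps({"index": {"_index": index, "_id": product["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


products = [{"id": "SKU-1", "name": "Nike Air Zoom Pegasus"}]
erp = {"SKU-1": {"margin_pct": 32.5, "stock_qty": 14, "orders_30d": 41}}
print(bulk_index_body("products", products, erp))
```

The result is POSTed to /_bulk with Content-Type: application/x-ndjson. The official Python client's helpers.bulk does the same job with chunking built in.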

Option two: partial updates via the update API. When stock changes in the ERP, a sync agent calls POST /<index>/_update/<id> with only stock_qty and orders_30d — everything else stays untouched. Near-real-time freshness.
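The update API merges the "doc" object into the stored document, so the sync agent only needs to send the two changing fields. A stdlib-only sketch of the request, assuming a local cluster URL and an index called products (both hypothetical); in production you would more likely use an official client.

```python
import json
import urllib.parse
import urllib.request


def stock_update_request(base_url: str, index: str, product_id: str,
                         stock_qty: int, orders_30d: int) -> urllib.request.Request:
    """Build (without sending) a partial update that touches only the fast-moving fields."""
    body = {"doc": {"stock_qty": stock_qty, "orders_30d": orders_30d}}
    doc_id = urllib.parse.quote(product_id, safe="")  # ids may contain "/" or spaces
    return urllib.request.Request(
        url=f"{base_url}/{index}/_update/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = stock_update_request("http://localhost:9200", "products", "SKU-1", 3, 42)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running cluster.
```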

We use option two for stock levels (changes frequently) and option one for margin (recalculated once per day during ERP sync).

Decay functions: how not to break UX for the sake of margin

The main risk: aggressive margin boosting surfaces products the user didn't want. "Tablecloth with the highest markup" isn't what you expect when you search "Nike sneakers."

Two rules we hold without exception.

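First: boost_mode: multiply, not replace. With replace, the function score becomes the whole score and text relevance is thrown away, so among matching products the highest-margin one wins even if it barely matches the query. With multiply, the boost scales relevance instead of replacing it, and a weak match stays weak regardless of its margin.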

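Second: normalize function weights so the business signals account for no more than 40-50% of the final score, with the rest coming from pure text relevance. Under boost_mode: multiply the functions scale relevance rather than add to it, so in practice this means bounding the combined boost, and max_boost can enforce a hard ceiling. It's a rough rule, but it keeps the balance: search stays smart rather than becoming a sales floor.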
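With boost_mode: multiply, a "share of the final score" needs a definition. One reading, assuming the combined function score is written as 1 + b (for example a standalone {"weight": 1} function plus business functions whose sum is b): the business part of relevance × (1 + b) is relevance × b, a share of b / (1 + b) whatever the relevance. The sketch turns a target share into a ceiling on the boost.

```python
def business_share(b: float) -> float:
    """Share of the final score due to business signals when the boost is 1 + b.

    final = relevance * (1 + b) = relevance + relevance * b,
    so the business part is (relevance * b) / (relevance * (1 + b)) = b / (1 + b).
    """
    return b / (1 + b)


def max_business_term(target_share: float) -> float:
    """Largest b that keeps business_share(b) at or below target_share."""
    return target_share / (1 - target_share)


for share in (0.4, 0.5):
    ceiling = 1 + max_business_term(share)
    print(f"{share:.0%} cap -> combined boost at most {ceiling:.2f}")
```

Under that reading, a 50% cap means the combined boost never exceeds 2 and a 40% cap means about 1.67; setting max_boost to that value is one way to enforce it in the query.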

When you see skew, gauss decay is safer than raw field_value_factor. field_value_factor keeps growing as the field value grows, so one outlier can dominate; gauss peaks at the "ideal" value and tapers smoothly away from it instead of dropping off hard.

A/B test: how to know it's working

In our case, the A/B test on search ranking reached statistical significance quickly: at 500+ daily search sessions, a week or two was enough for an effect of this size. Smaller lifts or noisier revenue data need more sessions.
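Whether one or two weeks is enough depends on the lift and on how noisy revenue per session is. A stdlib-only sketch of the comparison: Welch's unequal-variance statistic with a normal-approximation p-value, reasonable at thousands of sessions per arm, run on synthetic data where 10% of control sessions and 11.2% of test sessions place one 50-unit order (a 12% lift in revenue per session).

```python
import math
from statistics import NormalDist, mean, variance


def welch_z_test(control: list[float], test: list[float]) -> tuple[float, float]:
    """Welch statistic for the difference in means, with a two-sided normal p-value."""
    se = math.sqrt(variance(control) / len(control) + variance(test) / len(test))
    z = (mean(test) - mean(control)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sessions(n: int, buyers: int, order_value: float = 50.0) -> list[float]:
    """Synthetic revenue per session: `buyers` sessions with one order, the rest zero."""
    return [order_value] * buyers + [0.0] * (n - buyers)


for n in (1_000, 10_000):
    control = sessions(n, n // 10)           # 10% of sessions buy
    test = sessions(n, n * 112 // 1000)      # 11.2%: +12% revenue per session
    z, p = welch_z_test(control, test)
    print(f"{n} sessions per arm: z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers, 1,000 sessions per arm can't separate the lift from noise (p is around 0.38), while 10,000 per arm can (p below 0.01). Real revenue data is usually more skewed than this, which raises the bar further.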

Four metrics we tracked:

  • First-result CTR — shouldn't drop sharply. A sharp drop means the boost is too aggressive.
  • Revenue per search session — the main number.
  • Average order value — sometimes rises, because high-margin products tend to cost more.
  • OOS rate in top-5 — should fall after adding the stock gauss decay.
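The four metrics are straightforward to compute once each session log records what was shown and what happened. A sketch over a hypothetical SearchSession record; the field names, and counting a top-5 slot as out of stock when its stock_qty was 0 at query time, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SearchSession:
    """Hypothetical log record for one search session."""
    clicked_first: bool          # did the user click the first result?
    revenue: float               # revenue attributed to this session
    orders: int                  # orders attributed to this session
    top5_stock: list[int] = field(default_factory=list)  # stock_qty of the top-5 hits


def search_metrics(sessions: list[SearchSession]) -> dict[str, float]:
    """First-result CTR, revenue per session, average order value, OOS rate in top 5."""
    n = len(sessions)
    revenue = sum(s.revenue for s in sessions)
    orders = sum(s.orders for s in sessions)
    slots = [qty for s in sessions for qty in s.top5_stock]
    return {
        "first_result_ctr": sum(s.clicked_first for s in sessions) / n,
        "revenue_per_session": revenue / n,
        "avg_order_value": revenue / orders if orders else 0.0,
        "oos_rate_top5": sum(qty == 0 for qty in slots) / len(slots) if slots else 0.0,
    }


logs = [
    SearchSession(True, 120.0, 1, [5, 0, 12, 3, 8]),
    SearchSession(False, 0.0, 0, [0, 0, 4, 9, 1]),
]
print(search_metrics(logs))
```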

After two weeks: revenue per session +12%, OOS in top-5 dropped from 18% to 4%, first-result CTR fell by 1.3 percentage points (acceptable — fewer clicks, but higher conversion on each).

We tracked it by adding a named query with the group marker to each request ("_name": "boosted" vs "_name": "control"), logging clicks and conversions with that marker in the app layer, and comparing the groups in Kibana.
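How sessions were split between groups isn't described here. One common approach, sketched with hypothetical helper names, hashes a session id into a stable bucket and puts the group marker on the match clause as _name; Elasticsearch then lists that name under matched_queries on each hit, which the app can log with the click or conversion.

```python
import hashlib
import json


def variant_for(session_id: str, test_share: float = 0.5) -> str:
    """Deterministic bucketing: the same session always gets the same variant."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2 ** 48  # uniform in [0, 1)
    return "boosted" if bucket < test_share else "control"


def search_body(query_text: str, variant: str, functions: list[dict]) -> dict:
    """Name the text match after the variant so every hit reports it in matched_queries."""
    match = {"match": {"name": {"query": query_text, "_name": variant}}}
    if variant == "control":
        return {"query": match}
    return {"query": {"function_score": {
        "query": match,
        "functions": functions,
        "score_mode": "sum",
        "boost_mode": "multiply",
    }}}


variant = variant_for("session-42")
functions = [{"field_value_factor": {"field": "margin_pct", "factor": 0.2,
                                     "modifier": "log1p", "missing": 0}}]
print(variant, json.dumps(search_body("nike sneakers", variant, functions)))
```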

What doesn't work

Three limitations to know before you ship.

ERP margin vs real margin. The ERP usually shows planned margin, not actual (after discounts, shipping, returns). If you boost on that, high-markup-but-unprofitable products float to the top.
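The gap between the two can be wide enough to flip a ranking. A sketch with a deliberately simple, hypothetical cost model (an average discount, shipping paid on every order, returned orders earn nothing and the item goes back to stock), comparing two made-up products:

```python
def planned_margin_pct(list_price: float, unit_cost: float) -> float:
    """Margin as the ERP typically plans it: list price minus cost."""
    return 100 * (list_price - unit_cost) / list_price


def realized_margin_pct(list_price: float, unit_cost: float, avg_discount: float,
                        shipping_per_order: float, return_rate: float) -> float:
    """Hypothetical realized margin per order, as a percentage of kept revenue."""
    net_price = list_price * (1 - avg_discount)
    kept = 1 - return_rate                      # share of orders not returned
    profit = kept * (net_price - unit_cost) - shipping_per_order
    return 100 * profit / (kept * net_price)


# High markup, but heavily discounted, expensive to ship and often returned.
print(planned_margin_pct(100, 40), round(realized_margin_pct(100, 40, 0.3, 12, 0.25), 1))
# Modest markup, sold at full price, cheap to ship, rarely returned.
print(planned_margin_pct(100, 70), round(realized_margin_pct(100, 70, 0.0, 5, 0.05), 1))
```

Under these assumptions the 60% product realizes about 20%, below the 30% product's roughly 25%, so boosting on planned margin would rank them the wrong way round.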

Seasonal products break orders_30d. A winter jacket in October has high sell-through. In April — zero. Without seasonal weighting or a season flag, business boosting works against you during off-peak.
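One way to add a season flag, assuming a hypothetical multi-valued integer field season_months written at index time (all twelve months for year-round products): attach a filter to the sell-through function so it only applies in season. With score_mode sum, a function whose filter doesn't match a document contributes nothing to that document's boost.

```python
import json
from datetime import date


def sell_through_function(today: date) -> dict:
    """orders_30d boost that applies only to products in season this month.

    Assumes a hypothetical multi-valued integer field `season_months`; a term
    query on a multi-valued field matches if any of its values equals the month.
    """
    return {
        "filter": {"term": {"season_months": today.month}},
        "field_value_factor": {
            "field": "orders_30d",
            "factor": 0.3,
            "modifier": "sqrt",
            "missing": 0,
        },
    }


print(json.dumps(sell_through_function(date(2025, 4, 15)), indent=2))
```

A winter jacket indexed with season_months [10, 11, 12, 1, 2] then gets no sell-through boost in April, even if late-March orders are still inside its 30-day window. This only covers the end of a season; it doesn't boost in-season items before their orders pick up.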

Long-tail catalogs. If 80% of SKUs have zero orders in the past 30 days (new arrivals, niche products), orders_30d as a signal barely fires. You'll need fallbacks — for example, a recency boost via created_at for new listings.
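A recency fallback can reuse the decay functions on a date field. The sketch shows a hypothetical exp decay on created_at (origin "now", scale "30d", decay 0.5) as a Python dict, together with the exp formula from the Elasticsearch decay-function documentation, so you can see how quickly a new listing's boost fades.

```python
import json
import math

# Hypothetical recency function for the function_score "functions" array.
recency_function = {
    "exp": {"created_at": {"origin": "now", "scale": "30d", "decay": 0.5}}
}


def exp_decay(distance: float, scale: float, offset: float = 0.0,
              decay: float = 0.5) -> float:
    """exp decay as documented: exp(ln(decay) / scale * max(0, distance - offset))."""
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, distance - offset))


print(json.dumps(recency_function))
for age_days in (0, 7, 30, 60, 180):  # distance from "now" = listing age in days
    print(age_days, round(exp_decay(age_days, scale=30), 3))
```

With these parameters the boost halves every 30 days, so a two-month-old listing keeps a quarter of it. Decay functions return 1 for documents where the field is missing, so old products without created_at would get the full recency boost.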

Business boosting isn't a fix-it button. It's an additional logic layer on top of search — one that needs maintenance and weight review at least once a quarter.

The bottom line

Elasticsearch can rank by business metrics. function_score with field_value_factor and gauss decay is standard — the documentation is solid. The hard part isn't the query syntax. It's understanding which metrics actually move the business, how to get them into the index, and how to weigh them against text relevance.

In our case: 3 days to the first working version, 2 weeks of A/B testing. A 12% lift in revenue per search session — at our traffic volumes, that's visible on the following Monday's dashboard.

Relevance is good. Revenue is better.

