Reranking with an Elasticsearch-hosted cross-encoder from HuggingFace

Learn how to host a model from Hugging Face in Elasticsearch and use it for semantic reranking at search time.

In this short blog, I’ll show you how to use a model from Hugging Face to perform semantic reranking in your own Elasticsearch cluster at search time. We will download the model using Eland, load a dataset from Hugging Face, and perform sample queries using retrievers, all in a Jupyter notebook.

Overview

If you are unfamiliar with semantic reranking in Elasticsearch, check out these resources:

  • What it is
  • Why you would want to use it
  • How to create an inference API and connect it to an external service
  • How to use a retriever query for re-ranking

The code in this blog and accompanying notebook will also get you started, but we aren’t going to go in-depth on the what and why.

Also, note that I’ll show code snippets below, but the best way to do this yourself is to follow the accompanying notebook.

Step zero

I will also assume you have an Elasticsearch cluster or serverless project to use for this guide. If not, head on over to cloud.elastic.co and sign up for a free trial! You'll need a Cloud ID and an Elasticsearch API key.

I’ll wait...

Model selection

The first (real) step is choosing a model to use for re-ranking. A deep discussion of selecting a model and evaluating results is outside the scope of this blog. Know that, for now, Elasticsearch only supports cross-encoder models for hosted re-ranking.

While not directly covering model selection, the following blogs give a good overview of evaluating search relevance.

For this guide, we are going to use cross-encoder/ms-marco-MiniLM-L-6-v2. This model was trained on the MS MARCO passage ranking dataset and is designed for retrieval and re-ranking.
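
If you want a feel for what a cross-encoder actually does before deploying anything, you can score a few query/passage pairs locally with the sentence-transformers library. This is purely illustrative and isn't required for the Elasticsearch workflow; the example passages below are made up.

# Illustrative only: score query/passage pairs with the same cross-encoder
# we will load into Elasticsearch below (higher score = more relevant)
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairs = [
    ("sparse vector embedding", "We study sparse vector representations for text retrieval."),
    ("sparse vector embedding", "A field guide to sourdough bread baking."),
]

scores = model.predict(pairs)
print(scores)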

Model loading

To load an NLP model from Hugging Face into Elasticsearch, you will use the Eland Python Library.

Eland is Elastic's Python library for data frame analytics and loading supervised and NLP models into Elasticsearch. It offers a familiar Pandas-compatible API.
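
If you're working outside the accompanying notebook, note that Eland's model-upload tooling ships as an optional extra; a quick install sketch (assuming a recent Eland release):

# Eland's PyTorch/NLP dependencies ship as an optional extra
!python -m pip install 'eland[pytorch]'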

The code below is from the notebook section "Hugging Face Reranking Model."

model_id = "cross-encoder/ms-marco-MiniLM-L-6-v2"

cloud_id = "my_super_cloud_id"
api_key = "my_super_secret_api_key!"

!eland_import_hub_model \
--cloud-id $cloud_id \
--es-api-key $api_key \
--hub-model-id $model_id \
--task-type text_similarity

Eland doesn’t have a specific `rerank` task type; we use the `text_similarity` type to load the model.

This step will download the model to the machine where your code is running, split it into chunks, and upload it into your Elasticsearch cluster.
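
If you'd like to confirm the upload worked, you can ask Elasticsearch for the trained model's configuration. This is a minimal sketch rather than part of the notebook, and it assumes Eland's default naming convention, which lowercases the Hub model ID and replaces "/" with "__":

from elasticsearch import Elasticsearch

es = Elasticsearch(cloud_id=cloud_id, api_key=api_key)

# Eland registers the model under an ID derived from the Hugging Face model ID
resp = es.ml.get_trained_models(model_id="cross-encoder__ms-marco-minilm-l-6-v2")
print(resp["trained_model_configs"][0]["model_id"])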

Cut to

In the notebook, you can follow along to set up your cluster so you can run the re-ranking query in the next section. After the model is loaded, the remaining setup steps shown in the notebook are:

  • Create an Inference Endpoint with the rerank task (see the sketch after this list)
    • This will also deploy our re-ranking model on Elasticsearch machine learning nodes
  • Create an index mapping
  • Download a dataset from Hugging Face - CShorten/ML-ArXiv-Papers
  • Index the data into Elasticsearch
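
Here is a rough sketch of the first two of those steps, reusing the es client from the snippet above. The endpoint body follows the documented _inference API for the elasticsearch service; the model ID again assumes Eland's generated name, the endpoint name semantic-reranking matches the inference_id used in the query below, and the allocation/thread counts are placeholder values you should size for your own cluster.

# Create a rerank inference endpoint backed by the uploaded cross-encoder.
# perform_request keeps this independent of a specific client version's
# inference helpers.
es.perform_request(
    "PUT",
    "/_inference/rerank/semantic-reranking",
    headers={"accept": "application/json", "content-type": "application/json"},
    body={
        "service": "elasticsearch",
        "service_settings": {
            "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",
            "num_allocations": 1,
            "num_threads": 1,
        },
    },
)

# A simple lexical mapping for the dataset's title and abstract fields
es.indices.create(
    index="arxiv-papers-lexical",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "abstract": {"type": "text"},
        }
    },
)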

Re-rank time!

With everything set up, we can query using the text_similarity_reranker retriever. This is a two-stage reranker: the specified first-stage retriever runs first, and its results are then passed to the re-ranking stage.

Example from the notebook:

query = "sparse vector embedding"

# Query with Semantic Reranker
response_reranked = es.search(
    index="arxiv-papers-lexical",
    body={
        "size": 10,
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {
                            "match": {
                                "title": query
                            }
                        }
                    }
                },
                "field": "abstract",
                "inference_id": "semantic-reranking",
                "inference_text": query,
                "rank_window_size": 100
            }
        },
        "fields": [
            "title",
            "abstract"
        ],
        "_source": False
    }
)

The parameters for the text_similarity_reranker above are:

  • `retriever` - Here, we do a simple match query with a standard retriever for lexical first-stage retrieval. You can also use a knn retriever or an rrf retriever here.
  • `field` - The field from the first-stage results that the re-ranking model will use for similarity comparisons.
  • `inference_id` - The ID of the inference endpoint to use for re-ranking. Here, we are using the model we loaded earlier.
  • `inference_text` - The string to use for the similarity ranking.
  • `rank_window_size` - The number of top documents from the first stage the model will consider.
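
As a quick way to inspect what came back (a small sketch, not taken verbatim from the notebook): because the query disables _source and requests fields, each hit returns the requested fields as arrays, so the reranked titles and scores can be printed like this:

# With "_source": False, the requested fields come back as lists under hit["fields"]
for rank, hit in enumerate(response_reranked["hits"]["hits"], start=1):
    print(f'{rank:2d}  {hit["_score"]:.4f}  {hit["fields"]["title"][0]}')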

You may wonder why `rank_window_size` is set to 100, even though you might ultimately want only the top 10 results.

In a two-stage search setup, the initial lexical search provides a broad set of documents for the semantic re-ranker to evaluate. Returning a larger set of 100 results increases the chances that relevant documents are available for the semantic re-ranker to identify and reorder based on semantic content, not just lexical matches. This approach compensates for the lexical search's limitations in capturing nuanced meaning, allowing the semantic model to sift through a broader range of possibilities.

However, finding the right `rank_window_size` is a balance. While a larger candidate set improves accuracy, it may also increase resource demands, so some tuning is necessary to achieve an optimal trade-off between recall and resources.

Comparison

While I’m not going to provide an in-depth analysis of the results in this short guide, it is worth looking at the top 10 results from a standard lexical match query alongside the results from the re-ranked query above.

This dataset contains a subset of ArXiv papers about Machine Learning. The results listed are the titles of the papers.
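
For reference, the “Scored Results” below come from a plain lexical query with no re-ranking, along the lines of this sketch (the notebook's exact cell may differ slightly):

# Baseline: plain lexical match on the title field, no re-ranking
response_lexical = es.search(
    index="arxiv-papers-lexical",
    body={
        "size": 10,
        "query": {"match": {"title": query}},
        "fields": ["title"],
        "_source": False
    }
)

for hit in response_lexical["hits"]["hits"]:
    print(hit["fields"]["title"][0])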

The “Scored Results” are the top 10 results using a standard retriever.

The “Reranked Results” are the top 10 results after re-ranking.

Scored Results:

  1. Compact Speaker Embedding: lrx-vector
  2. Quantum Sparse Support Vector Machines
  3. Sparse Support Vector Infinite Push
  4. The Sparse Vector Technique, Revisited
  5. L-Vector: Neural Label Embedding for Domain Adaptation
  6. Spaceland Embedding of Sparse Stochastic Graphs
  7. Sparse Signal Recovery in the Presence of Intra-Vector and Inter-Vector Correlation
  8. Stable Sparse Subspace Embedding for Dimensionality Reduction
  9. Auto-weighted Mutli-view Sparse Reconstructive Embedding
  10. Embedding Words in Non-Vector Space with Unsupervised Graph Learning

Reranked Results:

  1. Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction
  2. Spaceland Embedding of Sparse Stochastic Graphs
  3. Elliptical Ordinal Embedding
  4. Minimum-Distortion Embedding
  5. Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms
  6. Interpolated Discretized Embedding of Single Vectors and Vector Pairs for Classification, Metric Learning and Distance Approximation
  7. Attention Word Embedding
  8. Binary Speaker Embedding
  9. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
  10. Estimating Vector Fields on Manifolds and the Embedding of Directed Graphs

Your turn

Hopefully, you see how easy it is to incorporate a re-ranking model from Hugging Face into Elasticsearch so you can start re-ranking. While this isn't the only re-ranking option, it can be helpful when you are running air-gapped, don't have access to an external re-ranking service, want to control costs, or have a model that works particularly well for your dataset.

If you haven't clicked on one of the many links to the accompanying notebook, now's the time!

Ready to try this out on your own? Start a free trial.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!
