What is Elasticsearch Relevance Engine (ESRE)?

Software developers are increasingly using machine learning models to improve the relevance of data presented to their users. This is particularly true for applications that use natural language interfaces, such as search, question answering, completion, and chat.

The Elasticsearch Relevance Engine (ESRE) is a collection of tools from Elastic that combines machine learning models, data transformation and storage (including vectors), and data search and retrieval. ESRE also includes tools for data security, and tools to integrate with other software, including various data sources and large language models (LLMs).

Read on to learn about the components of ESRE, or jump directly to Examples for sample applications and implementations.

Machine Learning models

Machine learning models enable your applications to understand natural language data and enrich or transform that data (at index time and at query time).

Uses of machine learning models include:

  • Generating vector embeddings
  • Extracting information from unstructured text, such as named entities or the answers to questions
  • Classifying text, such as its language or its sentiment (positive or negative)

To perform these operations, you must deploy one or more trained models.

Elastic provides the following relevant features:

  • The Elastic Sparse Encoder trained model for general purpose semantic search, without fine-tuning
  • Interfaces to deploy and manage third party trained models for vector search and natural language processing
  • Cloud infrastructure on which to deploy these models

Elastic Sparse Encoder model

The Elastic Sparse Encoder model is a machine learning model, built and trained by Elastic, which enables general-purpose semantic search for English language data.

At index time, the Elastic Sparse Encoder model enriches each document with an additional text expansion field that uses weighted tokens to capture the relationships between words and their meanings. At query time, when using a text expansion query, the sparse encoder model applies the same transformation to users' query text. The result is semantic search: relevance is based on meaning and intention, rather than strict keyword matching on the original document fields.

The Elastic Sparse Encoder model is a zero-shot, out-of-domain machine learning model, which means it does not require additional training or fine-tuning with your own data. Use this Elastic model to get started with semantic search without needing to identify and manage additional models.

Deploy the model, create a pipeline with an inference processor, and ingest (or re-index) your data through the pipeline.
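
As a rough sketch of the pipeline step, the following Python builds a request body for an ingest pipeline with a single inference processor. The model ID `.elser_model_2`, the pipeline name, and the field names `content` and `content_embedding` are assumptions for illustration, and the `input_output` option assumes a release that supports it. The code only prints the request; sending it is left to your client of choice.

```python
import json

# Hypothetical names, used for illustration only.
MODEL_ID = ".elser_model_2"    # assumed ID of the deployed sparse encoder model
PIPELINE_ID = "elser-ingest"   # hypothetical pipeline name


def build_sparse_encoder_pipeline(model_id, input_field, output_field):
    """Return a request body for PUT _ingest/pipeline/<pipeline-id>."""
    return {
        "description": "Expand text into weighted tokens at index time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Read text from input_field, write weighted tokens to output_field.
                    "input_output": [
                        {"input_field": input_field, "output_field": output_field}
                    ],
                }
            }
        ],
    }


pipeline_body = build_sparse_encoder_pipeline(MODEL_ID, "content", "content_embedding")
print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
```

Documents indexed through this pipeline would gain the `content_embedding` field alongside their original fields.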

Documentation:

Third party model management

Many public and private trained models are available to enrich your data, each solving a different problem. For example, Hugging Face catalogs thousands of available models.

To use a third party model with Elastic, you must import and deploy the model, and then create an ingest pipeline with an inference processor to perform data transformation.

Elastic provides the following interfaces to manage the trained models you are using:

  • A Kibana UI at Machine Learning > Model Management > Trained Models
  • Elasticsearch APIs grouped under /_ml/trained_models/
  • The Eland language client, implemented in Python
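
To make the API grouping concrete, here is a minimal sketch that lists a few endpoints under `/_ml/trained_models/` for one model. The model ID shown is hypothetical; substitute the ID you chose when importing your model.

```python
def trained_model_endpoints(model_id):
    """Return (HTTP method, path) pairs for common trained model operations."""
    base = f"/_ml/trained_models/{model_id}"
    return {
        "get_info": ("GET", base),
        "start_deployment": ("POST", f"{base}/deployment/_start"),
        "stop_deployment": ("POST", f"{base}/deployment/_stop"),
        "infer": ("POST", f"{base}/_infer"),
    }


# Hypothetical model ID, for illustration only.
endpoints = trained_model_endpoints("my-text-embedding-model")
for name, (method, path) in endpoints.items():
    print(f"{name}: {method} {path}")
```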

Documentation: Deploy trained models

Elastic Cloud ML instances

Elastic Cloud includes infrastructure on which to deploy and run trained models. When creating an Elastic Cloud deployment, enable Machine Learning instances, and optionally enable Autoscaling.

Documentation: Set up machine learning features

Data storage

Elastic provides the capabilities to store data of various types, including unstructured text and dense vectors (embeddings). Use Elastic to store your data before and after transformation by a machine learning model.

Indices

Elastic stores data as documents (with fields) within indices, and supports many field types, including dense vectors. Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Use the Index and Document APIs or the Kibana dev tools console to manage data manually. Some ingestion tools, such as the web crawler and connectors, manage indices and documents on your behalf.

Use the Reindex API to re-index data already stored in Elasticsearch, for example, to run the data through a machine learning ingest pipeline.
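
A minimal sketch of such a re-index request body follows. The index names and the pipeline name are hypothetical; the `pipeline` setting on the destination is what routes each copied document through the ingest pipeline.

```python
import json


def build_reindex_body(source_index, dest_index, pipeline_id):
    """Return a request body for POST _reindex that applies an ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


# Hypothetical index and pipeline names, for illustration only.
reindex_body = build_reindex_body("articles", "articles-enriched", "elser-ingest")
print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```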

Documentation:

Vector field types

To store dense vectors for vector search, use the dense vector field type. At query time, use a kNN query to retrieve this information.

To store sparse vectors for text expansion (for example, when using the Elastic Sparse Encoder model), use the sparse vector field type. At query time, use a text expansion query to retrieve this information.
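
The two field types might appear together in an index mapping like the one sketched below. The field names, the 384 dimensions, and the cosine similarity setting are assumptions chosen for illustration; use the dimensions your embedding model actually produces.

```python
import json

# Request body for creating an index (PUT /<index-name>).
# All field names and parameter values here are illustrative assumptions.
index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            # Dense embeddings, retrieved with a kNN query.
            "title_vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
            # Weighted tokens, retrieved with a text expansion query.
            "content_embedding": {"type": "sparse_vector"},
        }
    }
}

print(json.dumps(index_body, indent=2))
```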

Documentation:

Data transformation

Elastic provides tools to transform your data, regardless of how it is stored (within Elastic or outside).

Ingest pipelines are general purpose transformation tools, while inference processors enable the use of machine learning models within these pipelines.

After deploying machine learning models, use these ingestion tools to apply ML transformations to your data as you index or re-index your documents. Extract text, classify documents, or create embeddings and store this data within additional fields.

Ingest pipelines

An ingest pipeline enables you to "pipe" incoming data through a series of processors that transform the data before storage. Use an ingest pipeline to enrich documents with additional fields, including fields generated by machine learning models. Use an inference processor to employ a trained model within your pipeline.
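
One way to explore a series of processors before storing anything is the pipeline simulate endpoint. The sketch below builds a request body for it, using the `trim`, `lowercase`, and `set` processors; the field names and sample document are hypothetical.

```python
import json

# Request body for POST _ingest/pipeline/_simulate.
# Processors run in order; each one transforms the document before the next.
simulate_body = {
    "pipeline": {
        "processors": [
            {"trim": {"field": "title"}},
            {"lowercase": {"field": "title"}},
            {"set": {"field": "source", "value": "docs-import"}},
        ]
    },
    # Sample documents to pipe through the processors (hypothetical data).
    "docs": [{"_source": {"title": "  Getting Started  "}}],
}

print(json.dumps(simulate_body, indent=2))
```

An inference processor would be added to the same `processors` list, alongside these general purpose processors.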

Documentation: Ingest pipelines

Inference processors

An inference processor is a pipeline task that uses a deployed, trained model to transform incoming data during indexing or re-indexing.

Documentation: Inference processor

Search and retrieval

After using machine learning models to enrich your documents with additional fields or embeddings, choose from a variety of retrieval methods that take advantage of this additional data.

Use Elastic for semantic search with dense vectors (kNN) or sparse vectors (Elastic Sparse Encoder), and combine these results with those from BM25 text search, optionally boosting on additional NLP fields.

Perform any of these retrieval methods through the same API endpoint: _search.

Text (BM25 + NLP)

Use a full-text query to search documents enriched by machine learning models.

Elasticsearch provides a domain-specific language (DSL) for describing a full-text search query. Use this query DSL to design full-text queries, targeting the various fields of your documents.

Depending on your use case, use the additional fields you have added through natural language processing to improve the relevance of your results.

Use the _search API with the query request body parameter to specify a search query using Elasticsearch’s Query DSL. For example, the match query is the standard query for performing a full-text search, including options for fuzzy matching.
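
For instance, a match query with fuzzy matching could be built as in this sketch. The `title` field and the query text are assumptions for illustration.

```python
import json


def build_match_query(field, text, fuzziness="AUTO"):
    """Return a request body for POST /<index>/_search using a match query."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # Tolerate small spelling differences in the query text.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


match_body = build_match_query("title", "semantic serch")
print(json.dumps(match_body, indent=2))
```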

Documentation:

Text expansion (Sparse Encoder)

Use a text expansion query to perform semantic search on documents enriched by the Elastic Sparse Encoder model.

At index time, the Elastic Sparse Encoder model enriches each document with an additional text expansion field that uses weighted tokens to capture the relationships between words and their meanings. At query time, the text expansion query uses the sparse encoder model to apply the same transformation to users' query text. The result is semantic search: relevance is based on meaning and intention, rather than strict keyword matching on the original document fields.

Use the _search API with the query and query.text_expansion request body parameters to query the text expansion field using the sparse encoder model.
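
A minimal sketch of such a request body follows. It assumes the text expansion field is named `content_embedding` and that the sparse encoder model is deployed under the ID `.elser_model_2`; adjust both to match your deployment.

```python
import json


def build_text_expansion_query(field, model_id, text):
    """Return a request body for POST /<index>/_search using text expansion."""
    return {
        "query": {
            "text_expansion": {
                field: {
                    "model_id": model_id,   # model that expands the query text
                    "model_text": text,     # the user's query text
                }
            }
        }
    }


# Field name and model ID are assumptions, for illustration only.
expansion_body = build_text_expansion_query(
    "content_embedding", ".elser_model_2", "how do I reset my password"
)
print(json.dumps(expansion_body, indent=2))
```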

Documentation:

Vector (kNN)

Use a k-nearest neighbor (kNN) search to retrieve documents containing indexed vectors, such as those added through an inference processor.

This type of search finds the k nearest vectors to a query vector, as measured by a similarity metric. You receive the k documents that are closest in meaning to the query, sorted by their proximity to the query.
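
The idea can be illustrated with an exhaustive search in plain Python, using cosine similarity as the metric. This is a conceptual sketch only: the two-dimensional vectors are made up, and a search engine typically relies on an approximate index rather than comparing the query against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, documents, k):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up document vectors, for illustration only.
documents = {"doc-a": [1.0, 0.0], "doc-b": [0.8, 0.6], "doc-c": [0.0, 1.0]}
top = nearest_neighbors([1.0, 0.1], documents, k=2)
print([doc_id for doc_id, _ in top])  # ['doc-a', 'doc-b']
```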

Use the _search API with the knn request body parameter to specify the kNN query to run.
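
A request body using the `knn` parameter might look like the following sketch. The field name is hypothetical, and the query vector is shortened to three dimensions for readability; a real query vector must have the same number of dimensions as the indexed field.

```python
import json


def build_knn_search(field, query_vector, k=10, num_candidates=100):
    """Return a request body for POST /<index>/_search using kNN search."""
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,                            # number of nearest neighbors to return
            "num_candidates": num_candidates,  # candidates considered per shard
        }
    }


# Hypothetical field name and a shortened, made-up query vector.
knn_body = build_knn_search("title_vector", [0.12, -0.45, 0.91], k=5, num_candidates=50)
print(json.dumps(knn_body, indent=2))
```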

Documentation:

Hybrid (RRF)

Elasticsearch allows you to combine any of the above retrieval methods within a single search request.

Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. RRF requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results.
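
To show why the relevance indicators need not be comparable, here is a small sketch of the RRF calculation in plain Python. Each document scores the sum of 1 / (rank_constant + rank) over the result sets in which it appears, so only ranks matter, not the original scores. The rank constant of 60 and the sample result sets are assumptions for illustration.

```python
def reciprocal_rank_fusion(result_sets, rank_constant=60):
    """Fuse ranked lists of document IDs into a single ranked list."""
    scores = {}
    for results in result_sets:
        for rank, doc_id in enumerate(results, start=1):
            # Documents ranked highly in any list receive a larger contribution.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result sets: one from text search, one from vector search.
text_results = ["a", "b", "c"]
vector_results = ["c", "a", "d"]
fused = reciprocal_rank_fusion([text_results, vector_results])
print(fused)  # ['a', 'c', 'b', 'd']
```

Documents "a" and "c" appear in both lists, so they outrank "b" and "d", which each appear in only one.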

Use the _search API with the retriever request body parameter, an rrf retriever, and any combination of the standard or knn retrievers. You can specify multiple queries using multiple standard or knn retrievers along with an rrf retriever.

(As an alternative to RRF, use the _search API with the query and knn request body parameters, without retrievers, to combine vector and text search. Use the boost parameter to manage the weight of each query type. This is known as linear combination.)
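
The two request shapes described above can be sketched side by side. Field names, query text, the shortened query vector, and the boost values are all assumptions for illustration.

```python
import json

QUERY_TEXT = "reset password"          # made-up query text
QUERY_VECTOR = [0.12, -0.45, 0.91]     # shortened, made-up query vector

# Hybrid search with RRF: one standard retriever and one knn retriever.
rrf_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"title": QUERY_TEXT}}}},
                {
                    "knn": {
                        "field": "title_vector",
                        "query_vector": QUERY_VECTOR,
                        "k": 10,
                        "num_candidates": 100,
                    }
                },
            ]
        }
    }
}

# Linear combination: query and knn side by side, weighted with boost.
linear_body = {
    "query": {"match": {"title": {"query": QUERY_TEXT, "boost": 0.3}}},
    "knn": {
        "field": "title_vector",
        "query_vector": QUERY_VECTOR,
        "k": 10,
        "num_candidates": 100,
        "boost": 0.7,
    },
}

print(json.dumps(rrf_body, indent=2))
print(json.dumps(linear_body, indent=2))
```

With linear combination, the boost values need tuning for your data, whereas the RRF request has no per-query weights to adjust.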

Documentation:

Security and data privacy

Whether implementing an internal knowledge base or integrating with an external LLM service, you may be concerned about the privacy of, and access to, your private application data.

Use Elastic’s security features to manage which people and systems have access.

Use role-based access control, or rely on document- or field-level security for more granular controls.

Role-based access control (RBAC)

Role-based access control enables you to authorize users by assigning privileges to roles and assigning roles to users or groups.

You can use built-in roles or define your own roles using _security APIs.
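
As a sketch, a custom role that grants read-only access to one index could be defined with a body like the one below. The role name and index name are hypothetical.

```python
import json

ROLE_NAME = "kb-reader"  # hypothetical role name


def build_read_only_role(index_names):
    """Return a request body for PUT _security/role/<role-name>."""
    return {
        "indices": [
            {
                "names": index_names,
                "privileges": ["read"],  # search and get, but no writes
            }
        ]
    }


role_body = build_read_only_role(["kb-articles"])
print(f"PUT _security/role/{ROLE_NAME}")
print(json.dumps(role_body, indent=2))
```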

Documentation:

Document and field level security

Document level security restricts the documents that users have read access to, while field level security restricts the fields that users have read access to. In particular, these solutions restrict which documents or fields can be accessed from document-based read APIs.

Implement document and field level security using the Elasticsearch _security APIs.
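
Both restrictions can be expressed in a role definition, as in this sketch. The `query` entry limits which documents are readable, and `field_security` limits which fields are readable. The index name, field names, and the `department` value are assumptions for illustration.

```python
import json

# Request body for PUT _security/role/<role-name> (all names are hypothetical).
restricted_role_body = {
    "indices": [
        {
            "names": ["kb-articles"],
            "privileges": ["read"],
            # Field level security: only these fields are readable.
            "field_security": {"grant": ["title", "body"]},
            # Document level security: only matching documents are readable.
            "query": {"term": {"department": "support"}},
        }
    ]
}

print(json.dumps(restricted_role_body, indent=2))
```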

Documentation:

Application development tools

Elastic also provides a variety of tools for general purpose application development and integrations.

Ingest data from a variety of sources, build a search experience using your preferred programming language, avoid query injection attacks, and ship and view analytics related to user behavior.

Also use these tools to integrate with third party services, including LangChain and OpenAI or other large language models (LLMs).

Ingestion tools

Use Elastic ingestion tools to index and synchronize data from various sources, including applications, databases, web pages, and content services.

Documentation:

Or implement your own integrations using Elasticsearch’s Index and Document APIs.
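
For example, a custom integration might batch documents into a bulk request. The sketch below builds the newline-delimited JSON body that the `_bulk` endpoint expects: an action line followed by a source line for each document, with a trailing newline. The index name and documents are made up.

```python
import json


def build_bulk_body(index_name, documents):
    """Return an NDJSON body for POST _bulk that indexes the given documents."""
    lines = []
    for doc_id, source in documents:
        # Action line: which index and ID the next line's document belongs to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up documents, for illustration only.
bulk_body = build_bulk_body(
    "kb-articles",
    [("1", {"title": "Reset your password"}), ("2", {"title": "Change your email"})],
)
print(bulk_body)
```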

Documentation:

Language clients

Language clients provide Elasticsearch APIs in various programming languages, packaged as libraries.

Add the relevant library to your application to build custom integrations in your preferred programming language.

Documentation: Elasticsearch clients

Search UI

Elastic Search UI provides state management and components for React applications. Use Search UI to quickly prototype a search experience or build a production-quality UI.

Search UI relies on various "connector" libraries to interface with Elasticsearch and other search engines. Use the Elasticsearch connector for the greatest compatibility with Elasticsearch queries, including semantic search and vector search.

Documentation:

Behavioral analytics

Behavioral analytics is a general purpose analytics platform to analyze user behavior. Send event data, such as search queries and clicks, to Elasticsearch.

Use default dashboards to analyze these events, or create your own visualizations. Use this analysis to improve your search relevance and other application functions.

Documentation: Behavioral analytics

Search applications

A search application is an Elasticsearch endpoint that corresponds to one or more indices and restricts queries to predefined templates.

Use a search application with an untrusted client, like a web application, where you may be exposed to query injection attacks or other abuses.
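
The sketch below shows the general shape of this arrangement: a search application definition with a query template, and a client request that supplies only template parameters. The application name, index name, field name, and parameter name are hypothetical, and the exact template options available depend on your Elasticsearch version.

```python
import json

APP_NAME = "kb-search"  # hypothetical search application name

# Request body for PUT _application/search_application/<name>.
# The template fixes the query shape; clients can only fill in parameters.
search_application_body = {
    "indices": ["kb-articles"],
    "template": {
        "script": {
            "lang": "mustache",
            "source": {"query": {"match": {"title": "{{query_string}}"}}},
            "params": {"query_string": "*"},  # default when no parameter is sent
        }
    },
}

# Request body for POST _application/search_application/<name>/_search.
# The untrusted client sends parameters, never raw query DSL.
client_search_body = {"params": {"query_string": "reset password"}}

print(f"PUT _application/search_application/{APP_NAME}")
print(json.dumps(search_application_body, indent=2))
print(f"POST _application/search_application/{APP_NAME}/_search")
print(json.dumps(client_search_body, indent=2))
```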

Documentation: Search Applications