What is Elasticsearch Relevance Engine (ESRE)?
Software developers are increasingly using machine learning models to improve the relevance of data presented to their users. This is particularly true for applications using natural language interfaces, for example: search, question/answer, completion, and chat.
The Elasticsearch Relevance Engine (ESRE) is a collection of tools from Elastic that combines machine learning models, data transformation and storage (including vectors), and data search and retrieval. ESRE also includes tools for data security, and tools to integrate with other software, including various data sources and large language models (LLMs).
Read on to learn about the components of ESRE, or jump directly to Examples for example applications and implementations.
Machine Learning models
Machine learning models enable your applications to understand natural language data and enrich or transform that data (at index time and at query time).
Uses of machine learning models include:
- Generating vector embeddings
- Extracting information from unstructured text, such as named entities or the answers to questions
- Classifying text, such as its language or its sentiment (positive or negative)
To perform these operations, you must deploy one or more trained models.
Elastic provides the following relevant features:
- The Elastic Sparse Encoder trained model for general purpose semantic search, without fine-tuning
- Interfaces to deploy and manage third party trained models for vector search and natural language processing
- Cloud infrastructure on which to deploy these models
Elastic Sparse Encoder model
The Elastic Sparse Encoder model is a machine learning model, built and trained by Elastic, which enables general-purpose semantic search for English language data.
At index time, the Elastic Sparse Encoder model enriches each document with an additional text expansion field that uses weighted tokens to capture the relationships between words and their meanings. At query time, when using a text expansion query, the sparse encoder model applies the same transformation to users' query text. The result is semantic search: relevance is based on meaning and intention, rather than strict keyword matching on the original document fields.
The Elastic Sparse Encoder model is a zero-shot, out-of-domain machine learning model, which means it does not require additional training or fine-tuning with your own data. Use this Elastic model to get started with semantic search without needing to identify and manage additional models.
Deploy the model, create a pipeline with an inference processor, and ingest (or re-index) your data through the pipeline.
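For example, the following sketch uses the Python client (elasticsearch-py) to start a deployment of the model, define an ingest pipeline with an inference processor, and index a document through that pipeline. The connection details, index and field names, and the .elser_model_2 model ID are placeholders; the exact model ID and inference processor options depend on your Elasticsearch version.

from elasticsearch import Elasticsearch

# Placeholder connection details; substitute your own endpoint and API key.
es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Start a deployment of the Elastic Sparse Encoder (ELSER) model.
es.ml.start_trained_model_deployment(model_id=".elser_model_2")

# Create an ingest pipeline that expands the "body" field into weighted tokens.
es.ingest.put_pipeline(
    id="elser-pipeline",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "body", "output_field": "body_tokens"}
                ],
            }
        }
    ],
)

# Index (or re-index) documents through the pipeline.
es.index(
    index="my-semantic-index",
    pipeline="elser-pipeline",
    document={"body": "Autoscaling adds capacity when machine learning jobs need it."},
)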
Documentation:
3rd party model management
Many public and private trained models are available to enrich your data, each solving a different problem. For example, Hugging Face catalogs thousands of available models.
To use a third party model with Elastic, you must import and deploy the model, and then create an ingest pipeline with an inference processor to perform data transformation.
Elastic provides the following interfaces to manage the trained models you are using:
- A Kibana UI at Machine Learning > Model Management > Trained Models
- Elasticsearch APIs grouped under /_ml/trained_models/
- The Eland language client, implemented in Python
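As a sketch of this workflow, the Python client wraps the /_ml/trained_models/ APIs under es.ml; the model IDs and connection details below are placeholders for a model you have already imported (for example, with Eland).

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# List the trained models known to the cluster (wraps GET /_ml/trained_models).
for model in es.ml.get_trained_models()["trained_model_configs"]:
    print(model["model_id"], model.get("description", ""))

# Start and stop a deployment of an imported model (placeholder model ID).
es.ml.start_trained_model_deployment(model_id="sentence-transformers__msmarco-minilm-l-12-v3")
es.ml.stop_trained_model_deployment(model_id="sentence-transformers__msmarco-minilm-l-12-v3")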
Documentation: Deploy trained models
Elastic Cloud ML instances
Elastic Cloud includes infrastructure on which to deploy and run trained models. When creating an Elastic Cloud deployment, enable Machine Learning instances, and optionally enable Autoscaling.
Documentation: Set up machine learning features
Data storage
Elastic provides the capabilities to store data of various types, including unstructured text and dense vectors (embeddings). Use Elastic to store your data before and after transformation by a machine learning model.
Indices
Elastic stores data as documents (with fields) within indices, and supports many field types, including dense vectors. Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
Use the Index and Document APIs or the Kibana dev tools console to manage data manually. Some ingestion tools, such as the web crawler and connectors, manage indices and documents on your behalf.
Use the Reindex API to re-index data already stored in Elasticsearch, for example, to run the data through a machine learning ingest pipeline.
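For example, a minimal sketch with the Python client that creates an index with an explicit mapping, indexes a document, and re-indexes it through a hypothetical machine learning pipeline; all names are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# Define a mapping and create the index.
es.indices.create(
    index="articles",
    mappings={"properties": {"title": {"type": "text"}, "body": {"type": "text"}}},
)

# Index a document.
es.index(index="articles", id="1", document={"title": "What is ESRE?", "body": "..."})

# Re-index existing data through an ingest pipeline (for example, one with an inference processor).
es.reindex(
    source={"index": "articles"},
    dest={"index": "articles-enriched", "pipeline": "my-ml-pipeline"},
)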
Documentation:
Vector field types
To store dense vectors for vector search, use the dense vector field type. At query time, use a kNN query to retrieve this information.
To store sparse vectors for text expansion (for example, when using the Elastic Sparse Encoder model), use the sparse vector field type. At query time, use a text expansion query to retrieve this information.
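For example, a mapping sketch that declares both field types; the index name, field names, and vector dimensions are placeholders, and the sparse_vector field type assumes a recent 8.x release.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

es.indices.create(
    index="my-semantic-index",
    mappings={
        "properties": {
            "body": {"type": "text"},
            # Dense embeddings for kNN search; dims must match your embedding model.
            "body_embedding": {"type": "dense_vector", "dims": 384, "index": True, "similarity": "cosine"},
            # Weighted tokens produced by the Elastic Sparse Encoder for text expansion queries.
            "body_tokens": {"type": "sparse_vector"},
        }
    },
)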
Documentation:
Data transformation
Elastic provides tools to transform your data, regardless of how it is stored (within Elastic or outside).
Ingest pipelines are general purpose transformation tools, while inference processors enable the use of machine learning models within these pipelines.
After deploying machine learning models, use these ingestion tools to apply ML transformations to your data as you index or re-index your documents. Extract text, classify documents, or create embeddings and store this data within additional fields.
Ingest pipelines
An ingest pipeline enables you to "pipe" incoming data through a series of processors that transform the data before storage. Use an ingest pipeline to enrich documents with additional fields, including fields generated by machine learning models. Use an inference processor to employ a trained model within your pipeline.
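For example, a sketch of a pipeline that chains two built-in processors before storage; the pipeline, index, and field names are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# Each processor transforms the document in turn before it is stored.
es.ingest.put_pipeline(
    id="clean-title",
    description="Trim and lowercase the title field",
    processors=[
        {"trim": {"field": "title"}},
        {"lowercase": {"field": "title"}},
    ],
)

# Apply the pipeline when indexing.
es.index(index="articles", pipeline="clean-title", document={"title": "  Getting Started with ESRE  "})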
Documentation: Ingest pipelines
Inference processors
An inference processor is a pipeline task that uses a deployed, trained model to transform incoming data during indexing or re-indexing.
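For example, a sketch of a pipeline whose only processor runs a deployed text embedding model; the model ID, field names, field_map, and target_field values are illustrative and depend on the model you deploy.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

es.ingest.put_pipeline(
    id="embed-body",
    processors=[
        {
            "inference": {
                # Placeholder ID of a deployed text embedding model.
                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                # Map the document field onto the input field the model expects.
                "field_map": {"body": "text_field"},
                # Store the model output in an additional field.
                "target_field": "body_embedding_raw",
            }
        }
    ],
)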
Documentation: Inference processor
Search and retrieval
After using machine learning models to enrich your documents with additional fields or embeddings, choose from a variety of retrieval methods that take advantage of this additional data.
Use Elastic for semantic search with dense vectors (kNN) or sparse vectors (Elastic Sparse Encoder), and combine these results with those from BM25 text search, optionally boosting on additional NLP fields.
Perform any of these retrieval methods through the same API endpoint: _search.
Text (BM25 + NLP)
Use a full-text query to search documents enriched by machine learning models.
Elasticsearch provides a domain-specific language (DSL) for describing a full-text search query. Use this query DSL to design full-text queries, targeting the various fields of your documents.
Depending on your use case, use the additional fields you have added through natural language processing to improve the relevance of your results.
Use the _search API with the query request body parameter to specify a search query using Elasticsearch’s Query DSL. For example, the match query is the standard query for performing a full-text search, including options for fuzzy matching.
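For example, a sketch of a fuzzy match query with the Python client; the index and field names are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# A full-text match query with fuzzy matching enabled.
resp = es.search(
    index="articles",
    query={"match": {"title": {"query": "relevnace engine", "fuzziness": "AUTO"}}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])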
Documentation:
Sparse vector search using text expansion (Sparse Encoder)
Use a sparse vector query to perform semantic search on documents enriched by the Elastic Sparse Encoder model.
At index time, the Elastic Sparse Encoder model enriches each document with text expansion using expanded weighted tokens to capture the relationships between words and their meanings. At query time, the sparse vector query uses the sparse encoder model to apply the same transformation to users' query text. The result is semantic search: relevance is based on meaning and intention, rather than strict keyword matching on the original document fields.
Use the _search API with the query and query.sparse_vector request body parameters to query the sparse vector field using the sparse encoder model.
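For example, a sketch of this kind of semantic query with the Python client; it uses the text_expansion query form (the predecessor of the sparse_vector query on earlier 8.x releases), and the index name, token field, and .elser_model_2 model ID are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# The model expands the query text into weighted tokens and matches them
# against the tokens stored in the body_tokens field at index time.
resp = es.search(
    index="my-semantic-index",
    query={
        "text_expansion": {
            "body_tokens": {
                "model_id": ".elser_model_2",
                "model_text": "How do I add more machine learning capacity automatically?",
            }
        }
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["body"])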
Documentation:
Vector (kNN)
Use a k-nearest neighbor (kNN) search to retrieve documents containing indexed vectors, such as those added through an inference processor.
This type of search finds the k nearest vectors to a query vector, as measured by a similarity metric. You will receive the top k documents that are closest in meaning to the query, sorted by their similarity to the query vector.
Use the _search API with the knn request body parameter to specify the kNN query to run.
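For example, a sketch of a kNN search; the index and field names are placeholders, and the query vector shown here is a dummy value standing in for an embedding produced by the same model used at index time.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# Dummy query vector; in practice, embed the query text with the same model
# that produced the indexed vectors (for example via the trained model infer API).
query_embedding = [0.02] * 384

resp = es.search(
    index="my-semantic-index",
    knn={
        "field": "body_embedding",
        "query_vector": query_embedding,
        "k": 10,
        "num_candidates": 100,
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["body"])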
Documentation:
Hybrid (RRF)
Elasticsearch allows you to combine any of the above retrieval methods within a single search request.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. RRF requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results.
Use the _search API with the retriever request body parameter, an rrf retriever, and any combination of the standard or knn retrievers. You can specify multiple queries using multiple standard or knn retrievers along with an rrf retriever.
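For example, a sketch that fuses a standard BM25 retriever with a knn retriever using RRF; it assumes an Elasticsearch release and client recent enough to support the retriever search parameter, and the index, field names, and query vector are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

query_embedding = [0.02] * 384  # dummy vector; use a real query embedding in practice

resp = es.search(
    index="my-semantic-index",
    retriever={
        "rrf": {
            "retrievers": [
                # Lexical (BM25) retriever.
                {"standard": {"query": {"match": {"body": "semantic search"}}}},
                # Vector retriever.
                {
                    "knn": {
                        "field": "body_embedding",
                        "query_vector": query_embedding,
                        "k": 10,
                        "num_candidates": 100,
                    }
                },
            ]
        }
    },
)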
(As an alternative to RRF, use the _search API with the query and knn request body parameters, without retrievers, to combine vector and text search. Use the boost parameter to manage the weight of each query type. This is known as linear combination.)
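For example, a sketch of a linear combination that weights the text query and the kNN clause with boost values summing to 1; all names and the query vector are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

query_embedding = [0.02] * 384  # dummy vector; use a real query embedding in practice

resp = es.search(
    index="my-semantic-index",
    # Text score contributes 30% ...
    query={"match": {"body": {"query": "semantic search", "boost": 0.3}}},
    # ... and the vector score contributes 70% of the final score.
    knn={
        "field": "body_embedding",
        "query_vector": query_embedding,
        "k": 10,
        "num_candidates": 100,
        "boost": 0.7,
    },
)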
Documentation:
Security and data privacy
Whether implementing an internal knowledge base or integrating with an external LLM service, you may be concerned about the privacy and access of your private application data.
Use Elastic’s security features to manage which people and systems have access.
Use role-based access control, or rely on document- or field-level security for more granular controls.
Role-based access control (RBAC)
Role-based access control enables you to authorize users by assigning privileges to roles and assigning roles to users or groups.
You can use built-in roles or define your own roles using the _security APIs.
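For example, a sketch that defines a read-only role and a user with that role via the Python client; the role name, user name, index pattern, and password are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# A role granting read access to matching indices (wraps PUT /_security/role/<name>).
es.security.put_role(
    name="articles_reader",
    indices=[{"names": ["articles*"], "privileges": ["read"]}],
)

# A user assigned to that role (wraps PUT /_security/user/<name>).
es.security.put_user(
    username="search_frontend",
    password="a-strong-placeholder-password",
    roles=["articles_reader"],
)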
Documentation:
Document and field level security
Document level security restricts the documents that users have read access to, while field level security restricts the fields that users have read access to. In particular, these solutions restrict which documents or fields can be accessed from document-based read APIs.
Implement document and field level security using the Elasticsearch _security APIs.
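For example, a sketch of a role that combines both: a role query restricts which documents are readable, and field_security restricts which fields are returned; the role name, index pattern, and field names are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

es.security.put_role(
    name="public_articles_reader",
    indices=[
        {
            "names": ["articles*"],
            "privileges": ["read"],
            # Document level security: only documents matching this query are readable.
            "query": {"term": {"visibility": "public"}},
            # Field level security: only these fields are returned.
            "field_security": {"grant": ["title", "body"]},
        }
    ],
)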
Documentation:
Application development tools
Elastic also provides a variety of tools for general purpose application development and integrations.
Ingest data from a variety of sources, build a search experience using your preferred programming language, avoid query injection attacks, and ship and view analytics related to user behavior.
Also use these tools to integrate with third party services, including LangChain and OpenAI or other large language models (LLMs).
Ingestion tools
Use Elastic ingestion tools to index and synchronize data from various sources, including applications, databases, web pages, and content services.
Documentation:
Or implement your own integrations using Elasticsearch’s Index and Document APIs.
Documentation:
Language clients
Language clients provide Elasticsearch APIs in various programming languages, packaged as libraries.
Add the relevant library to your application to build custom integrations in your preferred programming language.
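For example, a minimal sketch with the Python client (elasticsearch-py); the endpoint, API key, and index name are placeholders.

from elasticsearch import Elasticsearch

# Placeholder connection details; substitute your own endpoint and API key.
es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

# Verify the connection and run a simple query.
print(es.info()["version"]["number"])
resp = es.search(index="articles", query={"match_all": {}}, size=5)
print(resp["hits"]["total"])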
Documentation: Elasticsearch clients
Search UI
Elastic Search UI provides state management and components for React applications. Use Search UI to quickly prototype a search experience or build a production-quality UI.
Search UI relies on various "connector" libraries to interface with Elasticsearch and other search engines. Use the Elasticsearch connector for the greatest compatibility with Elasticsearch queries, including semantic search and vector search.
Documentation:
Behavioral analytics
Behavioral analytics is a general purpose analytics platform to analyze user behavior. Send event data, such as search queries and clicks, to Elasticsearch.
Use default dashboards to analyze these events, or create your own visualizations. Use this analysis to improve your search relevance and other application functions.
Documentation: Behavioral analytics
Search applications
A search application is an Elasticsearch endpoint that corresponds to one or more indices and restricts queries to predefined templates.
Use a search application with an untrusted client, like a web application, where you may be exposed to query injection attacks or other abuses.
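For example, a hedged sketch, assuming a release and Python client that expose the Search Applications API: the application wraps an index behind a parameterized search template, so untrusted clients can only supply template parameters rather than arbitrary queries. The application name, index, and template below are placeholders.

from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")  # placeholder connection details

# Create a search application over the articles index with a fixed query template.
es.search_application.put(
    name="articles-app",
    search_application={
        "indices": ["articles"],
        "template": {
            "script": {
                "lang": "mustache",
                "source": '{"query": {"match": {"title": "{{query_string}}"}}}',
                "params": {"query_string": "*"},
            }
        },
    },
)

# Clients can only pass parameters into the predefined template.
resp = es.search_application.search(name="articles-app", params={"query_string": "relevance"})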
Documentation: Search Applications