Elasticsearch 8.14: Faster and more cost-effective vector search, improved relevance with retrievers and reranking, RAG and developer tooling


We're committed to pushing the boundaries of search development and empowering search builders with powerful tools. With our latest updates, Elastic becomes even more capable for customers dealing with vast amounts of data represented as vectors. These enhancements deliver faster search, reduced storage costs, and seamless integration between software and hardware.

Elasticsearch 8.14 is available now on Elastic Cloud — the only hosted Elasticsearch offering to include all of the new features in this latest release. You can also download the Elastic Stack and our cloud orchestration products — Elastic Cloud Enterprise and Elastic Cloud for Kubernetes — for a self-managed experience.

What else is new in Elastic 8.14? Check out the 8.14 announcement post to learn more.

Democratizing search relevance with retrievers and reranking

Retrievers and reranking play crucial roles in improving the relevance and accuracy of search results. Our updates benefit both vector search users and those relying on more traditional scoring models, such as BM25.

We've added the retriever abstraction to the _search API for returning top hits using the standard, knn, or rrf methods. This lets users build sophisticated multi-stage retrieval more easily, without pipeline complexity.

For example, to combine KNN and BM25 retrieval, it is no longer necessary to define pipeline stages that perform a KNN search, collect the IDs of the results, and then run a BM25 search over those IDs. Instead, the retriever tree can be built directly into the search query:

GET index/_search
{
    "retriever": {
        "rrf": {
            "window_size": 100,
            "retrievers": [
                {
                    "knn": {
                        "field": "vector",
                        "k": 3,
                        "num_candidates": 10,
                        "query_vector": [1, 2, 3]
                    }
                },
                {
                    "standard": {
                        "query": {
                            "match": {
                                "message": {
                                    "query": "{{query_string}}"    
                                }
                            }
                        }
                    }
                }
            ]
        }
    },
    "size": 5,
    "fields": ["message"]
}

Reranking retrieved documents can further improve relevance by rescoring them according to their semantic relevance to the search query. Reranking effectively delivers semantic search to all users: RAG systems can rely on the most relevant top results for context, and traditional search, such as BM25, can surface its most relevant results at the top.

Elastic is the only vector database to support Cohere's Rerank 3 model, and our _inference API makes reranking with this model seamless, with no complex multi-query orchestration or reindexing of documents required. To rerank retrieved results using the Cohere model, begin by configuring the inference endpoint:

PUT _inference/rerank/cohere_rerank
{
    "service": "cohere",
    "service_settings": {
        "api_key": "<API-KEY>",
        "model_id": "rerank-english-v3.0"
    },
    "task_settings": {
        "top_n": 10,
        "return_documents": true
    }
}

Once the inference endpoint is specified, use it to rerank results by passing in the original query used for retrieval along with the documents that were retrieved by a search.

POST _inference/rerank/cohere_rerank
{
    "input": [{{query_results}}],
    "query": "{{query_string}}"
}

Elevating the RAG experience

Our latest tools and enhancements are designed to elevate the RAG experience. Playground and the embeddable Dev Console, together with Jupyter notebooks, empower users to experiment, refine, and iterate quickly.

In Playground, developers can select multiple indices that are ingested from multiple third-party data sources to experiment with and refine semantic text queries, export generated code, and ultimately design a conversational search experience. This simplifies RAG implementation and allows for quick prototyping of chat experiences with Elasticsearch data to ground the LLM responses.


The embeddable Dev Console is now available everywhere in Kibana to jumpstart query development with pre-populated, contextual code snippets, as well as through Jupyter notebooks.


We added support for generating embeddings with Azure OpenAI, unlocking advanced AI capabilities and enriching RAG applications. The OpenAI completion task is now available in the inference processor, streamlining the workflow for generating intelligent responses and improving the overall efficiency of RAG interactions.
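As a sketch, an Azure OpenAI embedding endpoint can be configured through the same _inference API used above for reranking. The endpoint name and the placeholder values below are illustrative; the resource name, deployment ID, and API version correspond to your own Azure OpenAI deployment:

```json
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<API-KEY>",
        "resource_name": "<AZURE-RESOURCE-NAME>",
        "deployment_id": "<DEPLOYMENT-ID>",
        "api_version": "<API-VERSION>"
    }
}
```

Once created, the endpoint can be referenced by ID wherever inference is used, such as in an ingest pipeline's inference processor, so documents are embedded as they are indexed.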

Tooling enhancements for more effective data handling

The ability to load data into Elastic and handle it efficiently is crucial for maintaining effective search applications. These enhancements allow users to tailor services to their specific needs and streamline development and operational processes.

Try it out

Read about these capabilities and more in the release notes.

Existing Elastic Cloud customers can access many of these features directly from the Elastic Cloud console. Not taking advantage of Elastic on cloud? Start a free trial.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. 

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.