Adding AI summaries to your site with Elastic

How to add an AI summary box along with the search results to enrich your search experience.

Generative AI Vector Database How To

By: Gustavo Llermaly

On September 26, 2024

Search as we currently know it (search bar, results, filters, pages, etc.) has come a long way and covers many different needs. This is especially true when we know the keywords needed to find what we're looking for, or when we know which documents contain the information we want. However, when the results are documents with long texts, we need an additional step after searching: reading and summarizing them to get the final answer. To make this process easier, companies such as Google, with its Search Generative Experience (SGE), use AI to complement the search results with AI summaries.

What if I told you you could do the same with Elastic?

In this article, you will learn to create a React component that displays an AI summary answering the user's question alongside the search results, helping users get answers faster. We will also ask the model to provide citations, so that answers are grounded in the search results.

The end results will look like this:

cover

You can find the full working example repository here.

Steps

Creating endpoints

Before creating the endpoints, take a look at the high-level architecture for this project.

diagram

The recommended approach to consume Elasticsearch from a UI is to proxy the calls, so we are going to spin up a backend that the UI can connect to for that purpose. You can read more about this approach here.

IMPORTANT: The approach outlined in this article provides a simple method for handling Elasticsearch queries and generating summaries. Consider your specific use case and requirements before implementing this solution. A more appropriate architecture would involve doing both search and completion under the same API call behind the proxy.
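As a rough sketch of that single-call alternative, the proxy could run the search and then the completion in one round trip. Everything below is hypothetical and not part of the example repository: searchFn and completeFn are injected stand-ins for the Elasticsearch _search and _inference calls, and the title and url fields are assumed to exist in each hit's _source.

```javascript
// Hypothetical helper: turn the top N hits into a compact text context.
// Assumes each hit has _source.title and _source.url.
function buildContext(hits, topN = 3) {
  return hits
    .slice(0, topN)
    .map((hit) => `Title: ${hit._source.title}\nURL: ${hit._source.url}`)
    .join("\n\n");
}

// Hypothetical helper: run search and completion in one call behind the proxy.
// searchFn(query) should resolve to an Elasticsearch _search response;
// completeFn(prompt) should resolve to the summary text.
async function searchAndSummarize(query, searchFn, completeFn, topN = 3) {
  const searchResponse = await searchFn(query);
  const hits = searchResponse.hits.hits;

  // Nothing to summarize if the search came back empty.
  if (hits.length === 0) {
    return { hits, summary: null };
  }

  const prompt = `User request: ${query}\nTop search results:\n${buildContext(hits, topN)}`;
  const summary = await completeFn(prompt);
  return { hits, summary };
}
```

With this shape, the UI would make one request and receive both the hits and the summary, at the cost of waiting for the model before any results can be rendered.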

Embeddings endpoint

To enable semantic search, we are going to use the ELSER model, which lets us match documents not only by exact words but also by semantic meaning.

You can use the Kibana UI to create the ELSER endpoint:

Or via the _inference API:

PUT _inference/sparse_embedding/elser-embeddings
{
  "service": "elser",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  }
}

Completion endpoint

To generate the AI Summary we must send the relevant documents as context and the user query to a model. To do this we create a completion endpoint connecting to OpenAI. You can also choose between a growing list of different providers if you don't want to work with OpenAI.

PUT _inference/completion/summaries-completion
{
  "service": "openai",
  "service_settings": {
    "api_key": "<API_KEY>",
    "model_id": "gpt-4o-mini"
  }
}

Every time a user runs a search we are going to call the model, so we need speed and cost efficiency, making this a good opportunity to test the new gpt-4o-mini.
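To make the request and response of this endpoint concrete, here is a minimal sketch of two helpers. The function names are made up for illustration. The request shape (a single input field) and the response shape (a completion array whose items carry a result string) are what I understand the inference API to use for the completion task type, so verify them against the documentation for your Elasticsearch version.

```javascript
// Build the body for POST _inference/completion/summaries-completion.
// The prompt travels in the "input" field.
function buildCompletionBody(prompt) {
  return { input: prompt };
}

// Pull the generated text out of a completion response.
// Assumed response shape: { completion: [{ result: "..." }] }.
function extractCompletionText(response) {
  const first = response && response.completion && response.completion[0];
  return first ? first.result : "";
}
```

Returning an empty string for an unexpected response keeps the UI from crashing when the model call fails; you may prefer to surface an error instead.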

Indexing data

Since we are adding a search experience to our website, we can use the Elastic web crawler to index our website content and test with our own documents. For this example, I'm going to use the Elastic Labs Blog.

To create the crawler, follow the instructions on the docs.

For this example, we will use the following settings:

Note: I'm adding some extraction rules to clean the field values. I'm also using the semantic_text field from the crawler and associating it with the article_content field.

A brief explanation of the extracted fields:

meta_img: The article's image used as the thumbnail.

meta_author: The author's name, enabling filtering by author.

article_content: We index only the main content of the article within the div, excluding unrelated data such as headers and footers. This optimization enhances search relevance and reduces costs by generating shorter embeddings.

This is how a document will look after applying the rules and executing a crawl successfully:

{
    "_index": "search-labs-index",
    "_id": "66a5568a30cc8eb607eec315",
    "_version": 1,
    "_seq_no": 6,
    "_primary_term": 3,
    "found": true,
    "_source": {
      "last_crawled_at": "2024-07-27T20:20:25Z",
      "url_path_dir3": "langchain-collaboration",
      "meta_img": "https://www.elastic.co/search-labs/assets/images/langchain-partner-blog.png?5c6faef66d5699625c50453e356927d0",
      "semantic_text": {
        "inference": {
          "inference_id": "elser_model_2",
          "model_settings": {
            "task_type": "sparse_embedding"
          },
          "chunks": [
            {
              "text": """Tutorials Integrations Blog Start Free Trial Contact Sales Open navigation menu Blog / Generative AI LangChain and Elastic collaborate to add vector database and semantic reranking for RAG In the last year, we have seen a lot of movement in generative AI. Many new services and libraries have emerged. LangChain has separated itself as the most popular library for building applications with large language models (LLMs), for example Retrieval Augmented Generation (RAG) systems. The library makes it really easy to prototype and experiment with different models and retrieval systems. To enable the first-class support for Elasticsearch in LangChain, we recently elevated our integration from a community package to an official LangChain partner package . This work makes it straightforward to import Elasticsearch capabilities into LangChain applications. The Elastic team manages the code and the release process through a dedicated repository . We will keep improving the LangChain integration there, making sure that users can take full advantage of the latest improvements in Elasticsearch. Our collaboration with Elastic in the last 12 months has been exceptional, particularly as we establish better ways for developers and end users to build RAG applications from prototype to production," said Harrison Chase, Co-Founder and CEO at LangChain. "The LangChain-Elasticsearch vector database integrations will help do just that, and we're excited to see this partnership grow with future feature and integration releases. Elasticsearch is one of the most flexible and performant retrieval systems that includes a vector database. One of our goals at Elastic is to also be the most open retrieval system out there. In a space as fast-moving as generative AI, we want to have the developer's back when it comes to utilizing emerging tools and libraries. This is why we work closely with libraries like LangChain and add native support to the GenAI ecosystem. 
From using Elasticsearch as a vector database to hybrid search and orchestrating a full RAG application. Elasticsearch and LangChain have collaborated closely this year. We are putting our extensive experience in building search tools into making your experience of LangChain easier and more flexible. Let's take a deeper look in this blog. Rapid RAG prototyping RAG is a technique for providing users with highly relevant answers to questions. The main advantages over using LLMs directly are that user data can be easily integrated, and hallucinations by the LLM can be minimized. This is achieved by adding a document retrieval step that provides relevant context for the""",
              "embeddings": {
                "rag": 2.2831416,
                "elastic": 2.1994505,
                "genera": 1.990228,
                "lang": 1.9417559,
                "vector": 1.7541072,
                "##ai": 1.5763651,
                "integration": 1.5619806,
                "##sea": 1.5154194,
                "##rank": 1.4946039,
                "retrieval": 1.3957807,
                "ll": 1.362704 
                // more embeddings ...
              }
            }
          ]
        }
      },
      "additional_urls": [
        "https://www.elastic.co/search-labs/blog/langchain-collaboration"
      ],
      "body_content": """Tutorials Integrations Blog Start Free Trial Contact Sales Open navigation menu Blog / Generative AI LangChain and Elastic collaborate to add vector database and semantic reranking for RAG In the last year, we have seen a lot of movement in generative AI. Many new services and libraries have emerged. LangChain has separated itself as the most popular library for building applications with large language models (LLMs), for example Retrieval Augmented Generation (RAG) systems. The library makes it really easy to prototype and experiment with different models and retrieval systems. To enable the first-class support for Elasticsearch in LangChain, we recently elevated our integration from a community package to an official LangChain partner package . This work makes it straightforward to import Elasticsearch capabilities into LangChain applications. The Elastic team manages the code and the release process through a dedicated repository . We will keep improving the LangChain integration there, making sure that users can take full advantage of the latest improvements in Elasticsearch. Our collaboration with Elastic in the last 12 months has been exceptional, particularly as we establish better ways for developers and end users to build RAG applications from prototype to production," said Harrison Chase, Co-Founder and CEO at LangChain. "The LangChain-Elasticsearch vector database integrations will help do just that, and we're excited to see this partnership grow with future feature and integration releases. Elasticsearch is one of the most flexible and performant retrieval systems that includes a vector database. One of our goals at Elastic is to also be the most open retrieval system out there. In a space as fast-moving as generative AI, we want to have the developer's back when it comes to utilizing emerging tools and libraries. This is why we work closely with libraries like LangChain and add native support to the GenAI ecosystem. 
From using Elasticsearch as a vector database to hybrid search and orchestrating a full RAG application. Elasticsearch and LangChain have collaborated closely this year. We are putting our extensive experience in building search tools into making your experience of LangChain easier and more flexible. Let's take a deeper look in this blog. Rapid RAG prototyping RAG is a technique for providing users with highly relevant answers to questions. The main advantages over using LLMs directly are that user data can be easily integrated, and hallucinations by the LLM can be minimized. This is achieved by adding a document retrieval step that provides relevant context for the LLM. Since its inception, Elasticsearch has been the go-to solution for relevant document retrieval and has since been a leading innovator, offering numerous retrieval strategies. When it comes to integrating Elasticsearch into LangChain, we have made it easy to choose between the most common retrieval strategies, for example, dense vector, sparse vector, keyword or hybrid. And we enabled power users to further customize these strategies. Keep reading to see some examples. (Note that we assume we have an Elasticsearch deployment .) LangChain integration package In order to use the langchain-elasticsearch partner package, you first need to install it: pip install langchain-elasticsearch Then you can import the classes you need from the langchain_elasticsearch module, for example, the ElasticsearchStore , which gives you simple methods to index and search your data. In this example, we use Elastic's sparse vector model ELSER (which has to be deployed first) as our retrieval strategy. 
from langchain_elasticsearch import ElasticsearchStore es_store = ElasticsearchStore( es_cloud_id="your-cloud-id", es_api_key="your-api-key", index_name="rag-example", strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(model_id=".elser_model_2"), ), A simple RAG application Now, let's build a simple RAG example application. First, we add some example documents to our Elasticsearch store. texts = [ "LangChain is a framework for developing applications powered by large language models (LLMs).", "Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.", ... ] es_store.add_texts(texts) Next, we define the LLM. Here, we use the default gpt-3.5-turbo model offered by OpenAI, which also powers ChatGPT. from langchain_openai import ChatOpenAI llm = ChatOpenAI(api_key="sk-...") # or set the OPENAI_API_KEY environment variable Now we are ready to plug together our RAG system. For simplicity we take a standard prompt for instructing the LLM. We also transform the Elasticsearch store into a LangChain retriever. Finally, we chain together the retrieval step with adding the documents to the prompt and sending it to the LLM. from langchain import hub from langchain_core.runnables import RunnablePassthrough prompt = hub.pull("rlm/rag-prompt") # standard prompt from LangChain hub retriever = es_store.as_retriever() def format_docs(docs): return "\n\n".join(doc.page_content for doc in docs) rag_chain = ( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() ) With these few lines of code, we now already have a simple RAG system. Users can now ask questions on the data: rag_chain.invoke("Which frameworks can help me build LLM apps?") "LangChain is a framework specifically designed for building LLM-powered applications. ..." It's as simple as this. Our RAG system can now respond with info about LangChain, which ChatGPT (version 3.5) cannot. 
Of course there are many ways to improve this system. One of them is optimizing the way we retrieve the documents. Full retrieval flexibility through the Retriever The Elasticsearch store offers common retrieval strategies out-of-the-box, and developers can freely experiment with what works best for a given use case. But what if your data model is more complex than just text with a single field? What, for example, if your indexing setup includes a web crawler that yields documents with texts, titles, URLs and tags and all these fields are important for search? Elasticsearch's Query DSL gives users full control over how to search their data. And in LangChain, the ElasticsearchRetriever enables this full flexibility directly. All that is required is to define a function that maps the user input query to an Elasticsearch request. Let's say we want to add semantic reranking capabilities to our retrieval step. By adding a Cohere reranking step, the results at the top become more relevant without extra manual tuning. For this, we define a Retriever that takes in a function that returns the respective Query DSL structure. def text_similarity_reranking(search_query: str) -> Dict: return { "retriever": { "text_similarity_reranker": { "retriever": { "standard": { "query": { "match": { "text_field": search_query } } } }, "field": "text_field", "inference_id": "cohere-rerank-service", "inference_text": search_query, "window_size": 10 } } } retriever = ElasticsearchRetriever.from_es_params( es_cloud_id="your-cloud-id", es_api_key="your-api-key", index_name="rag-example", content_field=text_field, body_func=text_similarity_reranking, ) (Note that the query structure for similarity reranking is still being finalized. It will be available in an upcoming release.) This retriever can slot seamlessly into the RAG code above. 
The result is that the retrieval part of our RAG pipeline is much more accurate, leading to more relevant documents being forwarded to the LLM and, most importantly, to more relevant answers. Conclusion Elastic's continued investment into LangChain's ecosystem brings the latest retrieval innovations to one of the most popular GenAI libraries. Through this collaboration, Elastic and LangChain enable developers to rapidly and easily build RAG solutions for end users while providing the necessary flexibility for in-depth tuning of results quality. Ready to try this out on your own? Start a free trial . Looking to build RAG into your apps? Want to try different LLMs with a vector database? Check out our sample notebooks for LangChain, Cohere and more on Github, and join Elasticsearch Relevance Engine training now. Max Jakob 5 min read 11 June 2024 Generative AI Integrations Share Twitter Facebook LinkedIn Recommended Articles Integrations How To Generative AI • 25 July 2024 Protecting Sensitive and PII information in RAG with Elasticsearch and LlamaIndex How to protect sensitive and PII data in a RAG application with Elasticsearch and LlamaIndex. Srikanth Manvi How To Generative AI • 19 July 2024 Build a Conversational Search for your Customer Success Application with Elasticsearch and OpenAI Explore how to enhance your customer success application by implementing a conversational search feature using advanced technologies such as Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) Lionel Palacin Integrations How To Generative AI Vector Database • 11 July 2024 semantic_text with Amazon Bedrock Using semantic_text new feature, and AWS Bedrock as inference endpoint service Gustavo Llermaly Integrations How To Generative AI Vector Database • 10 July 2024 Elasticsearch open inference API adds Amazon Bedrock support Elasticsearch open inference API adds support for embeddings generated from models hosted on Amazon Bedrock." 
Mark Hoy Hemant Malik Vector Database How To Generative AI • 10 July 2024 Playground: Experiment with RAG using Bedrock Anthropic Models and Elasticsearch in minutes Playground is a low code interface for developers to explore grounding LLMs of their choice with their own private data, in minutes. Joe McElroy Aditya Tripathi Max Jakob 5 min read 11 June 2024 Generative AI Integrations Share Twitter Facebook LinkedIn Jump to Rapid RAG prototyping LangChain integration package A simple RAG application Full retrieval flexibility through the Retriever Conclusion Sitemap RSS Feed Search Labs Repo Elastic.co ©2024. Elasticsearch B.V. All Rights Reserved.""",
      "article_content": """In the last year, we have seen a lot of movement in generative AI. Many new services and libraries have emerged. LangChain has separated itself as the most popular library for building applications with large language models (LLMs), for example Retrieval Augmented Generation (RAG) systems. The library makes it really easy to prototype and experiment with different models and retrieval systems. To enable the first-class support for Elasticsearch in LangChain, we recently elevated our integration from a community package to an official LangChain partner package . This work makes it straightforward to import Elasticsearch capabilities into LangChain applications. The Elastic team manages the code and the release process through a dedicated repository . We will keep improving the LangChain integration there, making sure that users can take full advantage of the latest improvements in Elasticsearch. Our collaboration with Elastic in the last 12 months has been exceptional, particularly as we establish better ways for developers and end users to build RAG applications from prototype to production," said Harrison Chase, Co-Founder and CEO at LangChain. "The LangChain-Elasticsearch vector database integrations will help do just that, and we're excited to see this partnership grow with future feature and integration releases. Elasticsearch is one of the most flexible and performant retrieval systems that includes a vector database. One of our goals at Elastic is to also be the most open retrieval system out there. In a space as fast-moving as generative AI, we want to have the developer's back when it comes to utilizing emerging tools and libraries. This is why we work closely with libraries like LangChain and add native support to the GenAI ecosystem. From using Elasticsearch as a vector database to hybrid search and orchestrating a full RAG application. Elasticsearch and LangChain have collaborated closely this year. 
We are putting our extensive experience in building search tools into making your experience of LangChain easier and more flexible. Let's take a deeper look in this blog. Rapid RAG prototyping RAG is a technique for providing users with highly relevant answers to questions. The main advantages over using LLMs directly are that user data can be easily integrated, and hallucinations by the LLM can be minimized. This is achieved by adding a document retrieval step that provides relevant context for the LLM. Since its inception, Elasticsearch has been the go-to solution for relevant document retrieval and has since been a leading innovator, offering numerous retrieval strategies. When it comes to integrating Elasticsearch into LangChain, we have made it easy to choose between the most common retrieval strategies, for example, dense vector, sparse vector, keyword or hybrid. And we enabled power users to further customize these strategies. Keep reading to see some examples. (Note that we assume we have an Elasticsearch deployment .) LangChain integration package In order to use the langchain-elasticsearch partner package, you first need to install it: pip install langchain-elasticsearch Then you can import the classes you need from the langchain_elasticsearch module, for example, the ElasticsearchStore , which gives you simple methods to index and search your data. In this example, we use Elastic's sparse vector model ELSER (which has to be deployed first) as our retrieval strategy. from langchain_elasticsearch import ElasticsearchStore es_store = ElasticsearchStore( es_cloud_id="your-cloud-id", es_api_key="your-api-key", index_name="rag-example", strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(model_id=".elser_model_2"), ), A simple RAG application Now, let's build a simple RAG example application. First, we add some example documents to our Elasticsearch store. 
texts = [ "LangChain is a framework for developing applications powered by large language models (LLMs).", "Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.", ... ] es_store.add_texts(texts) Next, we define the LLM. Here, we use the default gpt-3.5-turbo model offered by OpenAI, which also powers ChatGPT. from langchain_openai import ChatOpenAI llm = ChatOpenAI(api_key="sk-...") # or set the OPENAI_API_KEY environment variable Now we are ready to plug together our RAG system. For simplicity we take a standard prompt for instructing the LLM. We also transform the Elasticsearch store into a LangChain retriever. Finally, we chain together the retrieval step with adding the documents to the prompt and sending it to the LLM. from langchain import hub from langchain_core.runnables import RunnablePassthrough prompt = hub.pull("rlm/rag-prompt") # standard prompt from LangChain hub retriever = es_store.as_retriever() def format_docs(docs): return "\n\n".join(doc.page_content for doc in docs) rag_chain = ( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() ) With these few lines of code, we now already have a simple RAG system. Users can now ask questions on the data: rag_chain.invoke("Which frameworks can help me build LLM apps?") "LangChain is a framework specifically designed for building LLM-powered applications. ..." It's as simple as this. Our RAG system can now respond with info about LangChain, which ChatGPT (version 3.5) cannot. Of course there are many ways to improve this system. One of them is optimizing the way we retrieve the documents. Full retrieval flexibility through the Retriever The Elasticsearch store offers common retrieval strategies out-of-the-box, and developers can freely experiment with what works best for a given use case. But what if your data model is more complex than just text with a single field? 
What, for example, if your indexing setup includes a web crawler that yields documents with texts, titles, URLs and tags and all these fields are important for search? Elasticsearch's Query DSL gives users full control over how to search their data. And in LangChain, the ElasticsearchRetriever enables this full flexibility directly. All that is required is to define a function that maps the user input query to an Elasticsearch request. Let's say we want to add semantic reranking capabilities to our retrieval step. By adding a Cohere reranking step, the results at the top become more relevant without extra manual tuning. For this, we define a Retriever that takes in a function that returns the respective Query DSL structure. def text_similarity_reranking(search_query: str) -> Dict: return { "retriever": { "text_similarity_reranker": { "retriever": { "standard": { "query": { "match": { "text_field": search_query } } } }, "field": "text_field", "inference_id": "cohere-rerank-service", "inference_text": search_query, "window_size": 10 } } } retriever = ElasticsearchRetriever.from_es_params( es_cloud_id="your-cloud-id", es_api_key="your-api-key", index_name="rag-example", content_field=text_field, body_func=text_similarity_reranking, ) (Note that the query structure for similarity reranking is still being finalized. It will be available in an upcoming release.) This retriever can slot seamlessly into the RAG code above. The result is that the retrieval part of our RAG pipeline is much more accurate, leading to more relevant documents being forwarded to the LLM and, most importantly, to more relevant answers. Conclusion Elastic's continued investment into LangChain's ecosystem brings the latest retrieval innovations to one of the most popular GenAI libraries. Through this collaboration, Elastic and LangChain enable developers to rapidly and easily build RAG solutions for end users while providing the necessary flexibility for in-depth tuning of results quality.""",
      "domains": [
        "https://www.elastic.co"
      ],
      "title": "LangChain and Elastic collaborate to add vector database and semantic reranking for RAG — Search Labs",
      "meta_author": [
        "Max Jakob"
      ],
      "url": "https://www.elastic.co/search-labs/blog/langchain-collaboration",
      "url_scheme": "https",
      "meta_description": "Learn how LangChain and Elasticsearch can accelerate your speed of innovation in the LLM and GenAI space.",
      "headings": [
        "LangChain and Elastic collaborate to add vector database and semantic reranking for RAG",
        "Rapid RAG prototyping",
        "LangChain integration package",
        "A simple RAG application",
        "Full retrieval flexibility through the Retriever",
        "Conclusion",
        "Protecting Sensitive and PII information in RAG with Elasticsearch and LlamaIndex",
        "Build a Conversational Search for your Customer Success Application with Elasticsearch and OpenAI",
        "semantic_text with Amazon Bedrock",
        "Elasticsearch open inference API adds Amazon Bedrock support",
        "Playground: Experiment with RAG using Bedrock Anthropic Models and Elasticsearch in minutes"
      ],
      "links": [
        "https://cloud.elastic.co/registration?onboarding_token=search&cta=cloud-registration&tech=trial&plcmt=navigation&pg=search-labs",
        "https://discuss.elastic.co/c/search/84",
        "https://github.com/elastic/elasticsearch-labs",
        "https://github.com/langchain-ai/langchain-elastic",
        "https://pypi.org/project/langchain-elasticsearch/",
        "https://python.langchain.com/v0.2/docs/integrations/providers/elasticsearch/",
        "https://search.elastic.co/?location%5B0%5D=Search+Labs&referrer=https://www.elastic.co/search-labs/blog/langchain-collaboration",
        "https://www.elastic.co/contact",
        "https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html",
        "https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html#download-deploy-elser",
        "https://www.elastic.co/search-labs",
        "https://www.elastic.co/search-labs/blog",
        "https://www.elastic.co/search-labs/blog",
        "https://www.elastic.co/search-labs/blog/category/generative-ai",
        "https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank",
        "https://www.elastic.co/search-labs/blog/langchain-collaboration#a-simple-rag-application",
        "https://www.elastic.co/search-labs/blog/langchain-collaboration#conclusion",
        "https://www.elastic.co/search-labs/blog/langchain-collaboration#full-retrieval-flexibility-through-the-retriever",
        "https://www.elastic.co/search-labs/blog/langchain-collaboration#langchain-integration-package",
        "https://www.elastic.co/search-labs/blog/langchain-collaboration#rapid-rag-prototyping",
        "https://www.elastic.co/search-labs/blog/retrieval-augmented-generation-rag",
        "https://www.elastic.co/search-labs/blog/semantic-reranking-with-retrievers",
        "https://www.elastic.co/search-labs/integrations",
        "https://www.elastic.co/search-labs/tutorials",
        "https://www.elastic.co/search-labs/tutorials/install-elasticsearch"
      ],
      "id": "66a5568a30cc8eb607eec315",
      "url_port": 443,
      "url_host": "www.elastic.co",
      "url_path_dir2": "blog",
      "url_path": "/search-labs/blog/langchain-collaboration",
      "url_path_dir1": "search-labs"
    }
  }
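The crawled document carries large fields, such as body_content, links, and the semantic_text chunks, that the summary does not need. A minimal sketch, assuming the field names shown above: SUMMARY_FIELDS could be passed as the _source option of a search request so that only those fields are returned, and toSummaryContext (a hypothetical helper) trims each hit before it goes into the prompt. The 2000-character limit is an arbitrary illustration, not a recommendation.

```javascript
// Fields the summary needs; names come from the crawled document above.
const SUMMARY_FIELDS = ["title", "url", "meta_author", "meta_img", "article_content"];

// Hypothetical helper: reduce a raw hit to the fields used for the summary,
// truncating the article text to keep the prompt (and token cost) bounded.
function toSummaryContext(hit, maxChars = 2000) {
  const source = hit._source || {};
  const content = source.article_content || "";
  return {
    title: source.title || "",
    url: source.url || "",
    // meta_author may be a string or an array; normalize to an array.
    authors: [].concat(source.meta_author || []),
    content: content.length > maxChars ? content.slice(0, maxChars) + "..." : content,
  };
}

// Example search body that only returns the fields listed above.
const exampleSearchBody = {
  _source: SUMMARY_FIELDS,
  query: { match: { article_content: "langchain" } },
};
```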

Creating the proxy

To set up the proxy server, we’ll be using express.js. We'll create two endpoints following best practices: one for handling _search calls and another for completion calls.

Begin by creating a new directory called es-proxy, navigate into it using cd es-proxy, and initialize your project with npm init.

Next, install the necessary dependencies with the following command:

yarn add express axios dotenv cors

Here’s a brief explanation of each package:

express: Used to create the proxy server that will handle incoming requests and forward them to Elasticsearch.

axios: A popular HTTP client that simplifies making requests to the Elasticsearch API.

dotenv: Allows you to manage sensitive data, such as API keys, by storing them in environment variables.

cors: Enables your UI to make requests to a different domain (in this case, your proxy server) by handling Cross-Origin Resource Sharing (CORS). This is essential for avoiding issues when your frontend and backend are hosted on different domains or ports.

Now, create an .env file to securely store your Elasticsearch URL and API key:

ELASTICSEARCH_URL=https://<your_elasticsearch_url>
API_KEY=<your_api_key>

Make sure the API key you create is read-only and restricted to the index it needs.
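One way to express that restriction is through the role_descriptors of the create API key request (POST /_security/api_key). The sketch below only builds the request body. The key name and role name are hypothetical, and the monitor_inference cluster privilege is my assumption of what the key needs to call the completion inference endpoint, so check the required privileges for your Elasticsearch version.

```javascript
// Build the body for POST /_security/api_key with read-only access
// to a single index.
function buildReadOnlyApiKeyBody(indexName) {
  return {
    name: `es-proxy-${indexName}`, // hypothetical key name
    role_descriptors: {
      // hypothetical role name
      proxy_read_only: {
        // Assumption: lets the key call the completion inference endpoint.
        cluster: ["monitor_inference"],
        indices: [{ names: [indexName], privileges: ["read"] }],
      },
    },
  };
}
```

For the index used in this article you would call buildReadOnlyApiKeyBody("search-labs-index") and send the result from Kibana Dev Tools or any HTTP client with sufficient privileges.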

Finally, create an index.js file with the following content:

require("dotenv").config();

const express = require("express");
const cors = require("cors");
const app = express();
const axios = require("axios");

app.use(express.json());
app.use(cors());

const { ELASTICSEARCH_URL, API_KEY } = process.env;

// Handle all _search requests
app.post("/api/:index/_search", async (req, res) => {
  try {
    const response = await axios.post(
      `${ELASTICSEARCH_URL}/${req.params.index}/_search`,
      req.body,
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: `ApiKey ${API_KEY}`,
        },
      }
    );
    res.json(response.data);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

// Handle completion requests
app.post("/api/completion", async (req, res) => {
  try {
    const response = await axios.post(
      `${ELASTICSEARCH_URL}/_inference/completion/summaries-completion`,
      req.body,
      {
        headers: {
          "Content-Type": "application/json",
          Authorization: `ApiKey ${API_KEY}`,
        },
      }
    );
    res.json(response.data);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

// Start the server
const PORT = process.env.PORT || 1337;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});

Now, start the server by running node index.js. This will launch the server on port 1337 by default, or on the port you define in your .env file.
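
Both routes above answer every failure with a 500. If you would rather let the UI see the status Elasticsearch actually returned (for example a 401 for a bad key or a 404 for a missing index), one option is a small helper like the one below. It is a minimal sketch: `toProxyError` is a hypothetical name, and it relies only on the `error.response` object that axios attaches when the server answered with a non-2xx status.

```javascript
// Hypothetical helper: turn an axios error into a { status, body } pair for the proxy response.
function toProxyError(error) {
  // axios sets error.response when the upstream server replied with a non-2xx status
  if (error && error.response) {
    return {
      status: error.response.status,
      body: { error: error.response.data ?? error.message },
    };
  }
  // No response at all (network failure, DNS, timeout): report a bad gateway
  return {
    status: 502,
    body: { error: error ? error.message : "Unknown error" },
  };
}

// Inside a route's catch block you could then write:
//   const { status, body } = toProxyError(error);
//   res.status(status).json(body);

// Example with a fake error shaped like an axios one
const fakeUpstreamError = {
  message: "Request failed with status code 404",
  response: { status: 404, data: { error: { type: "index_not_found_exception" } } },
};
console.log(toProxyError(fakeUpstreamError).status);
```

Forwarding the upstream body can expose internal details, so in production you may prefer to forward only the status code and a generic message.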

Creating the component

For the UI component, we are going to use the Search UI React library search-ui. We will create a custom component so that every time a user runs a search, it will send the top results to the LLM using the completion inference endpoint we created, and then display the answer back to the user.

There is a full tutorial about configuring your instance that you can find here. You can run search-ui on your computer, or work with our live sandbox here.

Once you have the example running and connected to your data, run the following installation step in your terminal within the starter app folder: yarn add axios antd html-react-parser

After installing those additional dependencies, create a new AiSummary.js file for the new component. It includes a simple prompt that gives the AI its instructions and rules.

import { withSearch } from "@elastic/react-search-ui";
import { useState, useEffect } from "react";
import axios from "axios";
import { Card } from "antd";
import parse from "html-react-parser";

const formatSearchResults = (results) => {
  return results
    .slice(0, 3)
    .map(
      (result) => `
    Article Author(s): ${result.meta_author.raw.join(",")}
    Article URL: ${result.url.raw}
    Article title: ${result.title.raw}
    Article content: ${result.article_content.raw}
  `
    )
    .join("\n");
};

const fetchAiSummary = async (searchTerm, results) => {
  const prompt = `
    You are a search assistant. Your mission is to complement search results with an AI Summary to address the user request.
    User request: ${searchTerm}
    Top search results: ${formatSearchResults(results)}
    Rules:
    - The answer must be short. No more than one paragraph.
    - Use HTML
    - Use content from the most relevant search results only to answer the user request
    - Add highlights wrapping in <i><b></b></i> tags the most important phrases of your answer
    - At the end of the answer add a citations section with links to the articles you got the answer on this format:
    <h4>Citations</h4>
    <ul>
      <li><a href="{url}"> {title} </a></li>
    </ul>
    - Only provide citations from the top search results I showed you, and only if they are relevant to the user request.
  `;
  const responseData = await axios.post(
    "http://localhost:1337/api/completion",
    { input: prompt },
    {
      headers: {
        "Content-Type": "application/json",
      },
    }
  );
  return responseData.data.completion[0].result;
};

const AiSummary = ({ results, searchTerm, resultSearchTerm }) => {
  const [aiSummary, setAiSummary] = useState("");
  const [isLoading, setIsLoading] = useState(false);

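  useEffect(() => {
    // resultSearchTerm only changes once a search has completed,
    // so at this point results belong to that term
    if (!resultSearchTerm) return;
    setIsLoading(true);
    fetchAiSummary(resultSearchTerm, results)
      .then((summary) => setAiSummary(summary))
      // Show a fallback instead of leaving the card in its loading state
      .catch(() => setAiSummary("The AI summary could not be generated."))
      .finally(() => setIsLoading(false));
  }, [resultSearchTerm]);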

  return (
    <Card style={{ width: "100%" }} loading={isLoading}>
      <div>
        <h2>AI Summary</h2>
        {!resultSearchTerm ? "Ask anything!" : parse(aiSummary)}
      </div>
    </Card>
  );
};

export default withSearch(({ results, searchTerm, resultSearchTerm }) => ({
  results,
  searchTerm,
  resultSearchTerm,
  AiSummary,
}))(AiSummary);
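
The `formatSearchResults` function above sends the full `article_content` of each result and assumes every field is present. For long articles that can make the prompt large, and a result without `meta_author` would throw. A more defensive variant might look like the following sketch. The 2,000 character cap, the `formatSearchResultsSafely` name, and the sample data are assumptions chosen for illustration, not values from the example repository.

```javascript
const MAX_CONTENT_CHARS = 2000; // assumed cap; tune it to your model's context window

// Read result[field].raw, falling back to a default when the field is missing
const rawValue = (result, field, fallback = "") =>
  result && result[field] && result[field].raw != null
    ? result[field].raw
    : fallback;

// Hypothetical variant of formatSearchResults with truncation and safe defaults
function formatSearchResultsSafely(results, maxResults = 3) {
  return (results || [])
    .slice(0, maxResults)
    .map((result) => {
      // meta_author may be an array, a single string, or missing
      const authors = [].concat(rawValue(result, "meta_author", [])).join(", ");
      const content = String(rawValue(result, "article_content")).slice(
        0,
        MAX_CONTENT_CHARS
      );
      return [
        `Article Author(s): ${authors}`,
        `Article URL: ${rawValue(result, "url")}`,
        `Article title: ${rawValue(result, "title")}`,
        `Article content: ${content}`,
      ].join("\n");
    })
    .join("\n\n");
}

// Made-up result; meta_author is intentionally missing
const sample = [
  {
    title: { raw: "Adding AI summaries" },
    url: { raw: "https://example.com/ai-summaries" },
    article_content: { raw: "x".repeat(5000) },
  },
];
console.log(formatSearchResultsSafely(sample).length);
```

Cutting by characters is crude; if answer quality matters more than prompt size, sending the snippets or the most relevant passages is another option to consider.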

Updating App.js

Now that we have created our custom component, it's time to add it to the application. This is what your App.js should look like:

import React from "react";
import ElasticsearchAPIConnector from "@elastic/search-ui-elasticsearch-connector";
import {
  ErrorBoundary,
  SearchProvider,
  SearchBox,
  Results,
  Facet,
} from "@elastic/react-search-ui";
import { Layout } from "@elastic/react-search-ui-views";
import "@elastic/react-search-ui-views/lib/styles/styles.css";
import AiSummary from "./AiSummary";

const connector = new ElasticsearchAPIConnector(
  {
    host: "http://localhost:1337/api",
    index: "search-labs-index",
  },
  (requestBody, requestState) => {
    if (!requestState.searchTerm) return requestBody;
    requestBody.query = {
      semantic: {
        query: requestState.searchTerm,
        field: "semantic_text",
      },
    };
    return requestBody;
  }
);

const config = {
  debug: true,
  searchQuery: {
    search_fields: {
      semantic_text: {},
    },
    result_fields: {
      title: {
        snippet: {},
      },
      article_content: {
        snippet: {
          size: 10,
        },
      },
      meta_description: {},
      url: {},
      meta_author: {},
      meta_img: {},
    },
    facets: {
      "meta_author.enum": { type: "value" },
    },
  },
  apiConnector: connector,
  alwaysSearchOnInitialLoad: false,
};

export default function App() {
  return (
    <SearchProvider config={config}>
      <div className="App">
        <ErrorBoundary>
          <Layout
            header={<SearchBox />}
            bodyHeader={<AiSummary />}
            bodyContent={
              <Results
                titleField="title"
                thumbnailField="meta_img"
                urlField="url"
              />
            }
            sideContent={
              <Facet key={"1"} field={"meta_author.enum"} label={"author"} />
            }
          />
        </ErrorBoundary>
      </div>
    </SearchProvider>
  );
}

Note how in the connector instantiation we have overridden the default query to use a semantic query and leverage the semantic_text mappings we created.

  (requestBody, requestState) => {
    if (!requestState.searchTerm) return requestBody;
    requestBody.query = {
      semantic: {
        query: requestState.searchTerm,
        field: "semantic_text",
      },
    };
    return requestBody;
  }
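
To make the effect of that callback concrete, the sketch below runs the same logic against a sample request body. The sample body and the `applySemanticQuery` name are made up for illustration; the real body is generated by the connector from your `searchQuery` configuration.

```javascript
// Same logic as the connector callback, given a name so it can be called directly
function applySemanticQuery(requestBody, requestState) {
  // With an empty search term, keep whatever the connector generated
  if (!requestState.searchTerm) return requestBody;
  requestBody.query = {
    semantic: {
      query: requestState.searchTerm,
      field: "semantic_text",
    },
  };
  return requestBody;
}

// Made-up body standing in for what the connector would build
const generatedBody = {
  size: 20,
  query: {
    multi_match: { query: "what is ELSER?", fields: ["semantic_text"] },
  },
};

const rewritten = applySemanticQuery(generatedBody, {
  searchTerm: "what is ELSER?",
});
console.log(JSON.stringify(rewritten.query));
```

Only the `query` key is replaced, so anything else the connector put in the body (page size, aggregations for facets, highlighting) is sent to Elasticsearch unchanged.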

Asking questions

Now it's time to test it. Ask any question about the documents you indexed, and above the search results, you should see a card with the AI Summary:

Conclusion

Redesigning your search experience is an important way to keep your users engaged and to save them the time of going through the results to find the answers to their questions. With the Elastic open inference service and search-ui, designing this kind of experience is easier than ever. Are you ready to try it?

