ChatGPT and Elasticsearch revisited: The RAG really tied the app together

Follow up to the blog ChatGPT and Elasticsearch: OpenAI meets private data.

In this blog, you will learn how to:

  • Create an Elasticsearch Serverless project
  • Create an Inference Endpoint to generate embeddings with ELSER
  • Use a Semantic Text field for auto-chunking and calling the Inference Endpoint
  • Use the Open Crawler to crawl blogs
  • Connect to an LLM using Elastic’s Playground to test prompts and context settings for a RAG chat application.

If you want to jump right into the code, you can view the accompanying Jupyter Notebook here.

The Dude Abides

April 2023

A lot has changed since I wrote the initial ChatGPT and Elasticsearch: OpenAI meets private data. Most people were just playing around with ChatGPT, if they had tried it at all. And every booth at every tech conference didn’t feature the letters “AI” (whether it is a useful fit or not).

August 2024

Since then, Elastic has embraced being a full featured vector database and is putting a lot of engineering effort into making it the best vector database option for anyone building a search application. So as not to spend several pages talking about all the enhancements to Elasticsearch, here is a non-exhaustive list in no particular order:

With all that change and more, the original blog needs a rewrite. So let’s get started.

Updated flow

The plan for this updated flow will be:

  1. Setup
    1. Create a new Elasticsearch serverless search project
    2. Create an embedding inference API using ELSER
    3. Configure an index template with a semantic_text field
    4. Create a new LLM connector
    5. Configure a chat completion inference service using our LLM connector
  2. Ingest and Test
    1. Crawl the Elastic Labs sites (Search, Observability, Security) with the Elastic Open Web Crawler.
    2. Use Playground to test prompts using our indexed Labs content
  3. Configure and deploy our App
    1. Export the generated code from Playground to an application using FastAPI as the backend and React as the front end.
    2. Run it locally
    3. Optionally deploy our chatbot to Google Cloud Run

Setup

Elasticsearch Serverless Project

We will be using an Elastic serverless project for our chatbot. Serverless removes much of the complexity of running an Elasticsearch cluster and lets you focus on actually using and gaining value from your data. Read more about the architecture of Serverless here.

If you don’t have an Elastic Cloud account, you can create a free two-week trial at elastic.co (Serverless pricing available here). If you already have one, you can simply log in.

Once logged in, you will need to create a cloud API key.

alt_text

NOTE: In the steps below, I will show the relevant parts of Python code. For the sake of brevity, I’m not going to show complete code that will import required libraries, wait for steps to complete, catch errors, etc.

For more robust code you can run, please see the accompanying Jypyter notebook!

Create Serverless Project

We will use our newly created API key to perform the next setup steps.

First off, create a new Elasticsearch project.

url = "https://api.elastic-cloud.com/api/v1/serverless/projects/elasticsearch" 

project_data = {
    "name": "The RAG Really Tied the App Together",
    "region_id": "aws-us-east-1",
    "optimized_for": "vector"
}

auth_header = f"ApiKey {api_key}"  # seeing what a comment lokos like with pound
headers = {
    "Content-Type": "application/json",
    "Authorization": auth_header
}

es_project = requests.post(url, json=project_data, headers=headers)  :four:
  • url - This is the standard Serverless endpoint for Elastic Cloud
  • project_data - Your Elasticsearch Serverless project settings
    • name - Name we want for the project
    • region_id - Region to deploy
    • optimized_for - Configuration type - We are using vector which isn’t strictly required for the ELSER model but can be suitable if you select a dense vector model such as e5.

Create Elasticsearch Python client

One nice thing about creating a programmatic project is that you will get back the connection information and credentials you need to interact with it!

es = Elasticsearch(es_project_keys['endpoints']['elasticsearch'],
                   basic_auth=(es_project_keys['credentials']['username'],
                              es_project_keys['credentials']['password']
                              )
                   )

ELSER Embedding API

Once the project is created, which usually takes less than a few minutes, we can prepare it to handle our labs’ data.

The first step is to configure the inference API for embedding. We will be using the Elastic Learned Sparse Encoder (ELSER).

  • Command to create the inference endpoint
  • Specify this endpoint will be for generating sparse embeddings
model_config = {
    "service": "elser",
    "service_settings": {
        "num_allocations": 8,
        "num_threads": 1
    }
}

inference_id = "my-elser-model"

create_endpoint = es.inference.put_model(
    inference_id=inference_id,
    task_type="sparse_embedding",
    body=model_config
)
  • model_config - Settings we want to use for deploying our semantic reranking model
    • service - Use the pre-defined elser inference service
    • service_settings.num_allocations - Deploy the model with 8 allocations
    • service_settings.num_threads - Deploy with one thread per allocation
  • inference_id - The name you want to give to you inference endpoint
  • task_type- Specifies this endpoint will be for generating sparse embeddings

This single command will trigger Elasticsearch to perform a couple of tasks:

  1. It will download the ELSER model.
  2. It will deploy (start) the ELSER model with eight allocations and one thread per allocation.
  3. It will create an inference API we use in our field mapping in the next step.

Index Mapping

With our ELSER API created, we will create our index template.

template_body = {
    "index_patterns": ["elastic-labs*"],
    "template": {
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    "copy_to": "semantic_body"
                },
                "semantic_body": {
                    "type": "semantic_text",
                    "inference_id": "my-elser-model"
                },
                "headings": {
                    "type": "text"
                },
                "id": {
                    "type": "keyword"
                },
                "meta_description": {
                    "type": "text"
                },
                "title": {
                    "type": "text"
                }
            }
        }
    }
}

template_resp = es.indices.put_index_template(  :eight:
    name="labs_template",
    body=template_body
)
  • index_patterns - The pattern of indices we want this template to apply to.
  • body - The main content of a web page the crawler collects will be written to
    • type - It is a text field
    • copy_to - We need to copy that text to our semantic text field for semantic processing
  • semantic_body is our semantic text field
    • This field will automatically handle chunking of long text and generating embeddings which we will later use for semantic search
    • inference_id specifies the name of the inference endpoint we created above, allowing us to generate embeddings from our ELSER model
  • headings - Heading tags from the html
  • id - crawl id for this document
  • meta_description - value of the description meta tag from the html
  • title is the title of the web page the content is from

Other fields will be indexed but auto-mapped. The ones we are focused on pre-defining in the template will not need to be both keyword and text type, which is defined automatically otherwise.

Most importantly, for this guide, we must define our semantic_text field and set a source field to copy from with copy_to. In this case, we are interested in performing semantic search on the body of the text, which the crawler indexes into the body.

Crawl All the Labs!

We can now install and configure the crawler to crawl the Elastic * Labs. We will loosely follow the excellent guide from the Open Crawler released for tech-preview Search Labs blog.

The steps below will use docker and run on a MacBook Pro. To run this with a different setup, consult the Open Crawler Github readme.

Clone the repo


Open the command line tool of your choice. I’ll be using Iterm2. Clone the crawler repo to your machine.

~/repos
❯ git clone git@github.com:elastic/crawler.git
Cloning into 'crawler'...
remote: Enumerating objects: 1944, done.
remote: Counting objects: 100% (418/418), done.
remote: Compressing objects: 100% (243/243), done.
remote: Total 1944 (delta 237), reused 238 (delta 170), pack-reused 1526
Receiving objects: 100% (1944/1944), 84.85 MiB | 31.32 MiB/s, done.
Resolving deltas: 100% (727/727), done.

Build the crawler container

Run the following command to build and run the crawler.

docker build -t crawler-image . && docker run -i -d --name crawler crawler-image
~/repos
 ❯ cd crawler
~/repos/crawler main
 ❯ docker build -t crawler-image . && docker run -i -d --name crawler crawler-image

[+] Building 66.9s (6/10)                                                                                                                                                                docker:desktop-linux
 => [internal] load build definition from Dockerfile					0.0s
 => => transferring dockerfile: 333B							0.0s
 => [internal] load .dockerignore							0.0s
 => => transferring context: 2B								0.0s
 => [internal] load metadata for docker.io/library/jruby:9.4.7.0-jdk21		1.7s
 => [auth] library/jruby:pull token for registry-1.docker.io			0.0s
...
...
 => [5/5] RUN make clean install								50.7s
 => exporting to image									0.9s
 => => exporting layers									0.9s
 => => writing image sha256:6b3f4000a121e76aba76fdbbf11b53f53a3fabba61c0b7cf3fdcdb21e244f1d8	0.0s
 => => naming to docker.io/library/crawler-image					0.0s
cc6c16941de04355c050ef5f5fd0041ee7f3505b8cf8448c7223f0d2e80b5498

Configure the crawler

Create a new YAML in your favorite editor (vim):

~/repos/crawler main
 ❯ vim config/elastic-labs.yml

We want to crawl all the documents on the three labs’ sites, but since blogs and tutorials on those sites tend to link out to other parts of elastic.co, we need to set a couple of runs to restrict the scope. We will allow crawling the three paths for our site and then deny anything else.

Paste the following in the file and save

domains:
  - url: https://www.elastic.co
    seed_urls:
      - https://www.elastic.co/search-labs
      - https://www.elastic.co/observability-labs
      - https://www.elastic.co/security-labs
    crawl_rules:
      - policy: allow
        type: begins
        pattern: /search-labs
      - policy: allow
        type: begins
        pattern: /observability-labs
      - policy: allow
        type: begins
        pattern: /security-labs
      - policy:deny
        type: regex
        pattern: .*/author/.*
      - policy: deny
        type: regex
        pattern: .*

output_sink: elasticsearch
output_index: elastic-labs
max_crawl_depth: 2

elasticsearch:
  host: "https://<your_serverless_project>.es.<region>.aws.elastic.cloud"
  port: "443"
  api_key: "<API Key generated above>"

Copy the configuration into the Docker container:

~/repos/crawler main ⇣
 ❯ docker cp config/elastic-labs.yml crawler:/app/config/elastic-labs.yml

Successfully copied 2.05kB to crawler:/app/config/elastic-labs.yml

Validate the domain

Ensure the config file has no issues by running:

 ❯ docker exec -it crawler bin/crawler validate config/elastic-labs.yml
Domain https://www.elastic.co is valid

Start the crawler

When you first run the crawler, processing all the articles on the three lab sites may take several minutes.

docker exec -it crawler bin/crawler crawl config/elastic-labs.yml
~/repos/crawler/config main ⇣
 ❯ docker exec -it crawler bin/crawler crawl config/elastic-labs.yml
[crawl:6692c3b584f98612e3a465ce] [primary] Initialized an in-memory URL queue for up to 10000 URLs
[crawl:6692c3b584f98612e3a465ce] [primary] ES connections will be authorized with configured API key
[crawl:6692c3b584f98612e3a465ce] [primary] ES connections will use SSL without ca_fingerprint
[crawl:6692c3b584f98612e3a465ce] [primary] Elasticsearch sink initialized for index [elastic-labs] with pipeline [ent-search-generic-ingestion]
[crawl:6692c3b584f98612e3a465ce] [primary] Starting the crawl with up to 10 parallel thread(s)...
[crawl:6692c3b584f98612e3a465ce] [primary] Crawl status: queue_size=11, pages_visited=1, urls_allowed=12, urls_denied={}, crawl_duration_msec=847, crawling_time_msec=635.0, avg_response_time_msec=635.0, active_threads=1, http_client={:max_connections=>100, :used_connections=>1}, status_codes={"200"=>1}

Confirm articles have been indexed

We will confirm two ways.

First, we will look at a sample document to ensure that ELSER embeddings have been generated. We just want to look at any doc so we can search without any arguments:

GET elastic-labs/_search

Ensure you get results and then check that the field body contains text and semantic_body.inference.chunks.0.embeddings contains tokens.

    "hits": [
      {
        "_index": "elastic-labs",
...
        "_source": {
          "body": "Tutorials Integrations Blog Start Free Trial Contact Sales Open navigation menu Overview ...
          "semantic_body": {
            "inference": {
              "inference_id": "my-elser-model",
              "model_settings": {
                "task_type": "sparse_embedding"
              },
              "chunks": [
                {
                  "text": "Tutorials Integrations Blog Start Free Trial Contact Sales Open navigation menu Overview ...
                  "embeddings": {
                    "##her": 2.1016746,
                    "elastic": 2.084594,
                    "##ai": 1.6336359,
                    "dock": 1.5765089,
                    ...

We can check we are gathering data from each of the three sites with a terms aggregation:

GET elastic-labs/_search
{
  "size": 0,
  "aggs": {
    "url_path_dir1": {
      "terms": {
        "field": "url_path_dir1.keyword"
      }
    }
  }
}

You should see results that start with one of our three site paths.

      "buckets": [
        {
          "key": "security-labs",
          "doc_count": 37
        },
        {
          "key": "observability-labs",
          "doc_count": 30
        },
        {
          "key": "search-labs",
          "doc_count": 6
        }
      ]

To the Playground!

With our data ingested, chunked, and inference, we can start working on the backend application code that will interact with the LLM for our RAG app.

alt_text

LLM Connection

We need to configure a connection for Playground to make API calls to an LLM. As of this writing, Playground supports chat completion connections to OpenAI, AWS Bedrock, and Google Gemini. More connections are planned, so check the docs for the latest list.

When you first enter the Playground UI, click on “Connect to an LLM”

alt_text

Since I used OpenAI for the original blog, we’ll stick with that. The great thing about the Playground is that you can switch connections to a different service, and the Playground code will generate code specifically to that service’s API specification. You only need to select which one you want to use today.

alt_text

In this step, you must fill out the fields depending on which LLM you wish to use. As mentioned above, since Playground will abstract away the API differences, you can use whichever supported LLM service works for you, and the rest of the steps in this guide will work the same.

If you don’t have an Azure OpenAI account or OpenAI API account, you can get one here (OpenAI now requires a $5 minimum to fund the API account).

alt_text

Once you have completed that, hit “Save,” and you will get confirmation that the connector has been added. After that, you just need to select the indices we will use in our app. You can select multiple, but since all our crawler data is going into elastic-labs, you can choose that one.

Click “Add data sources” and you can start using Playground!

alt_text

Select the “restaurant_reviews” index created earlier.

alt_text

Playing in the Playground

After adding your data source you will be in the Playground UI.

alt_text

To keep getting started as simple as possible, we will stick with all the default settings other than the prompt. However, for more details on Playground components and how to use them, check out the Playground: Experiment with RAG applications with Elasticsearch in minutes blog and the Playground documentation.

Experimenting with different settings to fit your particular data and application needs is an important part of setting up a RAG-backed application.

The defaults we will be using are:

  • Querying the semantic_body chunks
  • Using the three nearest semantic chunks as context to pass to the LLM

Creating a more detailed prompt

The default prompt in Playground is simply a placeholder. Prompt engineering continues to develop as LLMs become more capable. Exploring the ever-changing world of prompt engineering is a blog, but there are a few basic concepts to remember when creating a system prompt:

  • Be detailed when describing the app or service the LLM response is part of. This includes what data will be provided and who will consume the responses.
  • Provide example questions and responses. This technique, called few-shot-prompting, helps the LLM structure its responses.
  • Clearly state how the LLM should behave.
  • Specify the Desired Output Format.
  • Test and Iterate on Prompts.

With this in mind, we can create a more detailed system prompt:

You are a helpful and knowledgeable assistant designed to assist users in querying information related to Search, Observability, and Security. Your primary goal is to provide clear, concise, and accurate responses based on semantically relevant documents retrieved using Elasticsearch.

Guidelines:

Audience:
Assume the user could be of any experience level but lean towards a technical slant in your explanations.
Avoid overly complex jargon unless it is common in the context of Elasticsearch, Search, Observability, or Security.

Response Structure:
Clarity: Responses should be clear and concise, avoiding unnecessary verbosity.
Conciseness: Provide information in the most direct way possible, using bullet points when appropriate.

Formatting: Use Markdown formatting for:
Bullet points to organize information
Code blocks for any code snippets, configurations, or commands
Relevance: Ensure the information provided is directly relevant to the user's query, prioritizing accuracy.

Content:
Technical Depth: Offer sufficient technical depth while remaining accessible. Tailor the complexity based on the user's apparent knowledge level inferred from their query.

Examples: Where appropriate, provide examples or scenarios to clarify concepts or illustrate use cases.
Documentation Links: When applicable, suggest additional resources or documentation from Elastic.co that can further assist the user.

Tone and Style:
Maintain a professional yet approachable tone.
Encourage curiosity by being supportive and patient with all user queries, regardless of complexity.

Example Queries:
"How can I optimize my Elasticsearch cluster for large-scale data?"
"What are the best practices for implementing observability in a microservices architecture?"
"How can I secure sensitive data in Elasticsearch?"

Feel free to to test out different prompts and context settings to see what results you feel are best for your particular data. For more examples on advanced techiques, check out the Prompt section on the two part blog Advanced RAG Techniques. Again, see the Playground blog post for more details on the various settings you can tweak.

Export the Code

Behind the scenes, Playground generates all the backend chat code we need to perform semantic search, parse the relevant contextual fields, and make a chat completion call to the LLM. No coding work from us required!

In the upper right corner click on the “View Code” button to expand the code flyout

alt_text

You will see the generated python code with all the settings your configured as well as the the functions to make a semantic call to Elasticsearch, parse the results, built the complete prompt, make the call to the LLM, and parse those results.

Click the copy icon to copy the code.

alt_text

You can now incorporate the code into your own chat application!

Wrapup

A lot has changed since the first iteration of this blog over a year ago, and we covered a lot in this blog. You started from a cloud API key, created an Elasticsearch Serverless project, generated a cloud API key, configured the Open Web Crawler, crawled three Elastic Lab sites, chunked the long text, generated embeddings, tested out the optimal chat settings for a RAG application, and exported the code!

Where’s the UI, Vestal?

Be on the lookout for part two where we will integrate the playground code into a python backend with a React frontend. We will also look at deploying the full chat application.

For a complete set of code for everything above, see the accompanying Jypyter notebook

Ready to try this out on your own? Start a free trial.

Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our Beyond RAG Basics webinar to build your next GenAI app!

Recommended Articles