Elasticsearch Open Inference API adds support for Anthropic’s Claude

We are excited to announce our latest addition to the Elasticsearch Open Inference API: the integration of Anthropic's Claude. This work enables Elastic users to connect directly with the Anthropic platform and use large language models like Claude 3.5 Sonnet to build GenAI applications for use cases such as question answering. Previously, customers could access this kind of capability through providers like Amazon Bedrock; now they can use their Anthropic account directly for these purposes.

Using Anthropic’s Messages API to answer questions

In this blog, we’ll use the Claude Messages API to answer questions during ingestion so that answers are ready ahead of searching. Before we start interacting with Elasticsearch, make sure you have an Anthropic API key: create an evaluation account and generate a key. We’ll use Kibana's Console to execute these next steps in Elasticsearch without setting up an IDE.

First, we configure an inference endpoint, which will interact with Anthropic’s messages API:

PUT _inference/completion/anthropic_completion
{
  "service": "anthropic",
  "service_settings": {
    "api_key": "<api key>",
    "model_id": "claude-3-5-sonnet-20240620"
  },
  "task_settings": {
    "max_tokens": 1024
  }
}

On successful inference endpoint creation, we’ll get back a 200 OK response similar to the following:

{
  "model_id": "anthropic_completion",
  "task_type": "completion",
  "service": "anthropic",
  "service_settings": {
    "model_id": "claude-3-5-sonnet-20240620",
    "rate_limit": {
      "requests_per_minute": 50
    }
  },
  "task_settings": {
    "max_tokens": 1024
  }
}
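
Beyond max_tokens, the Anthropic service accepts additional optional task settings such as temperature (and, depending on your Elasticsearch version, top_k and top_p). A minimal sketch, assuming those settings are available in your version; the second endpoint name and the temperature value are purely illustrative:

PUT _inference/completion/anthropic_completion_creative
{
  "service": "anthropic",
  "service_settings": {
    "api_key": "<api key>",
    "model_id": "claude-3-5-sonnet-20240620"
  },
  "task_settings": {
    "max_tokens": 1024,
    "temperature": 0.7
  }
}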

We can now call the configured endpoint to perform completion on any text input. Let’s ask the model for a short description of GenAI:

POST _inference/completion/anthropic_completion
{
  "input": "What is a short description of GenAI?"
}

We should get back a 200 OK response providing a short description of GenAI:

{
  "completion": [
    {
      "result": "GenAI, short for Generative Artificial Intelligence, refers to AI systems that can create new content, such as text, images, audio, or video, based on patterns learned from existing data. These systems use advanced machine learning techniques, often involving deep neural networks, to generate human-like outputs in response to prompts or inputs. GenAI has diverse applications across industries, including content creation, design, coding, and problem-solving."
    }
  ]
}
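
If your Elasticsearch version supports per-request overrides, task settings can also be passed alongside the input for a single call; a minimal sketch, where the 2048-token limit is just an illustrative value:

POST _inference/completion/anthropic_completion
{
  "input": "What is a short description of GenAI?",
  "task_settings": {
    "max_tokens": 2048
  }
}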

Now we can set up a catalog of questions that we want answered during ingestion. We’ll use the Elasticsearch Bulk API to index these questions about Elastic products:

POST _bulk
{ "index" : { "_index" : "questions" } }
{"question": "What is Elasticsearch?"}
{ "index" : { "_index" : "questions" } }
{"question": "What is Kibana?"}
{ "index" : { "_index" : "questions" } }
{"question": "What is Logstash?"}

A response similar to the one below should be returned upon successful indexing:

{
  "errors": false,
  "took": 1552829728,
  "items": [
    {
      "index": {
        "_index": "questions",
        "_id": "ipR_qJABkw3SJM5Tm3IC",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "questions",
        "_id": "i5R_qJABkw3SJM5Tm3IC",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "questions",
        "_id": "jJR_qJABkw3SJM5Tm3IC",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
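
Before moving on, we can optionally confirm that all three questions are searchable (newly indexed documents become visible once the index refreshes, which happens roughly every second by default):

GET questions/_count

The response should report "count": 3.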

We’ll now create our question-answering ingest pipeline using the script, inference, and remove processors:

PUT _ingest/pipeline/question_answering_pipeline
{
  "processors": [
    {
      "script": {
        "source": "ctx.prompt = 'Please answer the following question: ' + ctx.question"
      }
    },
    {
      "inference": {
        "model_id": "anthropic_completion",
        "input_output": {
          "input_field": "prompt",
          "output_field": "answer"
        }
      }
    },
    {
      "remove": {
        "field": "prompt"
      }
    }
  ]
}

The script processor prefixes the question field with the text “Please answer the following question: “ and stores the result in a temporary field called prompt. The inference processor sends the content of the prompt field to the Anthropic service via the inference API, and the remove processor deletes the temporary field afterwards. Using an ingest pipeline provides extensive flexibility, as you can adapt the pre-prompt to fit your needs; the same approach can be used to summarize documents as well, as sketched below.
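
As an example, a summarization variant of this pipeline could look like the following sketch; the content source field and the summary output field are hypothetical, so adapt them to your own mapping:

PUT _ingest/pipeline/summarization_pipeline
{
  "processors": [
    {
      "script": {
        "source": "ctx.prompt = 'Please summarize the following text in two sentences: ' + ctx.content"
      }
    },
    {
      "inference": {
        "model_id": "anthropic_completion",
        "input_output": {
          "input_field": "prompt",
          "output_field": "summary"
        }
      }
    },
    {
      "remove": {
        "field": "prompt"
      }
    }
  ]
}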

Next, we’ll send our documents containing the questions through the question-answering pipeline by calling the reindex API:

POST _reindex
{
  "source": {
    "index": "questions",
    "size": 50
  },
  "dest": {
    "index": "answers",
    "pipeline": "question_answering_pipeline"
  }
}

We should get back a response similar to the following:

{
  "took": 9571,
  "timed_out": false,
  "total": 3,
  "updated": 0,
  "created": 3,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
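
Each document that goes through the pipeline triggers a call to Anthropic, so reindexing a large catalog of questions can take a while. For bigger datasets you can run the reindex asynchronously and poll the resulting task; a minimal sketch:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "questions",
    "size": 50
  },
  "dest": {
    "index": "answers",
    "pipeline": "question_answering_pipeline"
  }
}

The response contains a task ID that can be monitored with the Tasks API (GET _tasks/<task id>).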

In a production setup, you’ll likely use another ingestion mechanism to ingest your documents in an automated manner. Check out our Adding data to Elasticsearch guide to learn more about the various options Elastic offers for ingesting data into Elasticsearch. We’re also committed to showcasing ingest mechanisms and providing guidance on bringing data into Elasticsearch using third-party tools. For example, take a look at Ingest Data from Snowflake to Elasticsearch using Meltano: A developer’s journey to see how to use Meltano for ingesting data.
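
If your documents arrive one at a time instead of living in an existing index, the same pipeline can also be applied at index time via the pipeline request parameter; a small sketch with a made-up question:

POST answers/_doc?pipeline=question_answering_pipeline
{
  "question": "What is Elastic Agent?"
}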

We can now search for our pre-generated answers using the Search API:

POST answers/_search
{
  "query": {
    "match_all": {}
  }
}

The response will contain the pre-generated answers:

{
  "took": 11,
  "timed_out": false,
  "_shards": { ... },
  "hits": {
    "total": { ... },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "answers",
        "_id": "4RO6YY8Bv2OsAP2iNusn",
        "_score": 1.0,
        "_ignored": [
          "answer.keyword"
        ],
        "_source": {
          "model_id": "azure_openai_completion",
          "question": "What is Elasticsearch?",
          "answer": "Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. It can handle a wide variety of data types, including textual, numerical, geospatial, structured, and unstructured data. Elasticsearch is scalable and designed to operate in real-time, making it an ideal choice for use cases such as application search, log and event data analysis, and anomaly detection."
        }
      },
      { ... },
      { ... }
    ]
  }
}
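
In practice, you’ll usually look up the pre-generated answer for one specific question rather than using match_all; a minimal sketch using a match query on the question field:

POST answers/_search
{
  "query": {
    "match": {
      "question": "What is Kibana?"
    }
  }
}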

Pre-generating answers for frequently asked questions is particularly effective at reducing operational costs. By minimizing the need for on-the-fly response generation, you can significantly cut down on the computational resources required. Additionally, this method ensures that every user receives the same precise information. Consistency is critical, especially in fields that require high reliability and accuracy, such as medical, legal, or technical support.

Ready to try this out on your own? Start a free trial.
Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!