OpenAI chat completions with Elasticsearch's open inference API

OpenAI Chat Completions has been integrated into Elastic’s inference API. This marks another milestone in our journey of integrating AI capabilities into Elasticsearch, and it adds an easy-to-use way to generate human-like text completions.

This blog explains how OpenAI chat completions and Elasticsearch can be used to summarize, translate, or answer questions about any text. Before we get started, let's take a quick look at recent Elastic features and integrations.

The essence of continuous innovation at Elastic

Elastic invests heavily in everything AI. We’ve recently released a lot of new features and exciting integrations:

Elasticsearch open inference API adds support for Cohere Embeddings
Introducing Elasticsearch vector database to Azure OpenAI Service On Your Data (preview)
Speeding Up Multi-graph Vector Search
...explore more of Elastic Search Labs to learn about recent developments

The new completion task type in our inference API, with OpenAI as the first backing provider, is already available in our stateless offering on Elastic Cloud. It will be available to everyone in our next release.

Using OpenAI chat completions with Elasticsearch's open inference API

In this short guide we’ll show a simple example of how to use the new completion task type in the inference API during document ingestion. Please refer to the Elastic Search Labs GitHub repository for more in-depth guides and interactive notebooks.

For the following guide to work you'll need to have an active OpenAI account and obtain an API key. Refer to OpenAI’s quickstart guide for the steps you need to follow. You can choose from a variety of OpenAI’s models. In the following example we’ve used `gpt-3.5-turbo`.

Kibana provides a console where you can run the following steps against Elasticsearch without setting up an IDE.

First, configure a model that will perform the completions:
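A minimal sketch of that configuration request, written as Python that only builds and prints the call in a form you can paste into the Kibana console. The endpoint ID `openai_chat_completions` and the `<api-key>` placeholder are assumptions for illustration. The request shape reflects our understanding of the `completion` task type with the `openai` service, so check it against the inference API documentation for your version.

```python
import json

# Hypothetical endpoint ID chosen for this guide; any unique name should work.
INFERENCE_ID = "openai_chat_completions"


def build_create_endpoint_request(api_key, model_id="gpt-3.5-turbo"):
    """Return (method, path, body) for creating a completion inference endpoint."""
    body = {
        "service": "openai",
        "service_settings": {
            "api_key": api_key,    # your OpenAI API key
            "model_id": model_id,  # the OpenAI model that performs the completions
        },
    }
    return "PUT", f"_inference/completion/{INFERENCE_ID}", body


method, path, body = build_create_endpoint_request("<api-key>")
# Print the request in Kibana console form: the request line, then the JSON body.
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```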

After running this command you should see a corresponding `200 OK` status indicating that the model is properly set up for performing inference on arbitrary text.

You’re now able to call the configured model to perform inference on arbitrary text input:
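As a sketch, the inference call could look like the request built below. The endpoint ID is the assumed name from the configuration step, and the question is just an example input.

```python
import json

INFERENCE_ID = "openai_chat_completions"  # assumed endpoint ID from the setup step


def build_completion_request(text):
    """Return (method, path, body) for running a completion on arbitrary text."""
    return "POST", f"_inference/completion/{INFERENCE_ID}", {"input": text}


method, path, body = build_completion_request("What is Elastic?")
# Print the request in Kibana console form.
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```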

You’ll get a response with status code `200 OK` looking similar to the following:
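The sketch below shows the response shape we would expect from a `completion` endpoint and how to read the generated text from it. The `result` value is a placeholder, not real model output, and the exact shape may differ between versions.

```python
# Illustrative response shape only; "<generated text>" stands in for the
# text the model would actually return.
sample_response = {
    "completion": [
        {"result": "<generated text>"}
    ]
}


def extract_completions(response):
    """Collect the generated text from each entry of a completion response."""
    return [item["result"] for item in response.get("completion", [])]


print(extract_completions(sample_response))
```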

The next command creates an example document we’ll summarize using the model we’ve just configured:
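One way to create such a document is through the bulk API, sketched below. The index name `docs` and the field name `content` are assumptions made for this guide, and the document text is a placeholder for whatever you want to summarize.

```python
import json

INDEX = "docs"  # hypothetical source index name


def build_bulk_body(index, documents):
    """Build a newline-delimited bulk body that indexes each document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document source
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


documents = [{"content": "<text you want to summarize>"}]
bulk_body = build_bulk_body(INDEX, documents)
print("POST _bulk")
print(bulk_body, end="")
```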

To summarize multiple documents, we’ll use an ingest pipeline that combines the script, inference, and remove processors.
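A sketch of such a pipeline definition follows. The pipeline name `summarization_pipeline`, the field names `content`, `prompt`, and `summary`, and the endpoint ID are all assumptions. The inference processor settings (`model_id`, `input_output`) reflect our understanding of the processor and should be checked against the documentation for your version.

```python
import json


def build_summarization_pipeline(
    instruction="Please summarize the following text: ",
    inference_id="openai_chat_completions",  # assumed endpoint ID
    source_field="content",                  # assumed field holding the text
    prompt_field="prompt",                   # temporary field, removed at the end
    output_field="summary",                  # field that receives the completion
):
    """Return an ingest pipeline body with script, inference, and remove processors."""
    # Painless script: prefix the content with the instruction in a temporary field.
    script = f"ctx.{prompt_field} = params.instruction + ctx.{source_field}"
    return {
        "processors": [
            {"script": {"source": script, "params": {"instruction": instruction}}},
            {
                "inference": {
                    "model_id": inference_id,
                    "input_output": {
                        "input_field": prompt_field,
                        "output_field": output_field,
                    },
                }
            },
            {"remove": {"field": prompt_field}},
        ]
    }


pipeline = build_summarization_pipeline()
print("PUT _ingest/pipeline/summarization_pipeline")
print(json.dumps(pipeline, indent=2))
```

Passing a different `instruction`, for example a translation prompt, produces a pipeline for another use case without changing the rest of the definition.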

This pipeline simply prefixes the content with the instruction “Please summarize the following text: ” in a temporary field so the configured model knows what to do with the text. You can change this instruction to anything you like, which unlocks a variety of other popular use cases:

Question answering
Translation
…and many more!

The pipeline deletes the temporary field after performing inference.

We now send our document(s) through the summarization pipeline by calling the reindex API.
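The reindex request might be built as in the sketch below. The index names `docs` and `docs_summaries`, the pipeline name, and the batch size of 50 are assumptions. A small batch size is used here only because each document triggers a call to the external model.

```python
import json


def build_reindex_request(source_index, dest_index, pipeline, batch_size=50):
    """Return (method, path, body) that reindexes documents through a pipeline."""
    body = {
        "source": {"index": source_index, "size": batch_size},  # size = batch size
        "dest": {"index": dest_index, "pipeline": pipeline},
    }
    return "POST", "_reindex", body


method, path, body = build_reindex_request(
    "docs", "docs_summaries", "summarization_pipeline"
)
# Print the request in Kibana console form.
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```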

Your document is now summarized and ready to be searched:
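To inspect the result, a search against the destination index could look like the sketch below. The index name `docs_summaries` and the `summary` field are the assumed names from the earlier steps. With no query text, the request falls back to `match_all`.

```python
import json


def build_search_request(index, text=None, field="summary"):
    """Return (method, path, body) for searching the summarized documents."""
    if text is None:
        query = {"match_all": {}}       # return every document
    else:
        query = {"match": {field: text}}  # full-text match on the summary field
    return "POST", f"{index}/_search", {"query": query}


method, path, body = build_search_request("docs_summaries")
# Print the request in Kibana console form.
print(f"{method} {path}")
print(json.dumps(body, indent=2))
```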

That’s it: with a few simple API calls you have created a powerful summarization pipeline that can be used with any ingestion mechanism! Summarization comes in handy in many cases, for example condensing large pieces of text before generating semantic embeddings, or transforming large documents into a concise summary. If you’re only interested in a summary of large documents, this can reduce your storage cost and improve time-to-value. By the way, if you want to extract text from binary documents, take a look at our open-code data-extraction service!

Exciting future ahead

But we won’t stop here. We’re already working on integrating Cohere’s chat as another provider for our `completion` task. We’re also actively exploring new retrieval and ingestion use cases in combination with the completion API. Bookmark Elastic Search Labs now to stay up to date!
