IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Tutorial: semantic search with the inference API

The instructions in this tutorial show you how to use the inference API workflow with various services to perform semantic search on your data.

For the easiest way to perform semantic search in the Elastic Stack, refer to the semantic_text end-to-end tutorial.

The following examples use the:

embed-english-v3.0 model for Cohere
all-mpnet-base-v2 model from HuggingFace
text-embedding-ada-002 second generation embedding model for OpenAI
models available through Azure AI Studio or Azure OpenAI
text-embedding-004 model for Google Vertex AI
mistral-embed model for Mistral
amazon.titan-embed-text-v1 model for Amazon Bedrock
ops-text-embedding-zh-001 model for AlibabaCloud AI

You can use any Cohere or OpenAI model; they are all supported by the inference API. For a list of recommended models available on HuggingFace, refer to the supported model list.

Each step below includes instructions for every service. Follow the ones for the service you want to use.

Requirements

A Cohere account is required to use the inference API with the Cohere service.

ELSER is a model trained by Elastic. If you have an Elasticsearch deployment, there is no further requirement for using the inference API with the elser service.

A Google Cloud account
A project in Google Cloud
The Vertex AI API enabled in your project
A valid service account for the Google Vertex AI API
The service account must have the Vertex AI User role and the aiplatform.endpoints.predict permission.

Create an inference endpoint

Create an inference endpoint by using the Create inference API:

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="cohere_embeddings",
    inference_config={
        "service": "cohere",
        "service_settings": {
            "api_key": "<api_key>",
            "model_id": "embed-english-v3.0",
            "embedding_type": "byte"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "cohere_embeddings",
  inference_config: {
    service: "cohere",
    service_settings: {
      api_key: "<api_key>",
      model_id: "embed-english-v3.0",
      embedding_type: "byte",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/cohere_embeddings 
{
    "service": "cohere",
    "service_settings": {
        "api_key": "<api_key>", 
        "model_id": "embed-english-v3.0", 
        "embedding_type": "byte"
    }
}

	The task type is `text_embedding` per the path. `cohere_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key of your Cohere account. You can find your API keys in your Cohere dashboard under the API keys section. You need to provide your API key only once. The Get inference API does not return your API key.
	The name of the embedding model to use. You can find the list of Cohere embedding models here.

	The task type is `text_embedding` per the path. `hugging_face_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	A valid HuggingFace access token. You can find it on the settings page of your account.
	The inference endpoint URL you created on Hugging Face.
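
The three notes above describe a request body along the following lines. This is a minimal sketch: the service name `hugging_face` and the `url` setting are assumptions to verify against the Create inference API reference for your version, and the helper name is hypothetical.

```python
def hugging_face_endpoint_config(access_token: str, endpoint_url: str) -> dict:
    """Body for PUT _inference/text_embedding/hugging_face_embeddings.

    Assumption: the service is named "hugging_face" and accepts the
    "api_key" and "url" settings.
    """
    return {
        "service": "hugging_face",
        "service_settings": {
            "api_key": access_token,  # HuggingFace access token
            "url": endpoint_url,      # inference endpoint URL created on Hugging Face
        },
    }


hf_config = hugging_face_endpoint_config("<access_token>", "<url_endpoint>")
# With a configured client, mirroring the Cohere example:
# client.inference.put(task_type="text_embedding",
#                      inference_id="hugging_face_embeddings",
#                      inference_config=hf_config)
```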

	The task type is `text_embedding` per the path. `openai_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key of your OpenAI account. You can find your OpenAI API keys in your OpenAI account under the API keys section. You need to provide your API key only once. The Get inference API does not return your API key.
	The name of the embedding model to use. You can find the list of OpenAI embedding models here.
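
For OpenAI, the notes above might translate into a body like the one below. It is a sketch that reuses the shape of the Cohere example; the service name `openai` is an assumption, and the helper name is hypothetical.

```python
def openai_endpoint_config(api_key: str, model_id: str = "text-embedding-ada-002") -> dict:
    """Body for PUT _inference/text_embedding/openai_embeddings.

    Assumption: the service is named "openai" and uses the same
    "api_key" and "model_id" settings as the Cohere example.
    """
    return {
        "service": "openai",
        "service_settings": {
            "api_key": api_key,    # provided once; the Get inference API does not return it
            "model_id": model_id,  # name of the OpenAI embedding model
        },
    }


openai_config = openai_endpoint_config("<api_key>")
# client.inference.put(task_type="text_embedding",
#                      inference_id="openai_embeddings",
#                      inference_config=openai_config)
```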

	The task type is `text_embedding` per the path. `azure_openai_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key for accessing your Azure OpenAI services. Alternatively, you can provide an `entra_id` instead of an `api_key` here. The Get inference API does not return this information.
	The name of your Azure resource.
	The id of your deployed model.
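
Because the credential can be either an `api_key` or an `entra_id`, a small helper can make that choice explicit. The sketch below assumes the service name `azureopenai`, the setting names `resource_name` and `deployment_id`, and an `api_version` setting that the notes above do not mention; treat all of them as assumptions to check against the API reference.

```python
from typing import Optional


def azure_openai_endpoint_config(
    resource_name: str,
    deployment_id: str,
    api_version: str,
    api_key: Optional[str] = None,
    entra_id: Optional[str] = None,
) -> dict:
    """Body for PUT _inference/text_embedding/azure_openai_embeddings."""
    # Accept exactly one credential: an API key or an Entra ID.
    if (api_key is None) == (entra_id is None):
        raise ValueError("provide exactly one of api_key or entra_id")
    credential = {"api_key": api_key} if api_key is not None else {"entra_id": entra_id}
    return {
        "service": "azureopenai",  # assumed service name
        "service_settings": {
            **credential,
            "resource_name": resource_name,  # name of your Azure resource
            "deployment_id": deployment_id,  # ID of your deployed model
            "api_version": api_version,      # assumed setting, not in the notes above
        },
    }


azure_openai_config = azure_openai_endpoint_config(
    "<resource_name>", "<deployment_id>", "<api_version>", api_key="<api_key>"
)
```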

	The task type is `text_embedding` per the path. `azure_ai_studio_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key for accessing your Azure AI Studio deployed model. You can find this on your model deployment’s overview page.
	The target URI for accessing your Azure AI Studio deployed model. You can find this on your model deployment’s overview page.
	The model provider, such as `cohere` or `openai`.
	The deployed endpoint type. This can be `token` (for "pay as you go" deployments), or `realtime` for real-time deployment endpoints.
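
The Azure AI Studio settings could be assembled as follows. This sketch validates the endpoint type against the two values named above; the service name `azureaistudio` and the setting names `target`, `provider`, and `endpoint_type` are assumptions, and the helper is hypothetical.

```python
ALLOWED_ENDPOINT_TYPES = {"token", "realtime"}


def azure_ai_studio_endpoint_config(
    api_key: str, target: str, provider: str, endpoint_type: str
) -> dict:
    """Body for PUT _inference/text_embedding/azure_ai_studio_embeddings."""
    if endpoint_type not in ALLOWED_ENDPOINT_TYPES:
        raise ValueError(f"endpoint_type must be one of {sorted(ALLOWED_ENDPOINT_TYPES)}")
    return {
        "service": "azureaistudio",  # assumed service name
        "service_settings": {
            "api_key": api_key,              # from the model deployment's overview page
            "target": target,                # target URI of the deployed model
            "provider": provider,            # for example "cohere" or "openai"
            "endpoint_type": endpoint_type,  # "token" (pay as you go) or "realtime"
        },
    }


azure_ai_studio_config = azure_ai_studio_endpoint_config(
    "<api_key>", "<target_uri>", "openai", "token"
)
```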

	The task type is `text_embedding` per the path. `google_vertex_ai_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	A valid service account in JSON format for the Google Vertex AI API.
	For the list of the available models, refer to the Text embeddings API page.
	The name of the location to use for the inference task. Refer to Generative AI on Vertex AI locations for available locations.
	The name of the project to use for the inference task.
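
A possible way to build the Google Vertex AI body is sketched below. It assumes the service name `googlevertexai`, the setting names `service_account_json`, `model_id`, `location`, and `project_id`, and that the service account is passed as a JSON string; the helper name is hypothetical.

```python
import json


def google_vertex_ai_endpoint_config(
    service_account: dict, model_id: str, location: str, project_id: str
) -> dict:
    """Body for PUT _inference/text_embedding/google_vertex_ai_embeddings."""
    return {
        "service": "googlevertexai",  # assumed service name
        "service_settings": {
            # Assumption: the service account is sent as a JSON-encoded string.
            "service_account_json": json.dumps(service_account),
            "model_id": model_id,      # for example "text-embedding-004"
            "location": location,      # location used for the inference task
            "project_id": project_id,  # Google Cloud project used for the task
        },
    }


vertex_config = google_vertex_ai_endpoint_config(
    {"type": "service_account"}, "text-embedding-004", "<location>", "<project_id>"
)
```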

	The task type is `text_embedding` per the path. `mistral_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key for accessing the Mistral API. You can find this in your Mistral account’s API Keys page.
	The Mistral embeddings model name, for example `mistral-embed`.
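
For Mistral, the body is small. The sketch below assumes the service name `mistral` and that the model setting is called `model` (rather than `model_id`); verify both against the API reference.

```python
def mistral_endpoint_config(api_key: str, model: str = "mistral-embed") -> dict:
    """Body for PUT _inference/text_embedding/mistral_embeddings."""
    return {
        "service": "mistral",  # assumed service name
        "service_settings": {
            "api_key": api_key,  # from the API Keys page of your Mistral account
            "model": model,      # assumed setting name for the embeddings model
        },
    }


mistral_config = mistral_endpoint_config("<api_key>")
```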

	The task type is `text_embedding` per the path. `amazon_bedrock_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The access key of the user account that accesses Amazon Bedrock. You can find it on your AWS IAM management page.
	The secret key should be the paired key for the specified access key.
	Specify the region that your model is hosted in.
	Specify the model provider.
	The model ID or ARN of the model to use.
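
The five Amazon Bedrock settings could be collected like this. The service name `amazonbedrock`, the setting names, and the provider value `amazontitan` in the usage line are assumptions, not confirmed by the notes above; the helper name is hypothetical.

```python
def amazon_bedrock_endpoint_config(
    access_key: str, secret_key: str, region: str, provider: str, model: str
) -> dict:
    """Body for PUT _inference/text_embedding/amazon_bedrock_embeddings."""
    return {
        "service": "amazonbedrock",  # assumed service name
        "service_settings": {
            "access_key": access_key,  # AWS IAM access key
            "secret_key": secret_key,  # secret key paired with the access key
            "region": region,          # region that hosts the model
            "provider": provider,      # model provider
            "model": model,            # model ID or ARN
        },
    }


bedrock_config = amazon_bedrock_endpoint_config(
    "<aws_access_key>", "<aws_secret_key>", "<region>",
    "amazontitan", "amazon.titan-embed-text-v1",
)
```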

	The task type is `text_embedding` per the path. `alibabacloud_ai_search_embeddings` is the unique identifier of the inference endpoint (its `inference_id`).
	The API key for accessing the AlibabaCloud AI Search API. You can find your API keys in your AlibabaCloud account under the API keys section. You need to provide your API key only once. The Get inference API does not return your API key.
	The AlibabaCloud AI Search embeddings model name, for example `ops-text-embedding-zh-001`.
	The name of your AlibabaCloud AI Search host address.
	The name of your AlibabaCloud AI Search workspace.
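
An AlibabaCloud AI Search body might look like the sketch below. The service name `alibabacloud-ai-search` and the setting names `service_id`, `host`, and `workspace` are assumptions to check against the API reference; the helper name is hypothetical.

```python
def alibabacloud_endpoint_config(
    api_key: str, host: str, workspace: str,
    service_id: str = "ops-text-embedding-zh-001",
) -> dict:
    """Body for PUT _inference/text_embedding/alibabacloud_ai_search_embeddings."""
    return {
        "service": "alibabacloud-ai-search",  # assumed service name
        "service_settings": {
            "api_key": api_key,        # provided once; not returned by the Get inference API
            "service_id": service_id,  # assumed setting name for the embeddings model
            "host": host,              # AlibabaCloud AI Search host address
            "workspace": workspace,    # AlibabaCloud AI Search workspace
        },
    }


alibaba_config = alibabacloud_endpoint_config("<api_key>", "<host>", "<workspace>")
```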

	The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
	The field to contain the embeddings is a `dense_vector` field.
	The output dimensions of the model. Find this value in the Cohere documentation of the model you use.
	The name of the field from which to create the dense vector representation. In this example, the name of the field is `content`. It must be referenced in the inference pipeline configuration in the next step.
	The field type, which is `text` in this example.
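
Put together, the five notes above describe a mapping like the following sketch. The field name `content_embedding` and the index name in the comment are hypothetical; 1024 is the output size of `embed-english-v3.0`, and `element_type` is set to `byte` on the assumption that it should match the `embedding_type` of the endpoint.

```python
cohere_mappings = {
    "properties": {
        "content_embedding": {       # hypothetical name of the embeddings field
            "type": "dense_vector",
            "dims": 1024,            # output dimensions of embed-english-v3.0
            "element_type": "byte",  # assumed to match "embedding_type": "byte"
        },
        "content": {                 # source field for the embeddings
            "type": "text",
        },
    }
}

# With a configured client (index name is hypothetical):
# client.indices.create(index="cohere-embeddings", mappings=cohere_mappings)
```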

	The name of the field to contain the generated tokens. It must be referenced in the inference pipeline configuration in the next step.
	The field to contain the tokens is a `sparse_vector` field for ELSER.
	The name of the field from which to create the sparse vector representation. In this example, the name of the field is `content`. It must be referenced in the inference pipeline configuration in the next step.
	The field type, which is `text` in this example.
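
For ELSER, the mapping needs no dimensions because the target is a `sparse_vector` field. A minimal sketch, with the hypothetical field name `content_embedding`:

```python
elser_mappings = {
    "properties": {
        "content_embedding": {  # hypothetical name of the field holding the tokens
            "type": "sparse_vector",
        },
        "content": {            # source field for the tokens
            "type": "text",
        },
    }
}

# With a configured client (index name is hypothetical):
# client.indices.create(index="elser-embeddings", mappings=elser_mappings)
```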

	The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
	The field to contain the embeddings is a `dense_vector` field.
	The output dimensions of the model. This value may be found on the model card in your Azure AI Studio deployment.
	For Azure AI Studio embeddings, the `dot_product` function should be used to calculate similarity.
	The name of the field from which to create the dense vector representation. In this example, the name of the field is `content`. It must be referenced in the inference pipeline configuration in the next step.
	The field type, which is `text` in this example.
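
Since the output dimensions depend on the model you deployed in Azure AI Studio, the sketch below takes them as a parameter instead of hard-coding a value. The helper and the field name `content_embedding` are hypothetical.

```python
def azure_ai_studio_mappings(dims: int) -> dict:
    """Index mapping for Azure AI Studio embeddings.

    `dims` comes from the model card of your deployment.
    """
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "properties": {
            "content_embedding": {            # hypothetical embeddings field
                "type": "dense_vector",
                "dims": dims,
                "similarity": "dot_product",  # similarity function named above
            },
            "content": {"type": "text"},      # source field for the embeddings
        }
    }


azure_mappings = azure_ai_studio_mappings(dims=1536)  # 1536 is only an example value
```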

	The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
	The field to contain the embeddings is a `dense_vector` field.
	The output dimensions of the model. This value may be found on the Google Vertex AI model reference. The inference API attempts to calculate the output dimensions automatically if `dims` is not specified.
	For Google Vertex AI embeddings, the `dot_product` function should be used to calculate similarity.
	The name of the field from which to create the dense vector representation. In this example, the name of the field is `content`. It must be referenced in the inference pipeline configuration in the next step.
	The field type which is `text` in this example.
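
A mapping matching the six notes above could be written as follows. The field name `content_embedding` is hypothetical, and 768 is the output size of `text-embedding-004`; adjust it if you use a different model.

```python
vertex_mappings = {
    "properties": {
        "content_embedding": {            # hypothetical embeddings field
            "type": "dense_vector",
            "dims": 768,                  # output dimensions of text-embedding-004
            "similarity": "dot_product",  # similarity function named above
        },
        "content": {                      # source field for the embeddings
            "type": "text",
        },
    }
}

# With a configured client (index name is hypothetical):
# client.indices.create(index="google-vertex-ai-embeddings", mappings=vertex_mappings)
```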

	The name of the inference endpoint you created by using the Create inference API. It is referred to as `inference_id` in that step.
	Configuration object that defines the `input_field` for the inference process and the `output_field` that will contain the inference results.
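
The two notes above correspond to an `inference` processor. The sketch below builds the processor list for the Cohere endpoint; the setting names `model_id` and `input_output`, the output field name `content_embedding`, and the pipeline ID in the comment are assumptions, and the helper name is hypothetical.

```python
def inference_pipeline_processors(
    inference_id: str, input_field: str, output_field: str
) -> list:
    """Processor list for an ingest pipeline that calls an inference endpoint."""
    return [
        {
            "inference": {
                "model_id": inference_id,  # the inference_id of the endpoint
                "input_output": {
                    "input_field": input_field,    # field to run inference on
                    "output_field": output_field,  # field that receives the results
                },
            }
        }
    ]


processors = inference_pipeline_processors(
    "cohere_embeddings", "content", "content_embedding"
)
# With a configured client (pipeline ID is hypothetical):
# client.ingest.put_pipeline(id="cohere_embeddings_pipeline", processors=processors)
```

The input and output field names must match the fields defined in the index mapping.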
