Elasticsearch inference service
Creates an inference endpoint to perform an inference task with the elasticsearch service.
If you use the E5 model through the elasticsearch service, the API request automatically downloads and deploys the model if it isn’t downloaded yet.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
  (Required, string) The unique identifier of the inference endpoint.
<task_type>
  (Required, string) The type of the inference task that the model will perform.
  Available task types:
  - rerank
  - text_embedding
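The way the two path parameters combine into the request path can be sketched as follows. This is a minimal illustration, not part of any client library: the helper name inference_path and the SUPPORTED_TASK_TYPES set are assumptions introduced here, and the set only mirrors the task types listed above.

```python
from urllib.parse import quote

# Task types listed above for the elasticsearch service.
SUPPORTED_TASK_TYPES = {"rerank", "text_embedding"}


def inference_path(task_type: str, inference_id: str) -> str:
    """Return the request path for a task type and endpoint ID (hypothetical helper)."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    # Percent-encode the ID so it is safe to place in a single URL path segment.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


print(inference_path("text_embedding", "my-e5-model"))
```

Rejecting an unknown task type before sending the request is a design choice of this sketch; the server performs its own validation regardless.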
Request body
service
  (Required, string) The type of service supported for the specified task type. In this case, elasticsearch.
service_settings
  (Required, object) Settings used to install the inference model. These settings are specific to the elasticsearch service.
  model_id
    (Required, string) The name of the model to use for the inference task. It can be the ID of either a built-in model (for example, .multilingual-e5-small for E5) or a text embedding model already uploaded through Eland.
  num_allocations
    (Required, integer) The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
  num_threads
    (Required, integer) Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the speed per inference request. The inference process is compute-bound; num_threads must not exceed the number of available allocated processors per node. Must be a power of 2. Max allowed value is 32.
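The constraints on service_settings can be checked on the client side before sending the request. The sketch below is only an illustration: validate_service_settings is a hypothetical helper, and it covers just the rules stated above (three required keys, num_threads a power of 2 and at most 32). The per-node processor limit cannot be checked locally and is left to the cluster.

```python
REQUIRED_KEYS = ("model_id", "num_allocations", "num_threads")
MAX_NUM_THREADS = 32  # documented maximum for num_threads


def validate_service_settings(settings: dict) -> dict:
    """Return the settings unchanged if they satisfy the documented rules."""
    for key in REQUIRED_KEYS:
        if key not in settings:
            raise ValueError(f"missing required setting: {key}")
    threads = settings["num_threads"]
    if not isinstance(threads, int) or threads < 1:
        raise ValueError("num_threads must be a positive integer")
    # A positive integer is a power of 2 exactly when it has a single set bit.
    if threads & (threads - 1) != 0:
        raise ValueError("num_threads must be a power of 2")
    if threads > MAX_NUM_THREADS:
        raise ValueError(f"num_threads must not exceed {MAX_NUM_THREADS}")
    return settings


settings = validate_service_settings(
    {"model_id": ".multilingual-e5-small", "num_allocations": 1, "num_threads": 2}
)
print(settings)
```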
task_settings
  (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
  task_settings for the rerank task type
    return_documents
      (Optional, Boolean) Returns the document instead of only the index. Defaults to true.
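To show where task_settings sits relative to the other fields, the following sketch assembles a request body for a rerank endpoint. The function rerank_endpoint_body and the model ID my-rerank-model are hypothetical names used only for illustration; substitute the ID of a model that is actually available in your cluster.

```python
import json


def rerank_endpoint_body(model_id: str, num_allocations: int = 1,
                         num_threads: int = 1,
                         return_documents: bool = True) -> dict:
    """Build a request body for a rerank endpoint (hypothetical helper)."""
    return {
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
        # Optional: omit task_settings entirely to keep the default (true).
        "task_settings": {"return_documents": return_documents},
    }


# "my-rerank-model" is a placeholder, not a model shipped with Elasticsearch.
body = rerank_endpoint_body("my-rerank-model", return_documents=False)
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the JSON bodies sent with PUT /_inference/rerank/<inference_id>.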
E5 via the elasticsearch service
The following example shows how to create an inference endpoint called my-e5-model to perform a text_embedding task type.
The API request below will automatically download the E5 model if it isn’t already downloaded and then deploy the model.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-e5-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": ".multilingual-e5-small"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-e5-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: ".multilingual-e5-small",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}
You might see a 502 bad gateway error in the response when using the Kibana Console.
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the Machine Learning UI.
If using the Python client, you can set the timeout parameter to a higher value.
Models uploaded by Eland via the elasticsearch service
The following example shows how to create an inference endpoint called my-msmarco-minilm-model to perform a text_embedding task type.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-msmarco-minilm-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": "msmarco-MiniLM-L12-cos-v5"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-msmarco-minilm-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: "msmarco-MiniLM-L12-cos-v5",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-msmarco-minilm-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5"
  }
}
Provide a unique identifier for the inference endpoint.