Create an Elasticsearch inference endpoint
Added in 8.13.0
Create an inference endpoint to perform an inference task with the elasticsearch service.
Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints; you only need to create endpoints using the API if you want to customize the settings.
If you use the ELSER or the E5 model through the elasticsearch service, the API request will automatically download and deploy the model if it isn't downloaded yet.
You might see a 502 Bad Gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you use the Python client, you can set the timeout parameter to a higher value.
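As a rough illustration of the timeout advice, the sketch below sends the create request with Python's standard library, uses a generous timeout, and reports a 502 as "possibly still downloading" instead of failing. The function name put_inference_endpoint, the 300-second default, the authorization argument (passed through verbatim as the header value), and the injectable opener are assumptions made for this example. The official Python client has its own timeout setting, which is not shown here.

```python
import json
import urllib.error
import urllib.request


def put_inference_endpoint(base_url, task_type, inference_id, body,
                           authorization, timeout=300.0,
                           opener=urllib.request.urlopen):
    """PUT an endpoint definition; return (status_code, parsed_json_or_None)."""
    url = f"{base_url.rstrip('/')}/_inference/{task_type}/{inference_id}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": authorization,
            "Content-Type": "application/json",
        },
    )
    try:
        # A long timeout gives the model download a chance to finish.
        with opener(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 502:
            # Probably a gateway timeout while the model downloads in the
            # background; check the deployment status before retrying.
            return 502, None
        raise
```

A caller that receives 502 would typically check the deployment status before sending the request again.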
After creating the endpoint, wait for the model deployment to complete before using it.
To verify the deployment status, use the get trained model statistics API.
Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count".
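The check described above can be sketched as a small helper that inspects an already-parsed statistics response. The response layout used here (a trained_model_stats list whose entries carry deployment_stats.allocation_status) and the function name is_deployment_ready are assumptions for illustration; confirm the field paths against the response your cluster actually returns.

```python
def is_deployment_ready(stats_response):
    """Return True when at least one deployment exists and all are fully allocated."""
    found_deployment = False
    for entry in stats_response.get("trained_model_stats", []):
        deployment = entry.get("deployment_stats")
        if not deployment:
            continue  # the model is known but not deployed
        found_deployment = True
        status = deployment.get("allocation_status", {})
        if status.get("state") != "fully_allocated":
            return False
        if status.get("allocation_count") != status.get("target_allocation_count"):
            return False
    return found_deployment


# Hypothetical response fragment, trimmed to the fields the check reads.
sample = {
    "trained_model_stats": [
        {
            "model_id": ".elser_model_2",
            "deployment_stats": {
                "allocation_status": {
                    "state": "fully_allocated",
                    "allocation_count": 1,
                    "target_allocation_count": 1,
                }
            },
        }
    ]
}
print(is_deployment_ready(sample))  # True
```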
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
Path parameters
- task_type
  string Required The type of the inference task that the model will perform. Values are rerank, sparse_embedding, or text_embedding.
- elasticsearch_inference_id
  string Required The unique identifier of the inference endpoint. It must not match the model_id.
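To make the path parameter rules concrete, here is a minimal sketch that validates both values and assembles the request path. The helper name inference_endpoint_path and the optional model_id argument are hypothetical; the allowed task types and the rule that the identifier must differ from the model_id come from the parameter descriptions above.

```python
from urllib.parse import quote

ALLOWED_TASK_TYPES = ("rerank", "sparse_embedding", "text_embedding")


def inference_endpoint_path(task_type, inference_id, model_id=None):
    """Build /_inference/{task_type}/{elasticsearch_inference_id} after basic checks."""
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"task_type must be one of {ALLOWED_TASK_TYPES}")
    if not inference_id:
        raise ValueError("inference_id is required")
    if model_id is not None and inference_id == model_id:
        raise ValueError("inference_id must not match the model_id")
    # Percent-encode the identifier so it stays a single path segment.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


print(inference_endpoint_path("sparse_embedding", "my-elser-endpoint", ".elser_model_2"))
# /_inference/sparse_embedding/my-elser-endpoint
```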
Body
- chunking_settings
  object
- service
  string Required Value is elasticsearch.
- service_settings
  object Required
- task_settings
  object
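The body fields listed above can be assembled with a small helper such as the following sketch. The function name build_request_body is hypothetical, and the helper checks only that service_settings is present; it does not validate the keys inside each settings object. The sample settings reuse values that appear in this page's request examples.

```python
import json


def build_request_body(service_settings, chunking_settings=None, task_settings=None):
    """Return a request body with the required fields and any optional ones given."""
    if not isinstance(service_settings, dict) or not service_settings:
        raise ValueError("service_settings is required and must be a non-empty object")
    body = {
        "service": "elasticsearch",  # the only value this endpoint type accepts
        "service_settings": dict(service_settings),
    }
    if chunking_settings is not None:
        body["chunking_settings"] = dict(chunking_settings)
    if task_settings is not None:
        body["task_settings"] = dict(task_settings)
    return body


body = build_request_body(
    {
        "adaptive_allocations": {
            "enabled": True,
            "min_number_of_allocations": 1,
            "max_number_of_allocations": 4,
        },
        "num_threads": 1,
        "model_id": ".elser_model_2",
    }
)
print(json.dumps(body, indent=2))
```

Omitted optional objects are left out of the body entirely, so the service applies its own defaults.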
curl \
 --request PUT 'http://api.example.com/_inference/{task_type}/{elasticsearch_inference_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"service": "elasticsearch", "service_settings": {"adaptive_allocations": {"enabled": true, "min_number_of_allocations": 1, "max_number_of_allocations": 4}, "num_threads": 1, "model_id": ".elser_model_2"}}'
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}

{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}

{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5"
  }
}

{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 3,
      "max_number_of_allocations": 10
    },
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

{
  "service": "elasticsearch",
  "service_settings": {
    "deployment_id": ".elser_model_2"
  }
}

{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}