Elastic Inference Service (EIS)

Creates an inference endpoint to perform an inference task with the elastic service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters

- <inference_id>
  (Required, string) The unique identifier of the inference endpoint.
- <task_type>
  (Required, string) The type of the inference task that the model will perform.
  Available task types: chat_completion, sparse_embedding.

The chat_completion task type only supports streaming, and only through the _unified API. For more information on how to use the chat_completion task type, refer to the chat completion documentation.
Request body

- chunking_settings
  (Optional, object) Chunking configuration object. Properties of chunking_settings:
  - max_chunk_size
    (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
  - overlap
    (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
  - sentence_overlap
    (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
  - strategy
    (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
- service
  (Required, string) The type of service supported for the specified task type. In this case, elastic.
- service_settings
  (Required, object) Settings used to install the inference model. Properties of service_settings:
  - model_id
    (Required, string) The name of the model to use for the inference task.
  - rate_limit
    (Optional, object) By default, the elastic service sets the number of requests allowed per minute to 1000 for sparse_embedding and 240 for chat_completion. This helps to minimize the number of rate limit errors returned. To modify this, set the requests_per_minute setting of this object in your service settings, as shown in the sketch after this list:
    "rate_limit": { "requests_per_minute": <<number_of_requests>> }
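For illustration only, the following sketch shows a single request that combines the chunking and rate limit settings described above. The endpoint name my-sparse-endpoint and the requests_per_minute value of 500 are hypothetical; substitute values appropriate for your deployment.

PUT _inference/sparse_embedding/my-sparse-endpoint
{
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  },
  "service": "elastic",
  "service_settings": {
    "model_id": "elser",
    "rate_limit": {
      "requests_per_minute": 500
    }
  }
}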
Elastic Inference Service example

The following example shows how to create an inference endpoint called elser-model-eis to perform a sparse_embedding task type.

PUT _inference/sparse_embedding/elser-model-eis
{
  "service": "elastic",
  "service_settings": {
    "model_id": "elser"
  }
}
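Once the endpoint is created, it can be queried with the inference API. A minimal usage sketch; the input text is illustrative.

POST _inference/sparse_embedding/elser-model-eis
{
  "input": "The quick brown fox jumps over the lazy dog."
}

The response contains the sparse embedding generated for the input text.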
The following example shows how to create an inference endpoint called chat-completion-endpoint to perform a chat_completion task type.

PUT /_inference/chat_completion/chat-completion-endpoint
{
  "service": "elastic",
  "service_settings": {
    "model_id": "model-1"
  }
}
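Because chat_completion only supports streaming through the _unified API, the endpoint is not called with a plain POST. A minimal sketch, assuming the streaming route POST /_inference/chat_completion/<inference_id>/_stream and the endpoint created above; the message content is illustrative.

POST /_inference/chat_completion/chat-completion-endpoint/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Say hello in one short sentence."
    }
  ]
}

The response is streamed back as server-sent events. Refer to the chat completion documentation for the full request and response schema.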