Create an Elastic Inference Service (EIS) inference endpoint

Added in 8.12.0

PUT /_inference/{task_type}/{eis_inference_id}

Create an inference endpoint to perform an inference task through the Elastic Inference Service (EIS).

Path parameters

  • task_type string Required

    The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming and only through the _stream API.

    Value is chat_completion.

  • eis_inference_id string Required

    The unique identifier of the inference endpoint.
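Because the chat_completion task type only supports streaming, an endpoint created here is invoked through the _stream API rather than a plain POST. A minimal sketch of such a call, assuming a hypothetical endpoint ID my-eis-chat and an illustrative message body (neither appears on this page):

```shell
# Illustrative only: "my-eis-chat" is a hypothetical endpoint ID.
# chat_completion endpoints stream their results via the _stream API.
curl \
 --request POST 'http://api.example.com/_inference/chat_completion/my-eis-chat/_stream' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{
   "messages": [
     {
       "role": "user",
       "content": "Say hello"
     }
   ]
 }'
```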

application/json

Body

  • service string Required

    Value is elastic.

  • service_settings object Required
    • model_id string Required

      The name of the model to use for the inference task.

    • rate_limit object
      • requests_per_minute number

        The number of requests allowed per minute.
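Put together, a creation request looks like the following sketch. The endpoint ID my-eis-chat and the model name my-model are illustrative placeholders, not values from this page; substitute a model that EIS actually serves:

```shell
# Illustrative only: the endpoint ID and model_id are placeholders.
curl \
 --request PUT 'http://api.example.com/_inference/chat_completion/my-eis-chat' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{
   "service": "elastic",
   "service_settings": {
     "model_id": "my-model",
     "rate_limit": {
       "requests_per_minute": 240
     }
   }
 }'
```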

Responses

  • 200 application/json
    • chunking_settings object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type.

    • service_settings object Required
    • inference_id string Required

      The inference ID.

    • task_type string Required

      Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

PUT /_inference/{task_type}/{eis_inference_id}
curl \
 --request PUT 'http://api.example.com/_inference/{task_type}/{eis_inference_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{
   "service": "elastic",
   "service_settings": {
     "model_id": "string",
     "rate_limit": {
       "requests_per_minute": 42.0
     }
   }
 }'