Start a trained model deployment | Elasticsearch Serverless API documentation

Start a trained model deployment Generally available

POST /_ml/trained_models/{model_id}/deployment/_start

API key auth

Starting a deployment allocates the model to every machine learning node.

Required authorization

Cluster privileges: manage_ml

Path parameters

model_id string Required

The unique identifier of the trained model. Currently, only PyTorch models are supported.

Query parameters

cache_size number | string

The inference cache size per node for the model, held in memory outside the JVM heap. The default value is the same size as model_size_bytes. To disable the cache, set this value to 0b.
number_of_allocations number

The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases the throughput. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads. If adaptive_allocations is enabled, do not set this value, because it is set automatically.
priority string

The deployment priority.

Values are normal or low.
queue_capacity number

Specifies the number of inference requests that are allowed in the queue. After the number of requests exceeds this value, new requests are rejected with a 429 error.
threads_per_allocation number

Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the inference speed. Inference is compute-bound, so any number greater than the number of available hardware threads on the machine does not increase the inference speed. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads.
timeout string

Specifies the amount of time to wait for the model to deploy.

wait_for string
Specifies the allocation status to wait for before returning.

Supported values include:
- started: The trained model is started on at least one node.
- starting: Trained model deployment is starting but it is not yet deployed on any nodes.
- fully_allocated: Trained model deployment has started on all valid nodes.
Values are started, starting, or fully_allocated.
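
The query parameters above are appended to the request path as an ordinary query string. The following is a minimal sketch of how that path could be assembled and checked before sending; `build_start_deployment_path` is a hypothetical helper written for this illustration (it is not part of any Elasticsearch client), and it only validates the enumerated values listed in this reference.

```python
from urllib.parse import quote, urlencode

# Query parameters listed in this reference for the start-deployment endpoint.
ALLOWED_PARAMS = {
    "cache_size", "number_of_allocations", "priority", "queue_capacity",
    "threads_per_allocation", "timeout", "wait_for",
}
WAIT_FOR_VALUES = {"started", "starting", "fully_allocated"}
PRIORITY_VALUES = {"normal", "low"}


def build_start_deployment_path(model_id, **params):
    """Return the request path plus query string (hypothetical helper)."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported query parameters: {sorted(unknown)}")
    if "wait_for" in params and params["wait_for"] not in WAIT_FOR_VALUES:
        raise ValueError("wait_for must be started, starting, or fully_allocated")
    if "priority" in params and params["priority"] not in PRIORITY_VALUES:
        raise ValueError("priority must be normal or low")
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start"
    # Sort the parameters so the resulting string is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path


path = build_start_deployment_path(
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    wait_for="started",
    timeout="1m",
)
print(path)
```

The resulting path would be sent with POST and the API key authorization header; validating locally only catches typos early and does not replace the server's own checks.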

application/json

Body

adaptive_allocations object

Adaptive allocations configuration. When enabled, the number of allocations is set based on the current load. If adaptive_allocations is enabled, do not set the number of allocations manually.
- enabled boolean Required
  
  If true, adaptive_allocations is enabled
- min_number_of_allocations number
  
  Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.
- max_number_of_allocations number
  
  Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
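
The constraints on the adaptive_allocations attributes can be checked on the client side before the request is sent. The sketch below assumes a hypothetical helper, `adaptive_allocations_body`, that builds the JSON body and enforces only the two rules stated above (minimum at least 0, maximum at least the minimum); it is an illustration, not part of an official client.

```python
import json


def adaptive_allocations_body(enabled, min_allocations=None, max_allocations=None):
    """Build the request body for adaptive allocations (hypothetical helper)."""
    settings = {"enabled": enabled}
    if min_allocations is not None:
        if min_allocations < 0:
            raise ValueError("min_number_of_allocations must be >= 0")
        settings["min_number_of_allocations"] = min_allocations
    if max_allocations is not None:
        # With no minimum defined the deployment scales to 0, so compare against 0.
        floor = min_allocations if min_allocations is not None else 0
        if max_allocations < floor:
            raise ValueError(
                "max_number_of_allocations must be >= min_number_of_allocations"
            )
        settings["max_number_of_allocations"] = max_allocations
    return {"adaptive_allocations": settings}


body = adaptive_allocations_body(True, min_allocations=1, max_allocations=4)
print(json.dumps(body))
```

When a body like this is sent, the number_of_allocations query parameter should be left unset, since the allocation count is then managed automatically.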

Responses

200 application/json
- assignment object Required
  
  adaptive_allocations object | string | null
  
  One of: AdaptiveAllocationsSettings object, or string | null
  
  enabled boolean Required
  
  If true, adaptive_allocations is enabled
  
  min_number_of_allocations number
  
  Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.
  
  max_number_of_allocations number
  
  Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
  
  assignment_state string Required
  
  The overall assignment state.
  
  Supported values include:
  
  started: The deployment is usable; at least one node has the model allocated.
  
  starting: The deployment has recently started but is not yet usable; the model is not allocated on any nodes.
  
  stopping: The deployment is preparing to stop and deallocate the model from the relevant nodes.
  
  failed: The deployment is in a failed state and must be redeployed.
  
  Values are started, starting, stopping, or failed.
  
  max_assigned_allocations number
  
  reason string
  
  routing_table object Required
  
  The allocation state for each node.
  
  * object Additional properties
  
  reason string
  
  The reason for the current state. It is usually populated only when the routing_state is failed.
  
  routing_state string Required
  
  The current routing state.
  
  Supported values include:
  
  failed: The allocation attempt failed.
  
  started: The trained model is allocated and ready to accept inference requests.
  
  starting: The trained model is attempting to allocate on this node; inference requests are not yet accepted.
  
  stopped: The trained model is fully deallocated from this node.
  
  stopping: The trained model is being deallocated from this node.
  
  Values are failed, started, starting, stopped, or stopping.
  
  current_allocations number Required
  
  Current number of allocations.
  
  target_allocations number Required
  
  Target number of allocations.
  
  start_time string | number
  
  The timestamp when the deployment started.
  
  One of: string, or EpochTimeUnitMillis number (time unit: milliseconds)
  
  task_parameters object Required
  
  model_bytes
  
  model_id string Required
  
  The unique identifier for the trained model.
  
  deployment_id string Required
  
  The unique identifier for the trained model deployment.
  
  cache_size
  
  number_of_allocations number Required
  
  The total number of allocations this model is assigned across ML nodes.
  
  priority string Required
  
  Values are normal or low.
  
  per_deployment_memory_bytes
  
  per_allocation_memory_bytes
  
  queue_capacity number Required
  
  The number of inference requests allowed in the queue at a time.
  
  threads_per_allocation number Required
  
  Number of threads per allocation.
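
A caller will often want to reduce the response to a few facts: whether the deployment is usable, which nodes have started, and why any node failed. The following sketch reads only the fields documented above. `summarize_assignment` is a hypothetical helper, and the sample response, including the node IDs and the failure reason, is made up for illustration.

```python
def summarize_assignment(response):
    """Summarize a start-deployment response (hypothetical helper)."""
    assignment = response["assignment"]
    routing = assignment["routing_table"]
    started = [node for node, entry in routing.items()
               if entry["routing_state"] == "started"]
    # reason is usually populated only when routing_state is failed.
    failed = {node: entry.get("reason", "") for node, entry in routing.items()
              if entry["routing_state"] == "failed"}
    return {
        "usable": assignment["assignment_state"] == "started",
        "started_nodes": sorted(started),
        "failed_nodes": failed,
        "current_allocations": sum(e["current_allocations"] for e in routing.values()),
        "target_allocations": sum(e["target_allocations"] for e in routing.values()),
    }


# Made-up response fragment for illustration; real node IDs and reasons will differ.
sample_response = {
    "assignment": {
        "assignment_state": "started",
        "routing_table": {
            "node-a": {"routing_state": "started",
                       "current_allocations": 1, "target_allocations": 1},
            "node-b": {"routing_state": "failed", "reason": "example failure",
                       "current_allocations": 0, "target_allocations": 1},
        },
    }
}

summary = summarize_assignment(sample_response)
print(summary)
```

In this sample the deployment counts as usable because assignment_state is started, even though one node failed and the current allocation total is below the target.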

POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m

resp = client.ml.start_trained_model_deployment(
    model_id="elastic__distilbert-base-uncased-finetuned-conll03-english",
    wait_for="started",
    timeout="1m",
)

const response = await client.ml.startTrainedModelDeployment({
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  wait_for: "started",
  timeout: "1m",
});

response = client.ml.start_trained_model_deployment(
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  wait_for: "started",
  timeout: "1m"
)

$resp = $client->ml()->startTrainedModelDeployment([
    "model_id" => "elastic__distilbert-base-uncased-finetuned-conll03-english",
    "wait_for" => "started",
    "timeout" => "1m",
]);

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m"

client.ml().startTrainedModelDeployment(s -> s
    .modelId("elastic__distilbert-base-uncased-finetuned-conll03-english")
    .timeout(t -> t
        .time("1m")
    )
    .waitFor(DeploymentAllocationState.Started)
);
