Create a Watsonx inference endpoint
Added in 8.16.0
Creates an inference endpoint to perform an inference task with the watsonxai
service.
You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai
inference service.
You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.
When you create an inference endpoint, the associated machine learning model is automatically deployed if it is not already running.
After creating the endpoint, wait for the model deployment to complete before using it.
To verify the deployment status, use the get trained model statistics API.
Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count".
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
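The deployment check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `is_deployment_ready` and `ready_from_stats_response` are hypothetical, and the assumption that the allocation fields sit under `trained_model_stats[].deployment_stats.allocation_status` in the get trained model statistics response should be verified against the response your cluster actually returns.

```python
def is_deployment_ready(allocation_status):
    """Return True when an allocation status object reports a finished deployment.

    The deployment counts as ready only if the state is "fully_allocated"
    and allocation_count equals target_allocation_count.
    """
    state = allocation_status.get("state")
    count = allocation_status.get("allocation_count")
    target = allocation_status.get("target_allocation_count")
    return state == "fully_allocated" and count is not None and count == target


def ready_from_stats_response(response, model_id):
    """Look up one model in a parsed statistics response and check readiness.

    ASSUMPTION: the response nests the fields as
    trained_model_stats[].deployment_stats.allocation_status.
    Returns False if the model is not listed or has no deployment stats.
    """
    for entry in response.get("trained_model_stats", []):
        if entry.get("model_id") == model_id:
            status = entry.get("deployment_stats", {}).get("allocation_status", {})
            return is_deployment_ready(status)
    return False
```

A caller would typically poll the statistics API, pass the parsed JSON to `ready_from_stats_response`, and start sending inference requests only once it returns `True`.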
Path parameters
- task_type (string, Required): The task type. The only valid task type for the model to perform is text_embedding.
- watsonx_inference_id (string, Required): The unique identifier of the inference endpoint.
Body
- service (string, Required): Value is watsonxai.
- service_settings (object, Required): Additional properties are allowed.
curl \
--request PUT http://api.example.com/_inference/{task_type}/{watsonx_inference_id} \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"service":"watsonxai","service_settings":{"api_key":"string","api_version":"string","model_id":"string","project_id":"string","rate_limit":{"requests_per_minute":42.0},"url":"string"}}'
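The same request can be assembled from Python using only the standard library. The sketch below builds the PUT request without sending it, so the URL and body can be inspected first. The function name `build_watsonx_endpoint_request` and its parameters are hypothetical; the `service_settings` keys are the ones shown in the curl example, and the `Authorization` value is passed through verbatim, so supply whatever form your deployment expects.

```python
import json
import urllib.request
from urllib.parse import quote


def build_watsonx_endpoint_request(base_url, inference_id, authorization,
                                   service_settings, task_type="text_embedding"):
    """Build (but do not send) the PUT request that creates the endpoint."""
    # text_embedding is the only task type this page lists as valid.
    if task_type != "text_embedding":
        raise ValueError("task_type must be 'text_embedding'")

    # Percent-encode the path segments so unusual identifiers stay in one segment.
    url = "{}/_inference/{}/{}".format(
        base_url.rstrip("/"), quote(task_type, safe=""), quote(inference_id, safe="")
    )
    body = {"service": "watsonxai", "service_settings": service_settings}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Authorization": authorization,
                 "Content-Type": "application/json"},
    )


# Placeholder values only; replace them with real settings before sending.
request = build_watsonx_endpoint_request(
    "http://api.example.com",
    "my-watsonx-endpoint",
    "API_KEY_PLACEHOLDER",
    {
        "api_key": "string",
        "api_version": "string",
        "model_id": "string",
        "project_id": "string",
        "url": "string",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to log or unit test.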