IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Azure AI Studio inference service

Creates an inference endpoint to perform an inference task with the azureaistudio service.

Request

PUT /_inference/<task_type>/<inference_id>

Path parameters

<inference_id>

(Required, string) The unique identifier of the inference endpoint.

<task_type>

(Required, string) The type of the inference task that the model will perform.

Available task types:

completion,
text_embedding.
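
As a rough illustration of how the two path parameters combine, the sketch below assembles the request path and rejects task types that are not listed above. The helper name inference_path is hypothetical and is not part of any Elastic client.

```python
from urllib.parse import quote

# Task types listed above for the azureaistudio service.
SUPPORTED_TASK_TYPES = ("completion", "text_embedding")


def inference_path(task_type: str, inference_id: str) -> str:
    """Return the path used by PUT /_inference/<task_type>/<inference_id>.

    Hypothetical helper for illustration only.
    """
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    # Percent-encode the identifier so it is safe to place in a URL path.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


print(inference_path("text_embedding", "azure_ai_studio_embeddings"))
```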

Request body

service

(Required, string) The type of service supported for the specified task type. In this case, azureaistudio.

service_settings

(Required, object) Settings used to install the inference model.

These settings are specific to the azureaistudio service.

api_key

(Required, string) A valid API key of your Azure AI Studio model deployment. This key can be found on the overview page for your deployment in the management section of your Azure AI Studio account.

You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

target

(Required, string) The target URL of your Azure AI Studio model deployment. This can be found on the overview page for your deployment in the management section of your Azure AI Studio account.

provider

(Required, string) The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:

cohere - available for text_embedding and completion task types
databricks - available for completion task type only
meta - available for completion task type only
microsoft_phi - available for completion task type only
mistral - available for completion task type only
openai - available for text_embedding and completion task types
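
The provider list above can be captured as a lookup table so that a mismatched provider and task type is caught before the request is sent. This is a minimal sketch: the table only transcribes the providers named above (the list says "include", so it may not be exhaustive), and the function name provider_supports is an assumption made for illustration.

```python
# Provider -> supported task types, transcribed from the list above.
PROVIDER_TASK_TYPES = {
    "cohere": {"text_embedding", "completion"},
    "databricks": {"completion"},
    "meta": {"completion"},
    "microsoft_phi": {"completion"},
    "mistral": {"completion"},
    "openai": {"text_embedding", "completion"},
}


def provider_supports(provider: str, task_type: str) -> bool:
    """Return True if the table lists task_type for provider.

    Hypothetical helper; providers missing from the table yield False.
    """
    return task_type in PROVIDER_TASK_TYPES.get(provider, set())


print(provider_supports("cohere", "text_embedding"))
print(provider_supports("mistral", "text_embedding"))
```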

endpoint_type

(Required, string) One of token or realtime. Specifies the type of endpoint used in your model deployment. Azure AI Studio offers two endpoint types for deployment. "Pay as you go" endpoints are billed per token; for these, specify token as the endpoint_type. "Real-time" endpoints are billed per hour of usage; for these, specify realtime.

rate_limit

(Optional, object) By default, the azureaistudio service sets the number of requests allowed per minute to 240. This helps to minimize the number of rate limit errors returned from Azure AI Studio. To modify this, set the requests_per_minute setting of this object in your service settings:

"rate_limit": {
    "requests_per_minute": <number_of_requests>
}
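
To show how the service settings described above fit together, the following sketch assembles a service_settings object in Python and includes rate_limit only when a value is supplied. The function name build_service_settings, the value 120, and the check that requests_per_minute is positive are assumptions made for illustration; the placeholders in angle brackets stand in for real deployment values.

```python
import json

# The two endpoint types described above.
ENDPOINT_TYPES = ("token", "realtime")


def build_service_settings(api_key, target, provider, endpoint_type,
                           requests_per_minute=None):
    """Assemble a service_settings dict (hypothetical helper)."""
    if endpoint_type not in ENDPOINT_TYPES:
        raise ValueError(f"endpoint_type must be one of {ENDPOINT_TYPES}")
    settings = {
        "api_key": api_key,
        "target": target,
        "provider": provider,
        "endpoint_type": endpoint_type,
    }
    if requests_per_minute is not None:
        # Assumption: a rate limit only makes sense as a positive number.
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        settings["rate_limit"] = {"requests_per_minute": requests_per_minute}
    return settings


settings = build_service_settings(
    "<api_key>", "<target_uri>", "openai", "token", requests_per_minute=120
)
print(json.dumps(settings, indent=4))
```

Omitting requests_per_minute leaves rate_limit out of the object, in which case the default of 240 described above applies.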

task_settings

(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

task_settings for the completion task type

do_sample: (Optional, float) Specifies whether the inference process performs sampling. Has no effect unless temperature or top_p is specified.
max_new_tokens: (Optional, integer) Provides a hint for the maximum number of output tokens to be generated. Defaults to 64.
temperature: (Optional, float) A number in the range of 0.0 to 2.0 that sets the sampling temperature, which controls the apparent creativity of generated completions. Should not be used if top_p is specified.
top_p: (Optional, float) A number in the range of 0.0 to 2.0 that serves as an alternative to temperature. It makes the model consider only the tokens that fall within the given nucleus sampling probability. Should not be used if temperature is specified.
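
The constraints on the completion settings can be checked on the client side before the endpoint is created. The sketch below is one possible way to do this and is not how the service itself validates input: the function name completion_task_settings is hypothetical, and treating "temperature together with top_p" as an error is a stricter reading of "should not be used" chosen for illustration.

```python
def completion_task_settings(max_new_tokens=None, temperature=None,
                             top_p=None, do_sample=None):
    """Build task_settings for the completion task type (hypothetical helper)."""
    # The descriptions above say temperature and top_p should not be combined.
    if temperature is not None and top_p is not None:
        raise ValueError("specify temperature or top_p, not both")
    # Both values are documented as lying in the range 0.0 to 2.0.
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between 0.0 and 2.0")
    settings = {
        "do_sample": do_sample,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Leave out unset options so the service defaults apply.
    return {key: value for key, value in settings.items() if value is not None}


print(completion_task_settings(max_new_tokens=128, temperature=0.7))
```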

task_settings for the text_embedding task type

user: (Optional, string) Specifies the user issuing the request, which can be used for abuse detection.
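
For orientation, the sketch below shows where task_settings sits in a request body for the text_embedding task type: alongside service and service_settings at the top level. The value "user-1234" is made up for illustration, and the angle-bracket placeholders stand in for real deployment values.

```python
import json

body = {
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>",
    },
    # Optional; "user-1234" is a made-up identifier for illustration.
    "task_settings": {"user": "user-1234"},
}

print(json.dumps(body, indent=4))
```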

Azure AI Studio service example

The following example shows how to create an inference endpoint called azure_ai_studio_embeddings to perform a text_embedding task type. No model is specified here because it is already defined by the Azure AI Studio deployment.

The list of embeddings models that you can choose from in your deployment can be found in the Azure AI Studio model explorer.

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_ai_studio_embeddings",
    inference_config={
        "service": "azureaistudio",
        "service_settings": {
            "api_key": "<api_key>",
            "target": "<target_uri>",
            "provider": "<model_provider>",
            "endpoint_type": "<endpoint_type>"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "azure_ai_studio_embeddings",
  inference_config: {
    service: "azureaistudio",
    service_settings: {
      api_key: "<api_key>",
      target: "<target_uri>",
      provider: "<model_provider>",
      endpoint_type: "<endpoint_type>",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/azure_ai_studio_embeddings
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}

The next example shows how to create an inference endpoint called azure_ai_studio_completion to perform a completion task type.

resp = client.inference.put(
    task_type="completion",
    inference_id="azure_ai_studio_completion",
    inference_config={
        "service": "azureaistudio",
        "service_settings": {
            "api_key": "<api_key>",
            "target": "<target_uri>",
            "provider": "<model_provider>",
            "endpoint_type": "<endpoint_type>"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "completion",
  inference_id: "azure_ai_studio_completion",
  inference_config: {
    service: "azureaistudio",
    service_settings: {
      api_key: "<api_key>",
      target: "<target_uri>",
      provider: "<model_provider>",
      endpoint_type: "<endpoint_type>",
    },
  },
});
console.log(response);

PUT _inference/completion/azure_ai_studio_completion
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "<api_key>",
        "target": "<target_uri>",
        "provider": "<model_provider>",
        "endpoint_type": "<endpoint_type>"
    }
}

The list of chat completion models that you can choose from in your deployment can be found in the Azure AI Studio model explorer.
