Azure OpenAI inference service
Creates an inference endpoint to perform an inference task with the azureopenai service.

Request

PUT /_inference/<task_type>/<inference_id>
Path parameters

<inference_id>
(Required, string) The unique identifier of the inference endpoint.

<task_type>
(Required, string) The type of the inference task that the model will perform.
Available task types: completion, text_embedding.
Request body

chunking_settings
(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking. An example sketch follows this list.

max_chunking_size
(Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).

overlap
(Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunking_size.

sentence_overlap
(Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.

strategy
(Optional, string) Specifies the chunking strategy. It can be either sentence or word.
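For illustration only, the following sketch creates a text_embedding endpoint that overrides the default chunking behavior. The endpoint name azure_openai_embeddings_chunked and the chosen chunking values are placeholders, not recommendations:

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_openai_embeddings_chunked",  # hypothetical name
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "<api_key>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01"
        },
        "chunking_settings": {
            # word strategy: chunks of at most 120 words with a 40-word
            # overlap (overlap may not exceed half of max_chunking_size).
            "strategy": "word",
            "max_chunking_size": 120,
            "overlap": 40
        }
    },
)
print(resp)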
service
(Required, string) The type of service supported for the specified task type. In this case, azureopenai.

service_settings
(Required, object) Settings used to install the inference model. These settings are specific to the azureopenai service.

api_key or entra_id
(Required, string) You must provide either an API key or an Entra ID. If you do not provide either, or provide both, you will receive an error when trying to create your model. See the Azure OpenAI Authentication documentation for more details on these authentication types. A sketch using entra_id appears after the service_settings list.

You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
resource_name
(Required, string) The name of your Azure OpenAI resource. You can find this in the list of resources for your subscription in the Azure Portal.

deployment_id
(Required, string) The deployment name of your deployed models. Your Azure OpenAI deployments can be found through the Azure OpenAI Studio portal that is linked to your subscription.

api_version
(Required, string) The Azure API version ID to use. We recommend using the latest supported non-preview version.
rate_limit
(Optional, object) The azureopenai service sets a default number of requests allowed per minute depending on the task type. For text_embedding it is set to 1440. For completion it is set to 120. This helps to minimize the number of rate limit errors returned from Azure. To modify this, set the requests_per_minute setting of this object in your service settings:

"rate_limit": {
    "requests_per_minute": <<number_of_requests>>
}

More information about the rate limits for Azure can be found in the Quota limits docs and How to change the quotas.
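For example, a sketch that halves the default completion rate limit; the endpoint name and the value 60 are illustrative assumptions:

resp = client.inference.put(
    task_type="completion",
    inference_id="azure_openai_completion_throttled",  # hypothetical name
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "<api_key>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01",
            # Override the default of 120 requests per minute for completion.
            "rate_limit": {"requests_per_minute": 60}
        }
    },
)
print(resp)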
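None of the examples in this section use Entra ID authentication. As a sketch, assuming entra_id is supplied in service_settings exactly where api_key would otherwise go (per the parameter list above):

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_openai_embeddings_entra",  # hypothetical name
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            # Provide exactly one of api_key or entra_id; sending both
            # (or neither) returns an error.
            "entra_id": "<entra_id>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01"
        }
    },
)
print(resp)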
task_settings
(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified. An example sketch follows this list.

task_settings for the completion task type

user
(Optional, string) Specifies the user issuing the request, which can be used for abuse detection.

task_settings for the text_embedding task type

user
(Optional, string) Specifies the user issuing the request, which can be used for abuse detection.
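As a sketch, the user task setting is passed alongside service_settings when creating the endpoint; the endpoint name and user value here are illustrative assumptions:

resp = client.inference.put(
    task_type="completion",
    inference_id="azure_openai_completion_with_user",  # hypothetical name
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "<api_key>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01"
        },
        "task_settings": {
            # Forwarded to Azure OpenAI and usable for abuse detection.
            "user": "<user_identifier>"
        }
    },
)
print(resp)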
Azure OpenAI service example

The following example shows how to create an inference endpoint called azure_openai_embeddings to perform a text_embedding task type. Note that we do not specify a model here, as it is defined already via our Azure OpenAI deployment. The list of embeddings models that you can choose from in your deployment can be found in the Azure models documentation.
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_openai_embeddings",
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "<api_key>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01"
        }
    },
)
print(resp)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "azure_openai_embeddings",
  inference_config: {
    service: "azureopenai",
    service_settings: {
      api_key: "<api_key>",
      resource_name: "<resource_name>",
      deployment_id: "<deployment_id>",
      api_version: "2024-02-01",
    },
  },
});
console.log(response);
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
The next example shows how to create an inference endpoint called azure_openai_completion to perform a completion task type.
resp = client.inference.put(
    task_type="completion",
    inference_id="azure_openai_completion",
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "<api_key>",
            "resource_name": "<resource_name>",
            "deployment_id": "<deployment_id>",
            "api_version": "2024-02-01"
        }
    },
)
print(resp)
const response = await client.inference.put({
  task_type: "completion",
  inference_id: "azure_openai_completion",
  inference_config: {
    service: "azureopenai",
    service_settings: {
      api_key: "<api_key>",
      resource_name: "<resource_name>",
      deployment_id: "<deployment_id>",
      api_version: "2024-02-01",
    },
  },
});
console.log(response);
PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "<api_key>",
        "resource_name": "<resource_name>",
        "deployment_id": "<deployment_id>",
        "api_version": "2024-02-01"
    }
}
The list of chat completion models that you can choose from in your Azure OpenAI deployment can be found in the Azure models documentation.