Perform inference on the service using the Unified Schema

Added in 8.18.0
Path parameters

- task_type string
  The task type. Values are sparse_embedding, text_embedding, rerank, or completion.
- inference_id string
  The inference ID.
Query parameters

- timeout string
  Specifies the amount of time to wait for the inference request to complete.
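For example, to wait up to 60 seconds for the request to complete, append the parameter to the URL. The endpoint ID in the path is an illustrative placeholder; the value uses standard Elasticsearch time units such as 30s or 1m:

POST /_inference/completion/my-inference-endpoint/_unified?timeout=60s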
Body

- messages array[object]
  A list of objects representing the conversation. A minimal example follows this list.
- model string
  The ID of the model to use.
- max_completion_tokens number
  The upper bound limit for the number of tokens that can be generated for a completion request.
- stop array[string]
  A sequence of strings to control when the model should stop generating additional tokens.
- temperature number
  The sampling temperature to use.
- tools array[object]
  A list of tools that the model can call. A tool-calling sketch appears under Request examples below.
- top_p number
  Nucleus sampling, an alternative to sampling with temperature.
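As a minimal sketch of a request, the following sends a single user message to a completion endpoint. The inference endpoint ID openai-completion and the model gpt-4o are illustrative placeholders, not values defined by this API:

curl \
 --request POST http://api.example.com/_inference/completion/openai-completion/_unified \
 --header "Content-Type: application/json" \
 --data '{"model":"gpt-4o","messages":[{"role":"user","content":"What is the capital of France?"}]}'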
POST
/_inference/{task_type}/{inference_id}/_unified
curl \
 --request POST http://api.example.com/_inference/{task_type}/{inference_id}/_unified \
 --header "Content-Type: application/json" \
 --data '{"messages":[{"content":"string","role":"string","tool_call_id":"string","tool_calls":[{"id":"string","function":{"arguments":"string","name":"string"},"type":"string"}]}],"model":"string","max_completion_tokens":42.0,"stop":["string"],"temperature":42.0,"tool_choice":"string","tools":[{"type":"string","function":{"description":"string","name":"string","parameters":{},"strict":true}}],"top_p":42.0}'
Request examples
{
"messages": [
{
"": "string",
"role": "string",
"tool_call_id": "string",
"tool_calls": [
{
"id": "string",
"function": {
"arguments": "string",
"name": "string"
},
"type": "string"
}
]
}
],
"model": "string",
"max_completion_tokens": 42.0,
"stop": [
"string"
],
"temperature": 42.0,
"": "string",
"tools": [
{
"type": "string",
"function": {
"description": "string",
"name": "string",
"parameters": {},
"strict": true
}
}
],
"top_p": 42.0
}
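A fuller sketch showing tool calling, following the template above. The tool name, description, and parameter schema are illustrative assumptions, as are the model ID and the sampling settings; tool_choice set to auto lets the model decide whether to call the tool:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather like in Berlin today?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city name"
            }
          },
          "required": ["location"]
        },
        "strict": true
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.7,
  "max_completion_tokens": 256
}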
Response examples (200)
{}