Create a VoyageAI inference endpoint Added in 8.19.0

PUT /_inference/{task_type}/{voyageai_inference_id}

Create an inference endpoint to perform an inference task with the voyageai service.

Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.

Path parameters

  • task_type string Required

    The type of the inference task that the model will perform.

    Values are text_embedding or rerank.

  • voyageai_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • Hide chunking_settings attributes Show chunking_settings attributes object
    • service string Required

      The service type

    • service_settings object Required
    • The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is voyageai.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • The number of dimensions for resulting output embeddings. This setting maps to output_dimension in the VoyageAI documentation. Only for the text_embedding task type.

    • model_id string Required

      The name of the model to use for the inference task. Refer to the VoyageAI documentation for the list of available text embedding and rerank models.

    • Hide rate_limit attribute Show rate_limit attribute object
    • The data type for the embeddings to be returned. This setting maps to output_dtype in the VoyageAI documentation. Permitted values: float, int8, bit. int8 is a synonym of byte in the VoyageAI documentation. bit is a synonym of binary in the VoyageAI documentation. Only for the text_embedding task type.

  • Hide task_settings attributes Show task_settings attributes object
    • Type of the input text. Permitted values: ingest (maps to document in the VoyageAI documentation), search (maps to query in the VoyageAI documentation). Only for the text_embedding task type.

    • Whether to return the source documents in the response. Only for the rerank task type.

    • top_k number

      The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned. Only for the rerank task type.

    • truncation boolean

      Whether to truncate the input texts to fit within the context length.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • Hide attributes Show attributes object
      • The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

PUT /_inference/{task_type}/{voyageai_inference_id}
curl \
 --request PUT 'http://api.example.com/_inference/{task_type}/{voyageai_inference_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n    \"service\": \"voyageai\",\n    \"service_settings\": {\n        \"model_id\": \"voyage-3-large\",\n        \"dimensions\": 512\n    }\n}"'
Request examples
Run `PUT _inference/text_embedding/voyageai-embeddings` to create an inference endpoint that performs a `text_embedding` task. The embeddings created by requests to this endpoint will have 512 dimensions.
{
    "service": "voyageai",
    "service_settings": {
        "model_id": "voyage-3-large",
        "dimensions": 512
    }
}
Run `PUT _inference/rerank/voyageai-rerank` to create an inference endpoint that performs a `rerank` task.
{
    "service": "voyageai",
    "service_settings": {
        "model_id": "rerank-2"
    }
}