JinaAI inference service
Creates an inference endpoint to perform an inference task with the jinaai service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
<inference_id>
    (Required, string) The unique identifier of the inference endpoint.
<task_type>
    (Required, string) The type of the inference task that the model will perform.
    Available task types:
    - text_embedding
    - rerank
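The request path can be assembled from the two path parameters. The following is a minimal sketch, not part of the Elasticsearch API: the helper name inference_endpoint_path and the constant JINAAI_TASK_TYPES are hypothetical, and the only task types accepted are the two listed above.

```python
from urllib.parse import quote

# Task types listed for the jinaai service in this section.
JINAAI_TASK_TYPES = ("text_embedding", "rerank")


def inference_endpoint_path(task_type: str, inference_id: str) -> str:
    """Return the request path for creating a jinaai inference endpoint (hypothetical helper)."""
    if task_type not in JINAAI_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id is required")
    # Percent-encode the identifier so it is safe inside a single URL path segment.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"
```

For example, inference_endpoint_path("rerank", "jinaai-rerank") yields the path used with PUT to create a rerank endpoint.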
Request body
chunking_settings
    (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
    max_chunk_size
        (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
    overlap
        (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
    sentence_overlap
        (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
    strategy
        (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
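The limits on the chunking values interact with the chosen strategy, so it can help to check them before sending a request. The sketch below is illustrative only: validate_chunking is a hypothetical client-side helper, it checks only values that are explicitly passed, and it assumes the documented default chunk size of 250 words when no size is given.

```python
def validate_chunking(strategy, max_size=None, overlap=None, sentence_overlap=None):
    """Return a list of problems; an empty list means the documented limits are met.

    Hypothetical client-side check; the server performs its own validation.
    """
    if strategy not in ("sentence", "word"):
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    # Lower bound depends on the strategy: 20 words for sentence, 10 for word.
    minimum = 20 if strategy == "sentence" else 10
    # Fall back to the documented default chunk size when none is given.
    size = 250 if max_size is None else max_size
    if not minimum <= size <= 300:
        problems.append(f"chunk size must be between {minimum} and 300, got {size}")

    if overlap is not None:
        if strategy != "word":
            problems.append("overlap only applies to the word strategy")
        elif overlap * 2 > size:
            # overlap may not exceed half of the maximum chunk size
            problems.append(f"overlap {overlap} exceeds half of chunk size {size}")

    if sentence_overlap is not None:
        if strategy != "sentence":
            problems.append("sentence_overlap only applies to the sentence strategy")
        elif sentence_overlap not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")

    return problems
```

Calling validate_chunking("word", 200, overlap=101) reports one problem, because 101 is more than half of 200.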
service
    (Required, string) The type of service supported for the specified task type. In this case, jinaai.
service_settings
    (Required, object) Settings used to install the inference model. These settings are specific to the jinaai service.
    api_key
        (Required, string) A valid API key for your JinaAI account. You can find it at https://jina.ai/embeddings/.
        You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
    rate_limit
        (Optional, object) The default rate limit for the jinaai service is 2000 requests per minute for all task types. You can modify this using the requests_per_minute setting in your service settings:
        "rate_limit": { "requests_per_minute": <<number_of_requests>> }
        More information about JinaAI's rate limits can be found at https://jina.ai/contact-sales/#rate-limit.
    service_settings for the rerank task type
        model_id
            (Required, string) The name of the model to use for the inference task. To review the available rerank compatible models, refer to https://jina.ai/reranker.
    service_settings for the text_embedding task type
        model_id
            (Optional, string) The name of the model to use for the inference task. To review the available text_embedding models, refer to https://jina.ai/embeddings/.
        similarity
            (Optional, string) Similarity measure. One of cosine, dot_product, l2_norm. The default depends on the embedding_type (float → dot_product, int8/byte → cosine).
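As an illustration of how the service settings fit together, the sketch below assembles a request body for the jinaai service. The function jinaai_service_body and its parameter names are hypothetical; only the JSON keys it emits (service, service_settings, api_key, model_id, rate_limit, requests_per_minute, similarity) come from the descriptions above.

```python
SIMILARITY_MEASURES = ("cosine", "dot_product", "l2_norm")


def jinaai_service_body(api_key, model_id=None, requests_per_minute=None, similarity=None):
    """Build the JSON body for creating a jinaai inference endpoint (hypothetical helper)."""
    service_settings = {"api_key": api_key}
    if model_id is not None:
        service_settings["model_id"] = model_id
    if requests_per_minute is not None:
        # Overrides the documented default of 2000 requests per minute.
        service_settings["rate_limit"] = {"requests_per_minute": requests_per_minute}
    if similarity is not None:
        # similarity is described only for the text_embedding task type.
        if similarity not in SIMILARITY_MEASURES:
            raise ValueError(f"unknown similarity measure: {similarity!r}")
        service_settings["similarity"] = similarity
    return {"service": "jinaai", "service_settings": service_settings}
```

The returned dictionary can be serialized with json.dumps and sent as the body of the PUT request.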
task_settings
    (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.
    task_settings for the rerank task type
        return_documents
            (Optional, boolean) Specifies whether to return the document text within the results.
        top_n
            (Optional, integer) The number of most relevant documents to return. Defaults to the number of documents. If this inference endpoint is used in a text_similarity_reranker retriever query and top_n is set, it must be greater than or equal to rank_window_size in the query.
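The relationship between top_n and rank_window_size can be expressed as a one-line check. This is a sketch of the rule as stated above, using a hypothetical helper name; it is not a call into any Elasticsearch client.

```python
def top_n_is_compatible(top_n, rank_window_size):
    """Return True when top_n is unset, or is at least rank_window_size (hypothetical helper)."""
    return top_n is None or top_n >= rank_window_size
```

An endpoint created without top_n passes for any window size, while an endpoint with top_n set to 10 does not pass for a rank_window_size of 100.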
    task_settings for the text_embedding task type
        task
            (Optional, string) Specifies the task passed to the model. Valid values are:
            - classification: use it for embeddings passed through a text classifier.
            - clustering: use it for the embeddings run through a clustering algorithm.
            - ingest: use it for storing document embeddings in a vector database.
            - search: use it for storing embeddings of search queries run against a vector database to find relevant documents.
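A small sketch of building the task_settings object for the text_embedding task type follows. The helper embedding_task_settings and the constant VALID_TASKS are hypothetical names; the four accepted values are the ones listed above.

```python
VALID_TASKS = ("classification", "clustering", "ingest", "search")


def embedding_task_settings(task):
    """Return a task_settings object for the text_embedding task type (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return {"task": task}
```

For instance, embedding_task_settings("ingest") suits an endpoint that embeds documents for storage, and embedding_task_settings("search") suits one that embeds queries.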
JinaAI service examples
The following examples demonstrate how to create inference endpoints for text_embedding and rerank tasks using the JinaAI service and use them in search requests.
First, we create the embeddings service:
PUT _inference/text_embedding/jinaai-embeddings
{
  "service": "jinaai",
  "service_settings": {
    "model_id": "jina-embeddings-v3",
    "api_key": "<api_key>"
  }
}
Then, we create the rerank service:
PUT _inference/rerank/jinaai-rerank
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "jina-reranker-v2-base-multilingual"
  },
  "task_settings": {
    "top_n": 10,
    "return_documents": true
  }
}
Now we can create an index that uses the jinaai-embeddings service to index the documents.
PUT jinaai-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "jinaai-embeddings"
      }
    }
  }
}
PUT jinaai-index/_bulk
{ "index" : { "_index" : "jinaai-index", "_id" : "1" } }
{"content": "Sarah Johnson is a talented marine biologist working at the Oceanographic Institute. Her groundbreaking research on coral reef ecosystems has garnered international attention and numerous accolades."}
{ "index" : { "_index" : "jinaai-index", "_id" : "2" } }
{"content": "She spends months at a time diving in remote locations, meticulously documenting the intricate relationships between various marine species. "}
{ "index" : { "_index" : "jinaai-index", "_id" : "3" } }
{"content": "Her dedication to preserving these delicate underwater environments has inspired a new generation of conservationists."}
Now, with the index created, we can search with and without the reranker service.
GET jinaai-index/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "who inspired taking care of the sea?"
    }
  }
}
POST jinaai-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "content",
              "query": "who inspired taking care of the sea?"
            }
          }
        }
      },
      "field": "content",
      "rank_window_size": 10,
      "inference_id": "jinaai-rerank",
      "inference_text": "who inspired taking care of the sea?"
    }
  }
}
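The reranked search body repeats the query text in two places, once for the semantic query and once as inference_text. The sketch below builds the same structure programmatically so the two stay in sync. The function rerank_search_body is a hypothetical helper, not part of any Elasticsearch client, and it only constructs the JSON body; sending it is left to whichever HTTP client you use.

```python
def rerank_search_body(query_text, field, inference_id, rank_window_size):
    """Build a text_similarity_reranker search body (hypothetical helper)."""
    return {
        "retriever": {
            "text_similarity_reranker": {
                # First stage: semantic retrieval against the semantic_text field.
                "retriever": {
                    "standard": {
                        "query": {
                            "semantic": {"field": field, "query": query_text}
                        }
                    }
                },
                # Second stage: rerank the retrieved window with the rerank endpoint.
                "field": field,
                "rank_window_size": rank_window_size,
                "inference_id": inference_id,
                "inference_text": query_text,
            }
        }
    }
```

For example, rerank_search_body("who inspired taking care of the sea?", "content", "jinaai-rerank", 10) produces a body for POST jinaai-index/_search.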