Retriever
This functionality is in technical preview and may be changed or removed in a future release. The syntax will likely change before GA. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
A retriever is a specification to describe top documents returned from a
search. A retriever replaces other elements of the search API
that also return top documents, such as query and knn. A retriever may have
child retrievers, where a retriever with two or more children is considered
a compound retriever. This allows for complex behavior to be depicted in a
tree-like structure, called the retriever tree, to better clarify the order
of operations that occur during a search.
Refer to Retrievers for a high-level overview of the retrievers abstraction.
The following retrievers are available:
- standard
  A retriever that replaces the functionality of a traditional query.
- knn
  A retriever that replaces the functionality of a knn search.
- rrf
  A retriever that produces top documents from reciprocal rank fusion (RRF).
Standard Retriever
A standard retriever returns top documents from a traditional query.
Parameters:
- query
  (Optional, query object) Defines a query to retrieve a set of top documents.
- filter
  (Optional, query object or list of query objects) Applies a boolean query filter to this retriever. All documents must match the filter, but the filter does not contribute to the score.
- search_after
  (Optional, search after object) Defines a search after object parameter used for pagination.
- terminate_after
  (Optional, integer) Maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
  Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.
- sort
  (Optional, sort object) A sort object that specifies the order of matching documents.
- min_score
  (Optional, float) Minimum _score for matching documents. Documents with a lower _score are not included in the top documents.
- collapse
  (Optional, collapse object) Collapses the top documents by a specified key into a single top document per key.
Restrictions
When a retriever tree contains a compound retriever (a retriever with two or more child retrievers), only the query element is allowed.
Example
GET /index/_search
{
  "retriever": {
    "standard": {
      "query": { ... },
      "filter": { ... },
      "min_score": ...
    }
  },
  "size": ...
}
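As an illustration of how the request above might be assembled programmatically, the following is a minimal Python sketch that builds the JSON body for a standard retriever. The helper name standard_retriever_body and the field names title and status are hypothetical and chosen only for this example. Only the parameters documented above are used.

```python
import json


def standard_retriever_body(query, filters=None, min_score=None, size=10):
    """Build a search request body that uses a standard retriever.

    Hypothetical helper for illustration; it only arranges the documented
    parameters (query, filter, min_score) under "retriever.standard".
    """
    standard = {"query": query}
    if filters is not None:
        # A single query object or a list of query objects.
        standard["filter"] = filters
    if min_score is not None:
        standard["min_score"] = min_score
    # size stays at the top level of the search request.
    return {"retriever": {"standard": standard}, "size": size}


body = standard_retriever_body(
    query={"match": {"title": "retriever"}},      # hypothetical field
    filters={"term": {"status": "published"}},    # hypothetical field
    min_score=0.5,
    size=5,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /index/_search with any HTTP client.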
kNN Retriever
A kNN retriever returns top documents from a k-nearest neighbor search (kNN).
Parameters
- field
  (Required, string) The name of the vector field to search against. Must be a dense_vector field with indexing enabled.
- query_vector
  (Required if query_vector_builder is not defined, array of float) Query vector. Must have the same number of dimensions as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
- query_vector_builder
  (Required if query_vector is not defined, query vector builder object) Defines a model to build a query vector.
- k
  (Required, integer) Number of nearest neighbors to return as top hits. This value must be less than or equal to num_candidates.
- num_candidates
  (Required, integer) The number of nearest neighbor candidates to consider per shard. Needs to be greater than k, or size if k is omitted, and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results. Defaults to Math.min(1.5 * k, 10_000).
- filter
  (Optional, query object or list of query objects) Query to filter the documents that can match. The kNN search will return the top k documents that also match this filter. The value can be a single query or a list of queries. If filter is not provided, all documents are allowed to match.
- similarity
  (Optional, float) The minimum similarity required for a document to be considered a match. The similarity value calculated relates to the raw similarity used, not the document score. The matched documents are then scored according to similarity and the provided boost is applied.
  The similarity parameter is the direct vector similarity calculation.
  - l2_norm: also known as Euclidean, will include documents where the vector is within the dims dimensional hypersphere with radius similarity with origin at query_vector.
  - cosine, dot_product, and max_inner_product: Only return vectors where the cosine similarity or dot-product are at least the provided similarity.
  Read more here: knn similarity search
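To make the similarity threshold more concrete, here is a small Python sketch of the matching rule described above. It is a conceptual model only, not Elasticsearch's implementation: the function names are hypothetical, and treating the boundary as inclusive for l2_norm ("within" the hypersphere) is an assumption.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def passes_similarity(metric, query_vector, doc_vector, similarity):
    """Return True if doc_vector satisfies the similarity threshold.

    Conceptual sketch of the rule in the text, not the real implementation.
    """
    if metric == "l2_norm":
        # Inside the hypersphere of radius `similarity` centered on the query.
        # The inclusive boundary is an assumption.
        return l2_distance(query_vector, doc_vector) <= similarity
    if metric == "cosine":
        return cosine_similarity(query_vector, doc_vector) >= similarity
    if metric in ("dot_product", "max_inner_product"):
        return dot(query_vector, doc_vector) >= similarity
    raise ValueError(f"unknown metric: {metric}")


# A vector at distance 5 from the origin matches a radius of 5 but not 4.9.
print(passes_similarity("l2_norm", [0.0, 0.0], [3.0, 4.0], 5.0))
print(passes_similarity("l2_norm", [0.0, 0.0], [3.0, 4.0], 4.9))
```

Note the direction of the comparison: for l2_norm a smaller value is a closer match, while for the other metrics a larger value is.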
Restrictions
The parameters query_vector and query_vector_builder cannot be used together.
Example:
GET /index/_search
{
  "retriever": {
    "knn": {
      "field": ...,
      "query_vector": ...,
      "k": ...,
      "num_candidates": ...
    }
  }
}
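The constraints listed for the kNN retriever can be checked on the client before a request is sent. The Python sketch below is one possible way to do that. The helper name knn_retriever and the field name embedding are hypothetical. The check on k follows the description of the k parameter (k less than or equal to num_candidates), which is an interpretation of the text rather than a statement about server-side validation.

```python
def knn_retriever(field, k, num_candidates, query_vector=None,
                  query_vector_builder=None, filters=None, similarity=None):
    """Return a {"knn": {...}} retriever after checking documented constraints.

    Hypothetical helper for illustration only.
    """
    # One of the two is required, and they cannot be used together.
    if (query_vector is None) == (query_vector_builder is None):
        raise ValueError(
            "specify exactly one of query_vector or query_vector_builder")
    if k > num_candidates:
        raise ValueError("k must be less than or equal to num_candidates")
    if num_candidates > 10_000:
        raise ValueError("num_candidates cannot exceed 10,000")

    knn = {"field": field, "k": k, "num_candidates": num_candidates}
    if query_vector is not None:
        knn["query_vector"] = query_vector
    else:
        knn["query_vector_builder"] = query_vector_builder
    if filters is not None:
        knn["filter"] = filters
    if similarity is not None:
        knn["similarity"] = similarity
    return {"knn": knn}


knn_body = {
    "retriever": knn_retriever(
        field="embedding",              # hypothetical dense_vector field
        query_vector=[0.1, 0.2, 0.3],
        k=10,
        num_candidates=50,
    )
}
print(knn_body)
```

Failing early on the client gives a clearer error than a rejected request, but the server remains the authority on what is valid.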
RRF Retriever
An RRF retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
Parameters
- retrievers
  (Required, array of retriever objects) A list of child retrievers to specify which sets of returned top documents will have the RRF formula applied to them. Each child retriever carries an equal weight as part of the RRF formula. Two or more child retrievers are required.
- rank_constant
  (Optional, integer) This value determines how much influence documents in individual result sets per query have over the final ranked result set. A higher value indicates that lower ranked documents have more influence. This value must be greater than or equal to 1. Defaults to 60.
- window_size
  (Optional, integer) This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request’s size. window_size must be greater than or equal to size and greater than or equal to 1. Defaults to the size parameter.
Restrictions
An RRF retriever is a compound retriever. Child retrievers may not use elements that are restricted by having a compound retriever as part of the retriever tree.
Example
GET /index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { ... } },
        { "knn": { ... } }
      ],
      "rank_constant": ...,
      "window_size": ...
    }
  }
}
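To show how rank_constant, window_size, and size interact, the following Python sketch applies the commonly documented reciprocal rank fusion formula, in which each document scores the sum of 1 / (rank_constant + rank) over the result sets it appears in. It is a conceptual model, not Elasticsearch's implementation. The function name rrf_fuse, the document IDs, and the tie-break by document ID are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(result_sets, rank_constant=60, window_size=None, size=10):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    result_sets: one ranked list of IDs per child retriever, best first.
    Returns up to `size` (doc_id, score) pairs, best first.
    """
    if len(result_sets) < 2:
        raise ValueError("two or more child result sets are required")
    if rank_constant < 1:
        raise ValueError("rank_constant must be greater than or equal to 1")
    if window_size is None:
        window_size = size  # documented default
    if window_size < 1 or window_size < size:
        raise ValueError("window_size must be >= size and >= 1")

    scores = defaultdict(float)
    for ranked_ids in result_sets:
        # Only the top window_size hits of each child are considered.
        for rank, doc_id in enumerate(ranked_ids[:window_size], start=1):
            # Every child carries equal weight.
            scores[doc_id] += 1.0 / (rank_constant + rank)

    # Ties are broken by document ID here; that is an assumption.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # The final ranked result set is pruned down to size.
    return ordered[:size]


lexical_hits = ["a", "b", "c"]   # hypothetical output of a standard retriever
vector_hits = ["c", "a", "d"]    # hypothetical output of a knn retriever
fused = rrf_fuse([lexical_hits, vector_hits], rank_constant=60, size=3)
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists ("a" and "c") accumulate two contributions and rise above documents found by only one child, which is the effect the RRF retriever relies on.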
Using from and size with a retriever tree
The from and size parameters are provided globally as part of the general
search API. They are applied to all retrievers in a retriever tree unless a
specific retriever overrides the size parameter using a different parameter
such as window_size. However, the final search hits are always limited to
size.
Using aggregations with a retriever tree
Aggregations are globally specified as part of a search request.
The query used for an aggregation is the combination of all leaf retrievers
as should clauses in a boolean query.
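The combination described above can be pictured with a short Python sketch that walks a retriever tree, collects its leaf retrievers, and places their queries as should clauses of a bool query. This is a conceptual illustration only. The function names are hypothetical, and the sketch handles only standard leaves with a query element, because the text does not describe how other leaf types are translated into query clauses.

```python
def leaf_retrievers(retriever):
    """Return the leaf retrievers of a retriever tree, left to right.

    A retriever is a dict with one key (its type) mapping to its
    specification; compound retrievers list children under "retrievers".
    """
    (_, spec), = retriever.items()
    children = spec.get("retrievers")
    if not children:
        return [retriever]
    leaves = []
    for child in children:
        leaves.extend(leaf_retrievers(child))
    return leaves


def aggregation_query(retriever):
    """Combine standard leaf queries as should clauses of a bool query.

    Leaves of other types are skipped in this sketch (an assumption made
    to avoid guessing at their query form).
    """
    should = []
    for leaf in leaf_retrievers(retriever):
        (name, spec), = leaf.items()
        if name == "standard" and "query" in spec:
            should.append(spec["query"])
    return {"bool": {"should": should}}


tree = {
    "rrf": {
        "retrievers": [
            {"standard": {"query": {"match": {"title": "search"}}}},   # hypothetical field
            {"standard": {"query": {"term": {"tags": "retriever"}}}},  # hypothetical field
        ]
    }
}
print(aggregation_query(tree))
```

Because the leaves are joined with should, a document matched by any one leaf retriever is included in the set that the aggregation runs over.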
Restrictions on search parameters when specifying a retriever
When a retriever is specified as part of a search, the following elements are not allowed at the top-level and instead are only allowed as elements of specific retrievers: