Rank feature query
Boosts the relevance score of documents based on the numeric value of a rank_feature or rank_features field.
The rank_feature query is typically used in the should clause of a bool query so its relevance scores are added to other scores from the bool query.
With positive_score_impact set to false for a rank_feature or rank_features field, we recommend that every document that participates in a query has a value for this field. Otherwise, a rank_feature query used in a should clause adds nothing to the score of a document with a missing value, but adds some boost to a document that contains the feature. This is the opposite of the intended behavior: because these features are considered negative, documents that contain them should rank lower than documents that lack them.
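To see why missing values matter, the following Python sketch applies the negative-impact form of the saturation function given in the Notes on this page, pivot / (S + pivot). The helper name, the pivot of 40, and the URL lengths are illustrative assumptions, not Elasticsearch defaults.

```python
from typing import Optional


def negative_saturation(value: Optional[float], pivot: float) -> float:
    """Score added by a should-clause rank_feature query on a field
    mapped with positive_score_impact set to false."""
    if value is None:
        # No value indexed: the clause does not match and adds nothing.
        return 0.0
    return pivot / (value + pivot)


PIVOT = 40.0  # hypothetical pivot chosen for illustration

short_url = negative_saturation(20.0, PIVOT)
long_url = negative_saturation(200.0, PIVOT)
missing = negative_saturation(None, PIVOT)

print(f"short URL: {short_url:.3f}")
print(f"long URL:  {long_url:.3f}")
print(f"missing:   {missing:.3f}")
```

In this sketch the document with no value receives the smallest contribution, below even the document with the longest URL, which is the ranking the recommendation above is meant to avoid.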
Unlike the function_score query or other ways to change relevance scores, the rank_feature query efficiently skips non-competitive hits when the track_total_hits parameter is not true. This can dramatically improve query speed.
Rank feature functions
To calculate relevance scores based on rank feature fields, the rank_feature query supports the following mathematical functions:
- Saturation
- Logarithm
- Sigmoid
- Linear
If you don’t know where to start, we recommend using the saturation function.
If no function is provided, the rank_feature query uses the saturation function by default.
Example request
Index setup
To use the rank_feature query, your index must include a rank_feature or rank_features field mapping. To see how you can set up an index for the rank_feature query, try the following example.
Create a test index with the following field mappings:
- pagerank, a rank_feature field which measures the importance of a website
- url_length, a rank_feature field which contains the length of the website’s URL. For this example, a long URL correlates negatively to relevance, indicated by a positive_score_impact value of false.
- topics, a rank_features field which contains a list of topics and a measure of how well each document is connected to this topic
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
Index several documents to the test index.
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
Example query
The following query searches for 2016 and boosts relevance scores based on pagerank, url_length, and the sports topic.
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
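As a rough illustration of how the should clauses in this query combine, the Python sketch below adds boosted saturation scores to a score from the must clause for document 1. The match score of 1.2 and all pivot values are made-up numbers: in a real query Elasticsearch computes the match score with its similarity model and, when no pivot is given, derives one from the index.

```python
def saturation(value: float, pivot: float, positive: bool = True) -> float:
    """Saturation score: S / (S + pivot), or pivot / (S + pivot) for
    fields mapped with positive_score_impact set to false."""
    numerator = value if positive else pivot
    return numerator / (value + pivot)


# Feature values of document 1 from the example index.
doc = {"pagerank": 50.3, "url_length": 42.0, "sports": 50.0}

match_score = 1.2  # hypothetical score of the match clause on "content"

# Each should clause contributes boost * function score; the bool query sums them.
pagerank_part = 1.0 * saturation(doc["pagerank"], pivot=50.0)
url_length_part = 0.1 * saturation(doc["url_length"], pivot=40.0, positive=False)
sports_part = 0.4 * saturation(doc["sports"], pivot=40.0)

total = match_score + pagerank_part + url_length_part + sports_part
print(f"total score: {total:.4f}")
```

Because each rank feature contribution is bounded by its boost under these assumptions, the boosts of 0.1 and 0.4 cap how much url_length and the sports topic can move the final score relative to pagerank.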
Top-level parameters for rank_feature
field
  (Required, string) rank_feature or rank_features field used to boost relevance scores.
boost
  (Optional, float) Floating point number used to decrease or increase relevance scores. Defaults to 1.0.
  Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.
saturation
  (Optional, function object) Saturation function used to boost relevance scores based on the value of the rank feature field. If no function is provided, the rank_feature query defaults to the saturation function. See Saturation for more information.
  Only one function saturation, log, sigmoid or linear can be provided.
log
  (Optional, function object) Logarithmic function used to boost relevance scores based on the value of the rank feature field. See Logarithm for more information.
  Only one function saturation, log, sigmoid or linear can be provided.
sigmoid
  (Optional, function object) Sigmoid function used to boost relevance scores based on the value of the rank feature field. See Sigmoid for more information.
  Only one function saturation, log, sigmoid or linear can be provided.
linear
  (Optional, function object) Linear function used to boost relevance scores based on the value of the rank feature field. See Linear for more information.
  Only one function saturation, log, sigmoid or linear can be provided.
Notes
Saturation
The saturation function gives a score equal to S / (S + pivot), where S is the value of the rank feature field and pivot is a configurable pivot value. The result is less than 0.5 if S is less than pivot and greater than 0.5 if S is greater than pivot. Scores are always in the range (0, 1).
If the rank feature has a negative score impact, the function is computed as pivot / (S + pivot), which decreases when S increases.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
If a pivot value is not provided, Elasticsearch computes a default value equal to the approximate geometric mean of all rank feature values in the index. We recommend using this default value if you haven’t had the opportunity to train a good pivot value.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
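The effect of a geometric-mean pivot can be sketched in Python as follows. The feature values are hypothetical, and the sketch uses the exact geometric mean from the standard library, whereas Elasticsearch uses an approximation whose details are not described here.

```python
from statistics import geometric_mean


def saturation(value: float, pivot: float) -> float:
    """Saturation score S / (S + pivot) for a positive-impact feature."""
    return value / (value + pivot)


# Hypothetical rank feature values for a small index.
pageranks = [2.0, 8.0, 32.0]

# Exact geometric mean; stands in for the approximate default pivot.
pivot = geometric_mean(pageranks)

scores = [saturation(s, pivot) for s in pageranks]
for s, score in zip(pageranks, scores):
    print(f"S={s:>5}: score={score:.3f}")
```

With the pivot at the geometric mean, values below it score under 0.5 and values above it score over 0.5, so the default spreads documents on both sides of the midpoint.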
Logarithm
The log function gives a score equal to log(scaling_factor + S), where S is the value of the rank feature field and scaling_factor is a configurable scaling factor. Scores are unbounded.
This function only supports rank features that have a positive score impact.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
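A minimal Python sketch of the log function is shown below. It assumes the natural logarithm, which the text above does not state explicitly, and uses the scaling factor of 4 from the request.

```python
import math


def log_score(value: float, scaling_factor: float) -> float:
    """log(scaling_factor + S), assuming the natural logarithm."""
    return math.log(scaling_factor + value)


SCALING_FACTOR = 4.0  # same scaling factor as the request above

for s in (1.0, 50.3, 1_000.0, 1_000_000.0):
    print(f"S={s:>11}: score={log_score(s, SCALING_FACTOR):.3f}")
```

Unlike saturation, the score keeps growing as S grows, although more and more slowly, which is what "unbounded" means here.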
Sigmoid
The sigmoid function is an extension of saturation which adds a configurable exponent. Scores are computed as S^exp / (S^exp + pivot^exp). As with the saturation function, pivot is the value of S that gives a score of 0.5, and scores are in the range (0, 1).
The exponent must be positive and is typically in [0.5, 1]. A good value should be computed via training. If you don’t have the opportunity to do so, we recommend you use the saturation function instead.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
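The relationship between sigmoid and saturation can be checked with a short Python sketch. It uses the pivot and exponent from the request above; the function names are illustrative only.

```python
def sigmoid(value: float, pivot: float, exponent: float) -> float:
    """Sigmoid score S^exp / (S^exp + pivot^exp)."""
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)


def saturation(value: float, pivot: float) -> float:
    """Saturation score S / (S + pivot)."""
    return value / (value + pivot)


PIVOT, EXPONENT = 7.0, 0.6  # values from the request above

for s in (1.0, 7.0, 50.3):
    print(
        f"S={s:>5}: sigmoid={sigmoid(s, PIVOT, EXPONENT):.3f} "
        f"saturation={saturation(s, PIVOT):.3f}"
    )
```

With an exponent of 1 the two functions coincide; an exponent below 1 pulls scores toward 0.5 on both sides of the pivot, flattening the curve.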
Linear
The linear function is the simplest function, and gives a score equal to the indexed value of S, where S is the value of the rank feature field.
If a rank feature field is indexed with "positive_score_impact": true, its indexed value is equal to S and rounded to preserve only 9 significant bits for the precision.
If a rank feature field is indexed with "positive_score_impact": false, its indexed value is equal to 1/S and rounded to preserve only 9 significant bits for the precision.
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
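To give a feel for what 9 significant bits of precision means, the Python sketch below reduces a 32-bit float to its sign, exponent, and top 8 mantissa bits (9 significant bits counting the implicit leading 1). This is an assumption about the encoding made for illustration: it truncates the low bits, and the exact rounding Elasticsearch applies may differ.

```python
import struct


def keep_9_significant_bits(value: float) -> float:
    """Keep the sign, exponent, and top 8 mantissa bits of a float32.
    Assumed encoding for illustration; low bits are truncated."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    bits &= 0xFFFFFFFF ^ ((1 << 15) - 1)  # clear the 15 lowest mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def linear_score(value: float, positive_score_impact: bool = True) -> float:
    """Linear score: the indexed value, S or 1/S, at reduced precision."""
    indexed = value if positive_score_impact else 1.0 / value
    return keep_9_significant_bits(indexed)


pagerank_score = linear_score(50.3)
url_length_score = linear_score(42.0, positive_score_impact=False)

print(f"pagerank 50.3 -> {pagerank_score}")
print(f"url_length 42 -> {url_length_score}")
```

Under this assumption the relative error of an indexed value stays below 2^-8, so nearby feature values can collapse to the same linear score.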