Rank features field type

A rank_features field can index numeric feature vectors, so that they can later be used to boost documents in queries with a rank_feature query.

It is analogous to the rank_feature data type, but is better suited to a sparse list of features, where adding one field to the mappings for each feature would not be reasonable.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "topics": {
        "type": "rank_features" 
      },
      "negative_reviews" : {
        "type": "rank_features",
        "positive_score_impact": false 
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "topics": { 
    "politics": 20,
    "economics": 50.8
  },
  "negative_reviews": {
    "1star": 10,
    "2star": 100
  }
}

PUT my-index-000001/_doc/2
{
  "topics": {
    "politics": 5.2,
    "sports": 80.1
  },
  "negative_reviews": {
    "1star": 1,
    "2star": 10
  }
}

GET my-index-000001/_search
{
  "query": { 
    "rank_feature": {
      "field": "topics.politics"
    }
  }
}

GET my-index-000001/_search
{
  "query": { 
    "rank_feature": {
      "field": "negative_reviews.1star"
    }
  }
}

GET my-index-000001/_search
{
  "query": { 
    "term": {
      "topics": "economics"
    }
  }
}

In the mapping, rank features fields must use the rank_features field type.

Rank features that correlate negatively with the score, such as negative_reviews, must declare it by setting positive_score_impact to false.

In a document, the value of a rank features field must be a hash with string keys and strictly positive numeric values.

The first search ranks documents by how much they are about the "politics" topic.

The second search ranks documents inversely to the number of "1star" reviews they received.

The third search returns documents that store the "economics" feature in the "topics" field.

rank_features fields only support single-valued features and strictly positive values. Multi-valued fields and zero or negative values are rejected.
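
A client can catch these cases before indexing. The following is a minimal sketch of such a check, not part of Elasticsearch: the helper name validate_rank_features and its messages are hypothetical, and the rules it applies are only the ones stated above (string keys, one number per feature, strictly positive values).

```python
from numbers import Real


def validate_rank_features(features):
    """Return a list of problems found in a candidate rank_features value.

    Hypothetical client-side helper; it is not part of Elasticsearch.
    """
    if not isinstance(features, dict):
        return ["value must be an object mapping feature names to numbers"]
    problems = []
    for name, value in features.items():
        if not isinstance(name, str):
            problems.append(f"{name!r}: feature name must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(
                f"{name}: expected a single number, got {type(value).__name__}"
            )
        elif not value > 0:
            problems.append(f"{name}: value must be strictly positive, got {value}")
    return problems


# The "topics" value of document 1 passes; the second value has two problems.
print(validate_rank_features({"politics": 20, "economics": 50.8}))
print(validate_rank_features({"politics": 0, "sports": [1, 2]}))
```

An empty list means the value should be accepted under the rules above. The server remains the authority on what is rejected.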

rank_features fields do not support sorting or aggregating and may only be queried using rank_feature or term queries.

term queries on rank_features fields are scored by multiplying the matched stored feature value by the provided boost.
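
As a rough illustration, the sketch below builds a term query body with a boost and computes the score the rule above predicts. The helper names term_query and expected_term_score are hypothetical. The boost of 2.0 is an arbitrary choice, and the stored value of 20 is the "politics" value of document 1.

```python
def term_query(field, feature, boost=1.0):
    """Build a search body whose term query matches documents storing `feature`."""
    return {"query": {"term": {field: {"value": feature, "boost": boost}}}}


def expected_term_score(stored_value, boost=1.0):
    """Score as described above: stored feature value times the boost."""
    return stored_value * boost


body = term_query("topics", "politics", boost=2.0)
print(body)

# Document 1 stores politics=20, so the predicted score is 20 * 2.0.
print(expected_term_score(20, boost=2.0))
```

Because feature values are stored with limited precision, the score actually returned may differ slightly from this product.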

rank_features fields preserve only 9 significant bits of precision, which translates to a relative error of about 0.4%.
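
To get a feel for what this means, the sketch below truncates a single-precision float to its sign, exponent, and top 8 mantissa bits, which together with the implicit leading bit gives 9 significant bits. That truncation is an assumption about how the precision is lost, used here only for illustration. The helper name keep_9_significant_bits is hypothetical.

```python
import struct


def keep_9_significant_bits(value):
    """Truncate a float32 to its sign, exponent and top 8 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", float(value)))[0]
    truncated = (bits >> 15) << 15  # drop the 15 low mantissa bits
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


# Feature values taken from the two example documents.
for value in (20, 50.8, 5.2, 80.1):
    stored = keep_9_significant_bits(value)
    print(value, stored, f"{(value - stored) / value:.4%}")
```

Under this assumption the relative error always stays below 2^-8, roughly 0.39%, which is consistent with the figure of about 0.4% quoted above.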

Rank features that correlate negatively with the score should set positive_score_impact to false (defaults to true). The rank_feature query uses this setting to modify the scoring formula so that the score decreases as the feature value increases, instead of increasing with it.
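
The effect can be sketched with the saturation function S / (S + pivot), one of the functions the rank_feature query offers. Two things below are assumptions made for illustration: that a negatively correlated feature is scored on its reciprocal, and the pivot value of 0.5. The helper names are hypothetical.

```python
def saturation(value, pivot):
    """Saturation function: value / (value + pivot), always between 0 and 1."""
    return value / (value + pivot)


def feature_score(value, pivot, positive_score_impact=True):
    # Assumption: a negatively correlated feature is scored on its reciprocal,
    # so that larger raw values yield smaller scores.
    effective = value if positive_score_impact else 1.0 / value
    return saturation(effective, pivot)


# "1star" counts from the two example documents: 1 (document 2) and 10 (document 1).
for stars in (1, 10):
    print(stars, feature_score(stars, pivot=0.5, positive_score_impact=False))
```

With positive_score_impact set to false, the document with fewer "1star" reviews gets the higher score. With the default of true, the ordering is reversed.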