WARNING: The 1.x versions of Elasticsearch have passed their EOL dates. If you are running a 1.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Cross-fields Entity Search
editCross-fields Entity Search
editNow we come to a common pattern: cross-fields entity search. With entities
like person
, product
, or address
, the identifying information is spread
across several fields. We may have a person
indexed as follows:
{ "firstname": "Peter", "lastname": "Smith" }
Or an address like this:
{ "street": "5 Poland Street", "city": "London", "country": "United Kingdom", "postcode": "W1V 3DG" }
This sounds a lot like the example we described in Multiple Query Strings, but there is a big difference between these two scenarios. In Multiple Query Strings, we used a separate query string for each field. In this scenario, we want to search across multiple fields with a single query string.
Our user might search for the person “Peter Smith” or for the address
“Poland Street W1V.” Each of those words appears in a different field, so
using a dis_max
/ best_fields
query to find the single best-matching
field is clearly the wrong approach.
A Naive Approach
editReally, we want to query each field in turn and add up the scores of every
field that matches, which sounds like a job for the bool
query:
{ "query": { "bool": { "should": [ { "match": { "street": "Poland Street W1V" }}, { "match": { "city": "Poland Street W1V" }}, { "match": { "country": "Poland Street W1V" }}, { "match": { "postcode": "Poland Street W1V" }} ] } } }
Repeating the query string for every field soon becomes tedious. We can use
the multi_match
query instead, and set the type
to most_fields
to tell it to
combine the scores of all matching fields:
{ "query": { "multi_match": { "query": "Poland Street W1V", "type": "most_fields", "fields": [ "street", "city", "country", "postcode" ] } } }
Problems with the most_fields Approach
editThe most_fields
approach to entity search has some problems that are not
immediately obvious:
- It is designed to find the most fields matching any words, rather than to find the most matching words across all fields.
-
It can’t use the
operator
orminimum_should_match
parameters to reduce the long tail of less-relevant results. - Term frequencies are different in each field and could interfere with each other to produce badly ordered results.