Combined fields
The combined_fields query supports searching multiple text fields as if their
contents had been indexed into one combined field. The query takes a term-centric
view of the input string: first it analyzes the query string into individual terms,
then looks for each term in any of the fields. This query is particularly
useful when a match could span multiple text fields, for example the title,
abstract, and body of an article:
response = client.search(
  body: {
    query: {
      combined_fields: {
        query: 'database systems',
        fields: ['title', 'abstract', 'body'],
        operator: 'and'
      }
    }
  }
)
puts response

GET /_search
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": ["title", "abstract", "body"],
      "operator": "and"
    }
  }
}
The combined_fields query takes a principled approach to scoring based on the
simple BM25F formula described in
The Probabilistic Relevance Framework: BM25 and Beyond.
When scoring matches, the query combines term and collection statistics across
fields to score each match as if the specified fields had been indexed into a
single, combined field. This scoring is a best attempt; combined_fields makes
some approximations and scores will not obey the BM25F model perfectly.
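For orientation, the simple BM25F formula from the cited paper can be sketched as below. The notation is chosen here for illustration and is an assumption, not taken from this page: F is the set of searched fields, w_f the boost of field f, tf_f and len_f the term frequency and length within field f, and k_1 and b the usual BM25 parameters. As noted above, combined_fields only approximates this model.

```latex
\begin{aligned}
\widetilde{tf}(t, d) &= \sum_{f \in F} w_f \, tf_f(t, d) \\
\widetilde{dl}(d) &= \sum_{f \in F} w_f \, len_f(d) \\
\mathrm{score}(q, d) &= \sum_{t \in q} \mathrm{idf}(t) \cdot
  \frac{\widetilde{tf}(t, d)}
       {k_1 \left( 1 - b + b \, \dfrac{\widetilde{dl}(d)}{\mathrm{avg}\,\widetilde{dl}} \right) + \widetilde{tf}(t, d)}
\end{aligned}
```

The key point is that term frequencies and lengths are summed across fields before the BM25 saturation is applied, rather than scoring each field separately and combining the scores afterwards.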
Field number limit
By default, there is a limit to the number of clauses a query can contain. This
limit is defined by the indices.query.bool.max_clause_count setting, which
defaults to 4096. For combined fields queries, the number of clauses is
calculated as the number of fields multiplied by the number of terms.
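The arithmetic above can be sketched in a few lines of Ruby. This is a rough client-side estimate, not something the server exposes: the helper names are hypothetical, whitespace splitting stands in for the real analyzer, and the field list is assumed to be already expanded (a wildcard pattern may match many fields).

```ruby
# Default for indices.query.bool.max_clause_count as stated above.
DEFAULT_MAX_CLAUSE_COUNT = 4096

# Estimate the clause count of a combined_fields query as
# (number of fields) x (number of terms).
# Assumption: splitting on whitespace approximates the analyzer's output.
def estimated_clause_count(query, fields)
  terms = query.split
  fields.length * terms.length
end

# True when the estimate fits within the given clause limit.
def within_clause_limit?(query, fields, limit = DEFAULT_MAX_CLAUSE_COUNT)
  estimated_clause_count(query, fields) <= limit
end

fields = %w[title abstract body]
puts estimated_clause_count('database systems', fields) # 3 fields x 2 terms
puts within_clause_limit?('database systems', fields)
```

A check like this is mainly useful when field lists are generated from wildcard patterns, since the product of fields and terms can grow quickly.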
Per-field boosting
Field boosts are interpreted according to the combined field model. For example,
if the title field has a boost of 2, the score is calculated as if each term
in the title appeared twice in the synthetic combined field.
response = client.search(
  body: {
    query: {
      combined_fields: {
        query: 'distributed consensus',
        fields: ['title^2', 'body']
      }
    }
  }
)
puts response

GET /_search
{
  "query": {
    "combined_fields": {
      "query": "distributed consensus",
      "fields": ["title^2", "body"]
    }
  }
}
The combined_fields query requires that field boosts are greater than
or equal to 1.0. Field boosts are allowed to be fractional.
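To make the boost interpretation concrete, here is a minimal Ruby sketch. It is a toy model, not the server's implementation: parse_field_spec and combined_term_frequency are hypothetical helpers, and lowercasing plus whitespace splitting stands in for the analyzer. It parses the field^boost syntax, rejects boosts below 1.0, and counts a term as if each field's occurrences were repeated boost times in the combined field.

```ruby
# Parse a field spec such as "title^2" into [name, boost].
# A spec without "^" gets a boost of 1.0. Fractional boosts are accepted.
def parse_field_spec(spec)
  name, boost = spec.split('^', 2)
  boost = boost.nil? ? 1.0 : Float(boost)
  raise ArgumentError, "boost for #{name} must be >= 1.0, got #{boost}" if boost < 1.0

  [name, boost]
end

# Term frequency in the synthetic combined field: each field's count is
# scaled by that field's boost, then the scaled counts are summed.
def combined_term_frequency(term, doc, field_specs)
  field_specs.sum do |spec|
    name, boost = parse_field_spec(spec)
    tokens = doc.fetch(name, '').downcase.split
    boost * tokens.count(term)
  end
end

doc = {
  'title' => 'Distributed consensus',
  'body' => 'Consensus protocols keep replicas in agreement'
}
puts combined_term_frequency('consensus', doc, ['title^2', 'body'])
```

Here the single occurrence in title counts twice and the single occurrence in body counts once, so the combined frequency is 3.0.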
Top-level parameters for combined_fields
fields
    (Required, array of strings) List of fields to search. Field wildcard
    patterns are allowed. Only text fields are supported, and they must all
    have the same search analyzer.

query
    (Required, string) Text to search for in the provided <fields>.

    The combined_fields query analyzes the provided text before performing a
    search.

auto_generate_synonyms_phrase_query
    (Optional, Boolean) If true, match phrase queries are automatically created
    for multi-term synonyms. Defaults to true.

    See Use synonyms with match query for an example.

operator
    (Optional, string) Boolean logic used to interpret text in the query value.
    Valid values are:

    or (Default)
        For example, a query value of database systems is interpreted as
        database OR systems.

    and
        For example, a query value of database systems is interpreted as
        database AND systems.

minimum_should_match
    (Optional, string) Minimum number of clauses that must match for a document
    to be returned. See the minimum_should_match parameter for valid values and
    more information.

zero_terms_query
    (Optional, string) Indicates whether no documents are returned if the
    analyzer removes all tokens, such as when using a stop filter. Valid values
    are:

    none (Default)
        No documents are returned if the analyzer removes all tokens.

    all
        Returns all documents, similar to a match_all query.

    See Zero terms query for an example.
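The parameters above can be combined in a single request body. The sketch below uses a hypothetical helper, combined_fields_body, which is not part of any client library; it only assembles a hash in the shape of the earlier examples and rejects operator and zero_terms_query values other than those listed above.

```ruby
require 'json'

VALID_OPERATORS = %w[or and].freeze
VALID_ZERO_TERMS = %w[none all].freeze

# Hypothetical helper: builds a combined_fields request body using the
# documented defaults and validates the enumerated parameters.
def combined_fields_body(query:, fields:, operator: 'or', zero_terms_query: 'none',
                         minimum_should_match: nil,
                         auto_generate_synonyms_phrase_query: true)
  raise ArgumentError, 'fields must not be empty' if fields.empty?
  raise ArgumentError, "invalid operator: #{operator}" unless VALID_OPERATORS.include?(operator)
  unless VALID_ZERO_TERMS.include?(zero_terms_query)
    raise ArgumentError, "invalid zero_terms_query: #{zero_terms_query}"
  end

  params = {
    query: query,
    fields: fields,
    operator: operator,
    zero_terms_query: zero_terms_query,
    auto_generate_synonyms_phrase_query: auto_generate_synonyms_phrase_query
  }
  # minimum_should_match is only sent when the caller sets it.
  params[:minimum_should_match] = minimum_should_match unless minimum_should_match.nil?
  { query: { combined_fields: params } }
end

body = combined_fields_body(
  query: 'database systems',
  fields: %w[title abstract body],
  minimum_should_match: '2'
)
puts JSON.pretty_generate(body)
```

The resulting hash could be passed as the body argument of client.search in the same way as the earlier examples.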
Comparison to multi_match query
The combined_fields query provides a principled way of matching and scoring
across multiple text fields. To support this, it requires that all
fields have the same search analyzer.
If you want a single query that handles fields of different types like
keywords or numbers, then the multi_match query may be a better fit. It
supports both text and non-text fields, and accepts text fields that do not
share the same analyzer.
The main multi_match modes best_fields and most_fields take a
field-centric view of the query. In contrast, combined_fields is
term-centric: operator and minimum_should_match are applied per-term,
instead of per-field. Concretely, a query like
response = client.search(
  body: {
    query: {
      combined_fields: {
        query: 'database systems',
        fields: ['title', 'abstract'],
        operator: 'and'
      }
    }
  }
)
puts response

GET /_search
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": ["title", "abstract"],
      "operator": "and"
    }
  }
}
is executed as:
+(combined("database", fields:["title", "abstract"]))
+(combined("systems", fields:["title", "abstract"]))
In other words, each term must be present in at least one field for a document to match.
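The difference between the two views can be illustrated with a toy matcher in Ruby. This is a simplified model of matching only, not of how either query is implemented: analyze, term_centric_match? and field_centric_match? are hypothetical helpers, and lowercasing plus whitespace splitting stands in for the shared search analyzer.

```ruby
# Toy analyzer (assumption): lowercase and split on whitespace.
def analyze(text)
  text.downcase.split
end

# Term-centric: for each query term, check whether ANY field contains it,
# then combine the per-term results with the operator.
def term_centric_match?(doc, query, fields, operator: 'or')
  per_term = analyze(query).map do |term|
    fields.any? { |field| analyze(doc.fetch(field, '')).include?(term) }
  end
  operator == 'and' ? per_term.all? : per_term.any?
end

# Field-centric: apply the operator to the terms inside each field, and
# match if any single field satisfies it.
def field_centric_match?(doc, query, fields, operator: 'or')
  terms = analyze(query)
  fields.any? do |field|
    tokens = analyze(doc.fetch(field, ''))
    hits = terms.map { |term| tokens.include?(term) }
    operator == 'and' ? hits.all? : hits.any?
  end
end

doc = { 'title' => 'Database internals', 'abstract' => 'How storage systems work' }
fields = %w[title abstract]
puts term_centric_match?(doc, 'database systems', fields, operator: 'and')
puts field_centric_match?(doc, 'database systems', fields, operator: 'and')
```

With the and operator, the term-centric check accepts this document because database appears in title and systems appears in abstract, while the field-centric check rejects it because no single field contains both terms.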
The cross_fields multi_match mode also takes a term-centric approach and
applies operator and minimum_should_match per-term. The main advantage of
combined_fields over cross_fields is its robust and interpretable approach
to scoring based on the BM25F algorithm.
Custom similarities
The combined_fields query currently only supports the BM25 similarity,
which is the default unless a custom similarity is configured. Per-field
similarities are also not allowed. Using combined_fields in either of these
cases will result in an error.