Search and Query DSL changes

edit

Changes to queries

edit
  • The collect_payloads parameter of the span_near query has been removed. Payloads will be loaded when needed.
  • Queries on boolean fields now strictly parse boolean-like values. This means only the strings "true" and "false" will be parsed into their boolean counterparts. Other strings will cause an error to be thrown.
  • The in query (a synonym for the terms query) has been removed
  • The geo_bbox query (a synonym for the geo_bounding_box query) has been removed
  • The mlt query (a synonym for the more_like_this query) has been removed.
  • The deprecated like_text, ids and docs parameters (all synonyms for like) of the more_like_this query have been removed. Also the deprecated min_word_len (a synonym for min_word_length) and max_word_len (a synonym for max_word_length) have been removed.
  • The fuzzy_match and match_fuzzy query (synonyma for the match query) have been removed
  • The terms query now always returns scores equal to 1 and is not subject to indices.query.bool.max_clause_count anymore.
  • The deprecated indices query has been removed.
  • Support for empty query objects ({ }) has been removed from the query DSL. An error is thrown whenever an empty query object is provided.
  • The deprecated minimum_number_should_match parameter in the bool query has been removed, use minimum_should_match instead.
  • The query_string query now correctly parses the maximum number of states allowed when determinizing a regex as max_determinized_states instead of the typo max_determined_states.
  • The query_string query no longer accepts enable_position_increment, use enable_position_increments instead.
  • For geo_distance queries, sorting, and aggregations the sloppy_arc option has been removed from the distance_type parameter.
  • The geo_distance_range query, which was deprecated in 5.0, has been removed.
  • The optimize_bbox parameter has been removed from geo_distance queries.
  • The ignore_malformed and coerce parameters have been removed from geo_bounding_box, geo_polygon, and geo_distance queries.
  • The disable_coord parameter of the bool and common_terms queries has been removed. If provided, it will be ignored and issue a deprecation warning.
  • The template query has been removed. This query was deprecated since 5.0
  • The percolate query’s document_type has been deprecated. From 6.0 and later it is no longer required to specify the document_type parameter.
  • The split_on_whitespace parameter for the query_string query has been removed. If provided, it will be ignored and issue a deprecation warning. The query_string query now splits on operator only.
  • The use_dis_max parameter for the query_string query has been removed. If provided, it will be ignored and issue a deprecation warning. The tie_breaker parameter must be used instead.
  • The auto_generate_phrase_queries parameter for the query_string query has been removed, use an explicit quoted query instead. If provided, it will be ignored and issue a deprecation warning.
  • The all_fields parameter for the query_string and simple_query_string has been removed. Set default_field to *` instead. If provided, default_field will be automatically set to *
  • The index parameter in the terms filter, used to look up terms in a dedicated index is now mandatory. Previously, the index defaulted to the index the query was executed on. Now this index must be explicitly set in the request.
  • The deprecated type and slop parameter for the match query have been removed. Instead of setting the type, the match_phrase or match_phrase_prefix should be used. The slop removed from the match query but is supported for match_phrase and match_phrase_prefix.
  • The deprecated phrase_slop parameter (a synonym for the slop parameter) of the match_phrase query has been removed.
  • The deprecated query parameter (a synonym for the filter parameter) of the constant_score query has been removed.
  • The deprecated phrase_slop parameter (a synonym for the slop parameter) of the multi_match query has been removed.
  • The deprecated prefix parameter (a synonym for the value parameter) of the prefix query has been removed.
  • The deprecated le (a synonym for lte) and ge (a synonym for gte) parameter of the range query have been removed.
  • The deprecated types and _type synonyms for the type parameter of the ids query have been removed
  • The deprecated multi term rewrite parameters constant_score_auto, constant_score_filter (synonyms for constant_score) have been removed.

Search shards API

edit

The search shards API no longer accepts the type url parameter, which didn’t have any effect in previous versions.

Changes to the Profile API

edit

The "time" field showing human readable timing output has been replaced by the "time_in_nanos" field which displays the elapsed time in nanoseconds. The "time" field can be turned on by adding "?human=true" to the request url. It will display a rounded, human readable time value.

Scoring changes

edit

Query normalization

edit

Query normalization has been removed. This means that the TF-IDF similarity no longer tries to make scores comparable across queries and that boosts are now integrated into scores as simple multiplicative factors.

Other similarities are not affected as they did not normalize scores and already integrated boosts into scores as multiplicative factors.

See LUCENE-7347 for more information.

Coordination factors

edit

Coordination factors have been removed from the scoring formula. This means that boolean queries no longer score based on the number of matching clauses. Instead, they always return the sum of the scores of the matching clauses.

As a consequence, use of the TF-IDF similarity is now discouraged as this was an important component of the quality of the scores that this similarity produces. BM25 is recommended instead.

See LUCENE-7347 for more information.

Fielddata on _uid

edit

Fielddata on _uid is deprecated. It is possible to switch to _id instead but the only reason why it has not been deprecated too is because it is used for the random_score function. If you really need access to the id of documents for sorting, aggregations or search scripts, the recommendation is to duplicate the id as a field in the document.

Highlighters

edit

The unified highlighter is the new default choice for highlighter. The offset strategy for each field is picked internally by this highlighter depending on the type of the field (index_options). It is still possible to force the highlighter to fvh or plain types.

The postings highlighter has been removed from Lucene and Elasticsearch. The unified highlighter outputs the same highlighting when index_options is set to offsets.

fielddata_fields

edit

The deprecated fielddata_fields have now been removed. docvalue_fields should be used instead.

docvalue_fields

edit

docvalue_fields now have a default upper limit of 100 fields that can be requested. This limit can be overridden by using the index.max_docvalue_fields_search index setting.

script_fields

edit

script_fields now have a default upper limit of 32 script fields that can be requested. This limit can be overridden by using the index.max_script_fields index setting.

Inner hits

edit

The source inside a hit of inner hits keeps its full path with respect to the entire source. In prior versions the source field names were relative to the inner hit.

Scroll

edit

The from parameter can no longer be used in the search request body when initiating a scroll. The parameter was already ignored in these situations, now in addition an error is thrown.

Limit on from/size in top hits and inner hits

edit

The maximum number of results (from + size) that is allowed to be retrieved via inner hits and top hits has been limited to 100. The limit can be controlled via the index.max_inner_result_window index setting.

Scroll queries that use the request_cache are deprecated

edit

Setting request_cache:true on a query that creates a scroll ('scroll=1m`) is deprecated and the request will not use the cache internally. In future versions we will return a 400 - Bad request instead of just ignoring the hint. Scroll queries are not meant to be cached.

Limiting the number of terms that can be used in a Terms Query request

edit

Executing a Terms Query with a lot of terms may degrade the cluster performance, as each additional term demands extra processing and memory. To safeguard against this, the maximum number of terms that can be used in a Terms Query request has been limited to 65536. This default maximum can be changed for a particular index with the index setting index.max_terms_count.