WARNING: Version 1.3 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
More Like This Query
The More Like This query finds documents that are "like" the provided text by running it against one or more fields.
{ "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12 } }
Added in 1.3.0. The ability to run the `mlt` query on multiple docs is only available from 1.3.0 onwards.
Additionally, More Like This can find documents that are "like" a set of
chosen documents. The syntax to specify one or more documents is similar to
the Multi GET API, and supports the `ids` or `docs` array.
If only one document is specified, the query behaves the same as the
More Like This API.
{ "more_like_this" : { "fields" : ["name.first", "name.last"], "docs" : [ { "_index" : "test", "_type" : "type", "_id" : "1" }, { "_index" : "test", "_type" : "type", "_id" : "2" } ], "ids" : ["3", "4"], "min_term_freq" : 1, "max_query_terms" : 12 } }
`more_like_this` can be shortened to `mlt`.
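As a minimal sketch of how a client might assemble the multi-document form shown above, the following Python helper builds the query body as a plain dict. The helper name `build_mlt_query` and its validation rule (at least one of `like_text`, `docs`, or `ids` must be given) are assumptions made for illustration; they are not part of any Elasticsearch client library.

```python
import json


def build_mlt_query(fields, like_text=None, docs=None, ids=None, **params):
    """Assemble a more_like_this query body as a plain dict.

    Hypothetical helper: the check below is an assumption based on the
    query needing either text or documents to compare against.
    """
    if like_text is None and not docs and not ids:
        raise ValueError("provide like_text, docs, or ids")
    body = {"fields": list(fields)}
    if like_text is not None:
        body["like_text"] = like_text
    if docs:
        # Each entry follows the Multi GET style: _index, _type, _id.
        body["docs"] = [dict(d) for d in docs]
    if ids:
        body["ids"] = list(ids)
    # Remaining tuning parameters (min_term_freq, max_query_terms, ...)
    # are passed through unchanged.
    body.update(params)
    return {"more_like_this": body}


query = build_mlt_query(
    ["name.first", "name.last"],
    docs=[
        {"_index": "test", "_type": "type", "_id": "1"},
        {"_index": "test", "_type": "type", "_id": "2"},
    ],
    ids=["3", "4"],
    min_term_freq=1,
    max_query_terms=12,
)
print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the `query` part of a search request with whatever HTTP client is already in use.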
Under the hood, `more_like_this` simply creates multiple `should` clauses in a
`bool` query from interesting terms extracted from the provided text. The
interesting terms are selected by their tf-idf scores; the selection is
controlled by `min_term_freq`, `min_doc_freq`, and `max_doc_freq`. The number of
interesting terms is controlled by `max_query_terms`, while the minimum number
of clauses that must be satisfied is controlled by `percent_terms_to_match`.
The terms are extracted from `like_text`, which is analyzed by the analyzer
associated with the field unless another one is specified by `analyzer`. Other
parameters, such as `min_word_length`, `max_word_length`, and `stop_words`,
control which terms are considered interesting. To give more weight to more
interesting terms, each boolean clause associated with a term can be boosted by
the term's tf-idf score times a boosting factor, `boost_terms`.
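The selection described above can be approximated in a few lines of Python. This is a simplified sketch, not the actual Lucene implementation: the tf-idf formula, the function names `select_interesting_terms` and `build_bool_query`, and the way `percent_terms_to_match` is turned into `minimum_should_match` are all assumptions chosen to illustrate the idea.

```python
import math
from collections import Counter


def select_interesting_terms(tokens, doc_freqs, num_docs, min_term_freq,
                             min_doc_freq, max_doc_freq, max_query_terms):
    """Return up to max_query_terms (term, score) pairs, best tf-idf first.

    tokens: analyzed terms of the input text.
    doc_freqs: hypothetical mapping term -> number of docs containing it.
    """
    scored = []
    for term, tf in Counter(tokens).items():
        df = doc_freqs.get(term, 0)
        if df == 0 or tf < min_term_freq or df < min_doc_freq:
            continue  # too rare in the text or in the index
        if max_doc_freq is not None and df > max_doc_freq:
            continue  # too common in the index to be interesting
        idf = math.log(num_docs / df) + 1.0  # simplified idf (assumption)
        scored.append((term, tf * idf))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_query_terms]


def build_bool_query(field, scored_terms, percent_terms_to_match, boost_terms=0):
    """Turn scored terms into should clauses of a bool query."""
    should = []
    for term, score in scored_terms:
        clause = {"value": term}
        if boost_terms:
            clause["boost"] = boost_terms * score  # tf-idf times the factor
        should.append({"term": {field: clause}})
    minimum = int(percent_terms_to_match * len(should))
    return {"bool": {"should": should, "minimum_should_match": minimum}}


tokens = "quick brown fox quick fox quick".split()
doc_freqs = {"quick": 10, "fox": 5, "brown": 100}
terms = select_interesting_terms(tokens, doc_freqs, num_docs=100,
                                 min_term_freq=2, min_doc_freq=1,
                                 max_doc_freq=None, max_query_terms=12)
bool_query = build_bool_query("name.first", terms, percent_terms_to_match=0.5)
```

Here `brown` is dropped because it occurs only once in the input text, which is below the `min_term_freq` of 2 used in this example.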
When a search for multiple `docs` is issued, More Like This generates a
`more_like_this` query per document field in `fields`. These `fields` are
specified as a top level parameter or within each `doc`.
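One way to picture this expansion is the sketch below, which emits one `more_like_this` query per document and field. It is only an illustration of the shape of the result: the function `expand_docs_to_queries`, the `fetch_field_text` callback, and the in-memory `store` are hypothetical, and the assumption that a per-document `fields` entry overrides the top level list follows the Multi GET style rather than any confirmed internal behavior.

```python
def expand_docs_to_queries(docs, top_level_fields, fetch_field_text):
    """Build one more_like_this query per (document, field) pair.

    fetch_field_text(doc, field) is a hypothetical callback returning the
    stored text of that field, or None if the field is missing.
    """
    queries = []
    for doc in docs:
        # Assumption: fields given inside a doc take precedence.
        fields = doc.get("fields", top_level_fields)
        for field in fields:
            text = fetch_field_text(doc, field)
            if text:
                queries.append(
                    {"more_like_this": {"fields": [field], "like_text": text}}
                )
    return queries


# Tiny in-memory stand-in for the index, for illustration only.
store = {
    "1": {"name.first": "John", "name.last": "Smith"},
    "2": {"name.first": "Jane", "name.last": "Doe"},
}


def fetch(doc, field):
    return store.get(doc["_id"], {}).get(field)


docs = [
    {"_index": "test", "_type": "type", "_id": "1"},
    {"_index": "test", "_type": "type", "_id": "2", "fields": ["name.last"]},
]
queries = expand_docs_to_queries(docs, ["name.first", "name.last"], fetch)
```

With these inputs the first document contributes two queries (one per top level field) and the second contributes one, because it names only `name.last`.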
The `more_like_this` top level parameters include:
Parameter | Description |
---|---|
`fields` | A list of the fields to run the more like this query against. Defaults to the … |
`like_text` | The text to find documents like it. Required if … |
`ids` or `docs` | Added in 1.3.0. A list of documents following the same syntax as the Multi GET API. This parameter is required if … |
 | Added in 1.3.0. When using … |
 | Deprecated in 1.3.0. Replaced by … |
`percent_terms_to_match` | The percentage of terms to match on (float value). Defaults to … |
`min_term_freq` | The frequency below which terms will be ignored in the source doc. The default frequency is … |
`max_query_terms` | The maximum number of query terms that will be included in any generated query. Defaults to … |
`stop_words` | An array of stop words. Any word in this set is considered "uninteresting" and ignored. Even if your analyzer allows stop words, you might want to tell the More Like This code to ignore them, since for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting". |
`min_doc_freq` | The minimum document frequency: words that do not occur in at least this many docs will be ignored. Defaults to … |
`max_doc_freq` | The maximum document frequency at which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded. |
`min_word_length` | The minimum word length below which words will be ignored. Defaults to … |
`max_word_length` | The maximum word length above which words will be ignored. Defaults to unbounded. |
`boost_terms` | Sets the boost factor to use when boosting terms. Defaults to deactivated. |
`boost` | Sets the boost value of the query. Defaults to … |
`analyzer` | The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field. |
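To show how several of these parameters combine in a single request body, here is a small Python sketch that builds a text-based query and serializes it to JSON. All values are illustrative assumptions chosen for the example; none of them should be read as the documented defaults.

```python
import json

# Illustrative values only; these are not the defaults.
tuned_query = {
    "more_like_this": {
        "fields": ["name.first", "name.last"],
        "like_text": "text like this one",
        "min_term_freq": 1,       # keep terms seen at least once in the text
        "min_doc_freq": 2,        # drop terms found in fewer than 2 docs
        "max_query_terms": 12,    # cap the number of generated clauses
        "min_word_length": 3,     # ignore very short words
        "stop_words": ["the", "and", "of"],
        "boost_terms": 2,         # weight clauses by tf-idf times this factor
    }
}

print(json.dumps(tuned_query, indent=2))
```

Keeping the body as a plain dict makes it easy to adjust one parameter at a time and compare the results returned for each variant.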