WARNING: Version 1.4 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
More Like This Query
The more like this query finds documents that are "like" the provided text by running it against one or more fields.
{ "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12 } }
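For readers who build request bodies in code, the query above can be assembled as a plain dictionary and serialized to JSON. This is only a minimal sketch: the helper name `mlt_text_query` is hypothetical, and its keyword defaults simply repeat the values from the example above rather than the server-side defaults.

```python
import json


def mlt_text_query(like_text, fields, min_term_freq=1, max_query_terms=12):
    """Build a more_like_this query body for free text.

    Hypothetical helper; the default values mirror the example request,
    not the defaults Elasticsearch applies when a parameter is omitted.
    """
    return {
        "more_like_this": {
            "fields": list(fields),
            "like_text": like_text,
            "min_term_freq": min_term_freq,
            "max_query_terms": max_query_terms,
        }
    }


query = mlt_text_query("text like this one", ["name.first", "name.last"])
# Serialize to the JSON form that would be sent as the query body.
print(json.dumps(query, indent=2))
```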
Additionally, More Like This can find documents that are "like" a set of chosen documents. The syntax to specify one or more documents is similar to the Multi GET API, and supports the ids or docs array. If only one document is specified, the query behaves the same as the More Like This API.
{ "more_like_this" : { "fields" : ["name.first", "name.last"], "docs" : [ { "_index" : "test", "_type" : "type", "_id" : "1" }, { "_index" : "test", "_type" : "type", "_id" : "2" } ], "ids" : ["3", "4"], "min_term_freq" : 1, "max_query_terms" : 12 } }
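A document-based request like the one above can be put together the same way. The sketch below assumes a hypothetical helper, `mlt_docs_query`, that accepts either a docs array, an ids array, or both, and passes any remaining parameters through unchanged.

```python
def mlt_docs_query(fields, docs=None, ids=None, **params):
    """Build a more_like_this query body from chosen documents.

    Hypothetical helper: `docs` is a list of dicts with _index, _type and
    _id keys, `ids` is a list of document ids, and `params` holds any other
    top level parameters (for example min_term_freq).
    """
    if not docs and not ids:
        raise ValueError("at least one of docs or ids is required")
    body = {"fields": list(fields)}
    if docs:
        body["docs"] = [dict(d) for d in docs]  # copy so callers keep their dicts
    if ids:
        body["ids"] = list(ids)
    body.update(params)
    return {"more_like_this": body}


docs_query = mlt_docs_query(
    ["name.first", "name.last"],
    docs=[
        {"_index": "test", "_type": "type", "_id": "1"},
        {"_index": "test", "_type": "type", "_id": "2"},
    ],
    ids=["3", "4"],
    min_term_freq=1,
    max_query_terms=12,
)
```

Rejecting a call with neither docs nor ids is a design choice of this sketch; a free-text query would use like_text instead.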
more_like_this can be shortened to mlt.
Under the hood, more_like_this simply creates multiple should clauses in a bool query, one for each interesting term extracted from the provided text. The interesting terms are selected by their tf-idf scores, and selection is controlled by min_term_freq, min_doc_freq, and max_doc_freq. The number of interesting terms is controlled by max_query_terms, while the minimum number of clauses that must be satisfied is controlled by percent_terms_to_match. The terms are extracted from like_text, which is analyzed by the analyzer associated with the field unless one is specified with analyzer. Other parameters, such as min_word_length, max_word_length, or stop_words, control which terms are considered interesting. To give more weight to more interesting terms, each boolean clause associated with a term can be boosted by the term's tf-idf score times a boosting factor, boost_terms.
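The selection step described above can be illustrated with a toy model. The sketch below is a simplification and not the actual implementation: the function names, the default values, the pre-tokenized input, and the exact idf formula (`log(num_docs / (doc_freq + 1)) + 1`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def interesting_terms(tokens, doc_freq, num_docs, min_term_freq=1,
                      min_doc_freq=1, max_doc_freq=None, min_word_length=0,
                      max_word_length=None, stop_words=(), max_query_terms=12):
    """Return up to max_query_terms (score, term) pairs, best first.

    tokens: analyzed terms of the input text (analysis is assumed done).
    doc_freq: mapping of term -> number of documents containing it.
    """
    scored = []
    for term, tf in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if term in stop_words:
            continue
        if len(term) < min_word_length:
            continue
        if max_word_length is not None and len(term) > max_word_length:
            continue
        if tf < min_term_freq or df < min_doc_freq:
            continue
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        idf = math.log(num_docs / (df + 1)) + 1.0  # assumed idf formula
        scored.append((tf * idf, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:max_query_terms]


def to_bool_query(field, scored_terms, boost_terms=0):
    """Turn the selected terms into should clauses of a bool query."""
    should = []
    for score, term in scored_terms:
        clause = {"value": term}
        if boost_terms:
            clause["boost"] = boost_terms * score  # tf-idf times the factor
        should.append({"term": {field: clause}})
    return {"bool": {"should": should}}


tokens = "quick brown fox quick fox quick".split()
doc_freq = {"quick": 5, "brown": 50, "fox": 2}
selected = interesting_terms(tokens, doc_freq, num_docs=100, min_term_freq=2)
bool_query = to_bool_query("name.first", selected, boost_terms=2)
```

In this example "brown" occurs only once in the input, so a min_term_freq of 2 drops it before scoring, and the two remaining terms each become one should clause.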
When a search for multiple docs is issued, More Like This generates a more_like_this query per document field in fields. These fields are specified as a top level parameter or within each doc.
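The per-document, per-field expansion can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's code: `expand_docs` and `fetch_text` are hypothetical names, the per-document key is assumed to be "fields", and the text lookup is faked with an in-memory dictionary.

```python
def expand_docs(docs, top_level_fields, fetch_text):
    """Generate one more_like_this query per document field.

    fetch_text(doc, field) is a hypothetical callable returning the text of
    `field` for `doc`. Fields listed within a doc override the top level ones.
    """
    queries = []
    for doc in docs:
        fields = doc.get("fields", top_level_fields)
        for field in fields:
            queries.append({
                "more_like_this": {
                    "fields": [field],
                    "like_text": fetch_text(doc, field),
                }
            })
    return queries


# In-memory stand-in for fetching field text from an index.
store = {
    ("1", "name.first"): "john",
    ("1", "name.last"): "smith",
    ("2", "name.first"): "jane",
}
docs = [
    {"_index": "test", "_type": "type", "_id": "1"},
    {"_index": "test", "_type": "type", "_id": "2", "fields": ["name.first"]},
]
generated = expand_docs(
    docs,
    ["name.first", "name.last"],
    lambda doc, field: store[(doc["_id"], field)],
)
```

Here the first document uses the two top level fields and the second uses only the field listed within it, so three queries are generated in total.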
The fields must be indexed and of type string. Additionally, when using ids or docs, the fields must either be stored or store term_vector, or _source must be enabled.
The more_like_this top level parameters include:
Parameter | Description |
---|---|
fields | A list of the fields to run the more like this query against. Defaults to the |
like_text | The text to find documents like it, required if |
ids or docs | A list of documents following the same syntax as the Multi GET API. The text is fetched from |
 | When using |
 | When using |
percent_terms_to_match | From the generated query, the percentage of terms that must match (float value between 0 and 1). Defaults to |
min_term_freq | The frequency below which terms will be ignored in the source doc. The default frequency is |
max_query_terms | The maximum number of query terms that will be included in any generated query. Defaults to |
stop_words | An array of stop words. Any word in this set is considered "uninteresting" and ignored. Even if your analyzer allows stop words, you might want to tell the More Like This code to ignore them, because for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting". |
min_doc_freq | Words that do not occur in at least this many docs will be ignored. Defaults to |
max_doc_freq | The maximum number of docs in which a word may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded. |
min_word_length | The minimum word length below which words will be ignored. Defaults to |
max_word_length | The maximum word length above which words will be ignored. Defaults to unbounded ( |
boost_terms | Sets the boost factor to use when boosting terms. Defaults to deactivated ( |
boost | Sets the boost value of the query. Defaults to |
analyzer | The analyzer that will be used to analyze the |