WARNING: Version 1.5 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Completion Suggester
In order to understand the format of suggestions, please read the Suggesters page first.
The completion suggester is a so-called prefix suggester. It does not do spell correction like the term or phrase suggesters, but allows basic auto-complete functionality.
Why another suggester? Why not prefix queries?
The first question that comes to mind when reading about a prefix suggester is why you should use it at all if you already have prefix queries. The answer is simple: prefix suggestions are fast.
The data structures are internally backed by Lucene's AnalyzingSuggester, which uses FSTs (finite state transducers) to execute suggestions. Usually these data structures are costly to create, are stored in memory, and need to be rebuilt every now and then to reflect changes in your indexed documents. The completion suggester circumvents this by storing the FST as part of your index at index time. This allows for really fast loads and executions.
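To make the speed argument more concrete, here is a toy sketch of a prefix lookup over a sorted in-memory list using binary search. It is not how Lucene's AnalyzingSuggester or its FSTs work internally; the function name prefix_lookup and the sample data are assumptions made up for illustration. It only shows why a structure prepared ahead of time can answer prefix lookups without scanning every entry.

```python
from bisect import bisect_left


def prefix_lookup(sorted_inputs, prefix, size=5):
    """Return up to `size` entries of `sorted_inputs` that start with `prefix`.

    `sorted_inputs` must already be sorted; that sorting is the "index time"
    work in this toy model (hypothetical, not Lucene's FST).
    """
    # Binary search jumps straight to the first candidate.
    start = bisect_left(sorted_inputs, prefix)
    matches = []
    for entry in sorted_inputs[start:]:
        # All entries sharing the prefix are contiguous in sorted order.
        if not entry.startswith(prefix):
            break
        matches.append(entry)
        if len(matches) == size:
            break
    return matches


inputs = sorted(["nevermind", "nirvana", "foo fighters", "the beatles"])
print(prefix_lookup(inputs, "n"))
```

The lookup touches only the matching entries plus one, regardless of how many other entries the list holds.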
Mapping
In order to use this feature, you have to specify a special mapping for this field, which enables the special storage of the field.
curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
  "song" : {
    "properties" : {
      "name" : { "type" : "string" },
      "suggest" : {
        "type" : "completion",
        "index_analyzer" : "simple",
        "search_analyzer" : "simple",
        "payloads" : true
      }
    }
  }
}'
Mapping supports the following parameters:
- index_analyzer: The index analyzer to use, defaults to simple.
- search_analyzer: The search analyzer to use, defaults to simple. In case you are wondering why we did not opt for the standard analyzer: we try to have easy-to-understand behaviour here, and if you index the field content "At the Drive-in", you will not get any suggestions for "a", nor for "d" (the first non-stopword).
- payloads: Enables the storing of payloads, defaults to false.
- preserve_separators: Preserves the separators, defaults to true. If disabled, you could find a field starting with "Foo Fighters" if you suggest for "foof".
- preserve_position_increments: Enables position increments, defaults to true. If disabled and using a stopwords analyzer, you could get a field starting with "The Beatles" if you suggest for "b". Note: you could also achieve this by indexing two inputs, "Beatles" and "The Beatles"; there is no need to change a simple analyzer if you are able to enrich your data.
- max_input_length: Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string, in order to prevent massive inputs from bloating the underlying data structure. Most use cases won't be influenced by the default value, since prefix completions hardly grow beyond prefixes longer than a handful of characters. (The old name max_input_len is deprecated.)
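The note on preserve_position_increments suggests enriching your data with two inputs instead of changing the analyzer. A minimal sketch of that enrichment step might look like the following. The helper name inputs_with_article_variants and the list of articles are assumptions for illustration (English only); nothing here is part of Elasticsearch itself.

```python
# Assumption: only these English leading articles are stripped.
LEADING_ARTICLES = ("the ", "a ", "an ")


def inputs_with_article_variants(title):
    """Return the title plus, if it starts with an article, the title without it."""
    inputs = [title]
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            rest = title[len(article):].strip()
            if rest:
                inputs.append(rest)
            break
    return inputs


# Document body to index; "name" and "suggest" follow the mapping shown above.
doc = {
    "name": "The Beatles",
    "suggest": {"input": inputs_with_article_variants("The Beatles")},
}
print(doc)
```

With both inputs indexed, a suggestion for "b" can match "Beatles" while a suggestion for "t" still matches "The Beatles", and the simple analyzer stays untouched.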
Indexing
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
  "name" : "Nevermind",
  "suggest" : {
    "input" : [ "Nevermind", "Nirvana" ],
    "output" : "Nirvana - Nevermind",
    "payload" : { "artistId" : 2321 },
    "weight" : 34
  }
}'
The following parameters are supported:
- input: The input to store. This can be an array of strings or just a string. This field is mandatory.
- output: The string to return if a suggestion matches. This is very useful to normalize outputs (for example, to always have them in the format "artist - songname"). This field is optional. Note: the result is de-duplicated if several documents have the same output, i.e. only one is returned as part of the suggest result.
- payload: An arbitrary JSON object, which is simply returned in the suggest option. You could store data like the id of a document, in order to load it from Elasticsearch without executing another search (which might not yield any results if input and output differ strongly).
- weight: A positive integer or a string containing a positive integer, which defines a weight and allows you to rank your suggestions. This field is optional.
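The rules in the parameter list can be sketched as a small client-side helper that builds the suggest field and rejects values the list rules out. The function name build_suggest_field and its validation are assumptions for illustration; Elasticsearch performs its own validation and this helper is not part of any client library.

```python
import json


def build_suggest_field(inputs, output=None, payload=None, weight=None):
    """Build the value of a completion field from the documented parameters."""
    if isinstance(inputs, str):
        inputs = [inputs]          # a single string is allowed
    if not inputs:
        raise ValueError("input is mandatory")
    field = {"input": list(inputs)}
    if output is not None:
        field["output"] = output
    if payload is not None:
        if not isinstance(payload, dict):
            raise ValueError("payload must be a JSON object")
        field["payload"] = payload
    if weight is not None:
        # A positive integer, or a string containing a positive integer.
        if isinstance(weight, bool) or not isinstance(weight, (int, str)):
            raise ValueError("weight must be an integer or a string")
        if int(weight) <= 0:
            raise ValueError("weight must be positive")
        field["weight"] = weight
    return field


doc = {
    "name": "Nevermind",
    "suggest": build_suggest_field(
        ["Nevermind", "Nirvana"],
        output="Nirvana - Nevermind",
        payload={"artistId": 2321},
        weight=34,
    ),
}
print(json.dumps(doc, indent=2))
```

The printed JSON has the same shape as the body of the indexing request above.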
Even though you will lose most of the features of the completion suggester, you can choose to use the following shorthand form. Keep in mind that you will not be able to use several inputs, an output, payloads or weights. This form still works inside multi fields.
{ "suggest" : "Nirvana" }
Querying
Suggesting works as usual, except that you have to specify the suggest type as completion.
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
  "song-suggest" : {
    "text" : "n",
    "completion" : {
      "field" : "suggest"
    }
  }
}'

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 4,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}
As you can see, the payload is included in the response if configured appropriately. If you configured a weight for a suggestion, this weight is used as the score. Also, the text field uses the output of your indexed suggestion if configured, otherwise the matched part of the input field.
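As a sketch of how a client might read such a response, the following pulls the text, score and payload out of each option. The response string is the one shown above; the helper name extract_options is an assumption for illustration.

```python
import json

# The sample response from the query above, copied verbatim.
RESPONSE = """
{
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 4,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}
"""


def extract_options(response, suggestion_name):
    """Return (text, score, payload) tuples for one named suggestion."""
    results = []
    for entry in response.get(suggestion_name, []):
        for option in entry.get("options", []):
            # payload is only present if payloads were enabled in the mapping
            results.append((option["text"], option["score"], option.get("payload")))
    return results


options = extract_options(json.loads(RESPONSE), "song-suggest")
print(options)
```

The payload can then be used to load the document directly, for example by the artistId stored at index time.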
The basic completion suggester query supports the following two parameters:
- field: The name of the field on which to run the query (required).
- size: The number of suggestions to return (defaults to 5).
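A request body using both parameters could be assembled as sketched below. The helper names build_completion_request and post_suggest are assumptions for illustration; the URL is the one used in the curl examples and presumes a node listening on localhost:9200. The post_suggest function is defined but not called here, since it needs a running node.

```python
import json
import urllib.request


def build_completion_request(name, text, field, size=None):
    """Build a _suggest request body for one named completion suggestion."""
    completion = {"field": field}       # field is required
    if size is not None:
        completion["size"] = size       # otherwise the server default applies
    return {name: {"text": text, "completion": completion}}


def post_suggest(body, url="http://localhost:9200/music/_suggest"):
    """Send the body to the _suggest endpoint and return the parsed response."""
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_completion_request("song-suggest", "n", "suggest", size=10)
print(json.dumps(body))
```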
The completion suggester considers all documents in the index. See Context Suggester for an explanation of how to query a subset of documents instead.
Fuzzy queries
The completion suggester also supports fuzzy queries, which means you can actually have a typo in your search and still get results back.
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
  "song-suggest" : {
    "text" : "n",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "fuzziness" : 2
      }
    }
  }
}'
The fuzzy query can take specific fuzzy parameters. The following parameters are supported:
- fuzziness: The fuzziness factor, defaults to AUTO.
- transpositions: Sets if transpositions should be counted as one or two changes, defaults to true.
- min_length: Minimum length of the input before fuzzy suggestions are returned, defaults to 3.
- prefix_length: Minimum length of the input which is not checked for fuzzy alternatives, defaults to 1.
- unicode_aware: Sets all measurements (like edit distance, transpositions and lengths) to be in unicode code points (actual letters) instead of bytes.
If you want to stick with the default values, but still use fuzzy, you can either use fuzzy: {} or fuzzy: true.
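What it means for a transposition to count as one or two changes can be illustrated with a small edit distance function. This is a textbook dynamic-programming sketch (optimal string alignment), not the automaton-based implementation Lucene uses; the function name edit_distance and its transpositions flag are assumptions for illustration.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    With transpositions=True, swapping two adjacent characters counts as
    one change; with False it costs two changes (plain Levenshtein).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent swap counted as a single change
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(edit_distance("nrivana", "nirvana", transpositions=True))
print(edit_distance("nrivana", "nirvana", transpositions=False))
```

In this sketch the typo "nrivana" is one change away from "nirvana" when the swap counts once, and two changes away when it does not, which decides whether a given fuzziness value still admits the match.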