WARNING: Version 5.6 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Keyword Repeat Token Filter
The keyword_repeat token filter emits each incoming token twice: once flagged as a keyword, so that stemmers leave it unchanged, and once as a non-keyword. This allows an unstemmed version of a term to be indexed side by side with the stemmed version of the term.
Because of this, each token that is not transformed by a subsequent stemmer will be indexed twice, once from each copy. Consider adding a unique filter with only_on_same_position set to true to drop these unnecessary duplicates.
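To make the token flow concrete, here is a minimal Python sketch that simulates the chain keyword_repeat → stemmer → unique outside Elasticsearch. It is an illustration under stated assumptions, not the Lucene implementation: the names `Token`, `tokenize`, `keyword_repeat`, `toy_stem` and `unique_same_position` are hypothetical, `tokenize` only splits on whitespace and lowercases, and `toy_stem` is a deliberately naive stand-in for a real stemmer such as porter_stem (it just strips one trailing "s").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    position: int
    keyword: bool = False


def tokenize(text):
    """Hypothetical stand-in for tokenizer + lowercase: whitespace split only."""
    return [Token(term, pos) for pos, term in enumerate(text.lower().split())]


def keyword_repeat(tokens):
    """Emit each token twice at the same position: first flagged as a keyword, then not."""
    for tok in tokens:
        yield Token(tok.term, tok.position, keyword=True)
        yield Token(tok.term, tok.position, keyword=False)


def toy_stem(tokens):
    """Hypothetical stemmer (NOT Porter): strips one trailing 's' from longer terms.
    Like Elasticsearch stemmers, it leaves keyword-flagged tokens unchanged."""
    for tok in tokens:
        if not tok.keyword and len(tok.term) > 3 and tok.term.endswith("s"):
            yield Token(tok.term[:-1], tok.position)
        else:
            yield tok


def unique_same_position(tokens):
    """Mimic unique with only_on_same_position: drop a term already seen at its position."""
    seen = set()
    for tok in tokens:
        key = (tok.position, tok.term)
        if key not in seen:
            seen.add(key)
            yield tok


tokens = tokenize("I like cats")
without_unique = list(toy_stem(keyword_repeat(tokens)))
result = [(t.term, t.position) for t in unique_same_position(without_unique)]
print(len(without_unique))  # 6
print(result)  # [('i', 0), ('like', 1), ('cats', 2), ('cat', 2)]
```

Without the unique step, the unstemmed terms "i" and "like" appear twice at their positions (six tokens in total); with it, only "cats" keeps two variants, because that is the only term the stemmer actually changed.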
Here is an example of using the keyword_repeat token filter to preserve both the stemmed and unstemmed versions of tokens:
PUT /keyword_repeat_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stemmed_and_unstemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_repeat", "porter_stem", "unique_stem"]
        }
      },
      "filter": {
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  }
}
You can test it with:
POST /keyword_repeat_example/_analyze
{
  "analyzer": "stemmed_and_unstemmed",
  "text": "I like cats"
}
The response is:
{
  "tokens": [
    {"token": "i", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0},
    {"token": "like", "start_offset": 2, "end_offset": 6, "type": "<ALPHANUM>", "position": 1},
    {"token": "cats", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2},
    {"token": "cat", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2}
  ]
}
This preserves both the cat and cats tokens at the same position. Compare this with the example on the Keyword Marker Token Filter page.
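If you want to run the same _analyze check from a script rather than the console, a minimal sketch using only the Python standard library might look like this. The host `http://localhost:9200`, the lack of authentication, and the helper names `build_analyze_request`, `analyze` and `terms_by_position` are assumptions for illustration; `analyze` needs a running node with the index created above.

```python
import json
import urllib.request
from collections import defaultdict

ES_HOST = "http://localhost:9200"  # assumption: a local node without security enabled


def build_analyze_request(host, index, analyzer, text):
    """Build a POST /<index>/_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(host, index, analyzer, text):
    """Send the request and return the decoded JSON response (requires a running node)."""
    request = build_analyze_request(host, index, analyzer, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def terms_by_position(response):
    """Group the returned token strings by their position, preserving order."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)
```

Applied to the response shown above, `terms_by_position(analyze(ES_HOST, "keyword_repeat_example", "stemmed_and_unstemmed", "I like cats"))` groups both "cats" and "cat" under position 2, which makes the side-by-side indexing easy to spot.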