IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Keyword Repeat Token Filter
The keyword_repeat token filter emits each incoming token twice, once
as a keyword and once as a non-keyword, so that an unstemmed version of
a term can be indexed side by side with the stemmed version of the term.
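The behavior described above can be illustrated with a small Python model. This is only a sketch, not the Lucene implementation: the Token class, the keyword flag, and the toy_stem function (which just strips a trailing "s") are hypothetical stand-ins chosen for illustration, and a real stemmer such as porter_stem applies far more rules.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Token:
    term: str
    position: int
    keyword: bool = False  # True means "stemmers must leave this token alone"


def keyword_repeat(tokens: Iterable[Token]) -> Iterator[Token]:
    """Emit each token twice at the same position: keyword copy first, then normal copy."""
    for tok in tokens:
        yield Token(tok.term, tok.position, keyword=True)
        yield Token(tok.term, tok.position, keyword=False)


def toy_stem(tokens: Iterable[Token]) -> Iterator[Token]:
    """Hypothetical stand-in for a stemmer: strips a trailing 's' and skips keyword tokens."""
    for tok in tokens:
        if not tok.keyword and len(tok.term) > 3 and tok.term.endswith("s"):
            yield Token(tok.term[:-1], tok.position, tok.keyword)
        else:
            yield tok


def analyze_repeat(words: List[str]) -> List[Token]:
    """Assign positions to the words, then run keyword_repeat followed by the toy stemmer."""
    tokens = [Token(word, index) for index, word in enumerate(words)]
    return list(toy_stem(keyword_repeat(tokens)))
```

In this model, analyze_repeat(["like", "cats"]) yields four tokens: two copies of "like" at position 0 (the toy stemmer changes neither) and "cats" plus "cat" at position 1. The duplicated "like" is the kind of redundant token that a later deduplication step is meant to drop.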
Given the nature of this filter, each token that isn’t transformed by a
subsequent stemmer will be indexed twice. Therefore, consider adding a
unique filter with only_on_same_position set to true to drop
unnecessary duplicates.
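To make the deduplication step concrete, here is a minimal Python sketch of what removing duplicates only at the same position means. It is an illustrative model rather than the actual unique filter: the function name unique_on_same_position and the (term, position) tuple representation are assumptions made for this example, and it assumes tokens arrive ordered by position.

```python
from typing import Iterable, List, Optional, Set, Tuple

TokenPair = Tuple[str, int]  # (term, position)


def unique_on_same_position(tokens: Iterable[TokenPair]) -> List[TokenPair]:
    """Drop a token only if the same term was already seen at the same position."""
    seen: Set[str] = set()
    current_position: Optional[int] = None
    kept: List[TokenPair] = []
    for term, position in tokens:
        if position != current_position:
            # A new position starts, so forget the terms seen at the previous one.
            seen.clear()
            current_position = position
        if term not in seen:
            seen.add(term)
            kept.append((term, position))
    return kept
```

With this model, a stream in which "i" and "like" each appear twice at their own position, followed by "cats" and "cat" at a shared position, collapses to one "i", one "like", and both "cats" and "cat". A term that recurs at a different position is kept, because the set of seen terms is reset whenever the position changes.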
Here is an example of using the keyword_repeat token filter to
preserve both the stemmed and unstemmed versions of tokens:
PUT /keyword_repeat_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stemmed_and_unstemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_repeat", "porter_stem", "unique_stem"]
        }
      },
      "filter": {
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        }
      }
    }
  }
}
You can test it with:
POST /keyword_repeat_example/_analyze
{
  "analyzer": "stemmed_and_unstemmed",
  "text": "I like cats"
}
The response is:
{
  "tokens": [
    { "token": "i", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 },
    { "token": "like", "start_offset": 2, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
    { "token": "cats", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2 },
    { "token": "cat", "start_offset": 7, "end_offset": 11, "type": "<ALPHANUM>", "position": 2 }
  ]
}
This output preserves both the cat and cats tokens at the same
position. Compare this to the example on the Keyword Marker Token
Filter page.