WARNING: Version 1.4 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Compound Word Token Filter
Token filters that decompose compound words into their subwords. Two types are available: `dictionary_decompounder` and `hyphenation_decompounder`.
The following are settings that can be set for a compound word token filter type:
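| Setting | Description |
|---|---|
| `word_list` | A list of words to use. |
| `word_list_path` | A path (either relative to the `config` location, or absolute) to a list of words. |
| `hyphenation_patterns_path` | A path (either relative to the `config` location, or absolute) to a FOP XML hyphenation pattern file. Required for `hyphenation_decompounder`. |
| `min_word_size` | Minimum word size (Integer). Defaults to 5. |
| `min_subword_size` | Minimum subword size (Integer). Defaults to 2. |
| `max_subword_size` | Maximum subword size (Integer). Defaults to 15. |
| `only_longest_match` | Only match the longest subword (Boolean). Defaults to `false`. |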
Here is an example:

    index :
        analysis :
            analyzer :
                myAnalyzer2 :
                    type : custom
                    tokenizer : standard
                    filter : [myTokenFilter1, myTokenFilter2]
            filter :
                myTokenFilter1 :
                    type : dictionary_decompounder
                    word_list: [one, two, three]
                myTokenFilter2 :
                    type : hyphenation_decompounder
                    word_list_path: path/to/words.txt
                    max_subword_size : 22
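
To illustrate how the size settings and the longest-match option might interact for a dictionary-based decompounder, here is a minimal Python sketch. It is not the filter's actual implementation: the function name `decompound` is hypothetical, the parameter names simply mirror the settings described above, and the sketch assumes that the original token is kept, that dictionary lookup is case-insensitive, and that the longest match is chosen per start position.

```python
def decompound(token, word_list, min_word_size=5, min_subword_size=2,
               max_subword_size=15, only_longest_match=False):
    """Return the original token followed by the dictionary subwords found in it.

    Simplified illustration only; `decompound` is a hypothetical helper.
    """
    # Assumption: dictionary lookup ignores case.
    dictionary = {word.lower() for word in word_list}
    result = [token]  # Assumption: the original token is always kept.

    # Tokens shorter than the minimum word size are not decomposed.
    if len(token) < min_word_size:
        return result

    lowered = token.lower()
    # Try every start position that leaves room for the smallest subword.
    for start in range(len(lowered) - min_subword_size + 1):
        longest = None
        # Try every allowed subword length from that position.
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(lowered):
                break
            if lowered[start:start + size] in dictionary:
                subword = token[start:start + size]
                if only_longest_match:
                    # Sizes increase, so the last hit is the longest one.
                    longest = subword
                else:
                    result.append(subword)
        if longest is not None:
            result.append(longest)
    return result


if __name__ == "__main__":
    words = ["donau", "dampf", "schiff"]
    print(decompound("Donaudampfschiff", words))
    # ['Donaudampfschiff', 'Donau', 'dampf', 'schiff']
```

In this sketch, lowering `max_subword_size` below the length of a dictionary word prevents that word from ever matching, and enabling `only_longest_match` keeps a single subword per start position instead of every overlapping hit.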