Letter Tokenizer
The letter tokenizer breaks text into terms whenever it encounters a character which is not a letter. It does a reasonable job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
Example output

POST _analyze
{
  "tokenizer": "letter",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ]
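Because digits and punctuation both count as non-letters, any run of them acts as a term boundary. As a further illustration (a minimal sketch using the same _analyze API; the input text here is arbitrary), an email address is split apart entirely:

POST _analyze
{
  "tokenizer": "letter",
  "text": "email: user123@example.com"
}

This would produce the terms:

[ email, user, example, com ]

Note that the digits 123 disappear, just as the 2 does in the example above.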
Configuration

The letter tokenizer is not configurable.
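Although the tokenizer itself takes no parameters, it can still be combined with token filters in a custom analyzer, which is where any further behavior is configured. A minimal sketch, assuming a hypothetical index name my_index and analyzer name my_letter_analyzer:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_letter_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

Here the lowercase token filter supplies what the tokenizer cannot: with it applied, QUICK and quick index to the same term.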