Standard Tokenizer
A tokenizer of type standard provides a grammar-based tokenizer that is a good choice for most European language documents. The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
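For example, the analyze API can be used to see how the standard tokenizer splits a piece of text (a minimal sketch; the host and port are placeholders for your own node):

    curl -XGET 'localhost:9200/_analyze?tokenizer=standard&pretty' -d 'The 2 QUICK Brown-Foxes.'

The text is split on word boundaries, producing the tokens The, 2, QUICK, Brown, and Foxes; the punctuation and the hyphen are dropped.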
The following settings can be set for a standard tokenizer type:
Setting | Description
---|---
max_token_length | The maximum token length. If a token is seen that exceeds this length then it is discarded. Defaults to 255.
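As a sketch, max_token_length could be configured through a custom tokenizer in the index settings (the index, tokenizer, and analyzer names here are illustrative):

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "tokenizer": {
            "my_tokenizer": {
              "type": "standard",
              "max_token_length": 5
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "my_tokenizer"
            }
          }
        }
      }
    }'

With max_token_length set to 5, any token longer than five characters is discarded from the token stream.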