Standard Tokenizer
A tokenizer of type standard provides a grammar-based tokenizer that is a good choice for most European language documents. The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
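For example, the analyze API can be used to see how the standard tokenizer splits a piece of text (a minimal sketch; the host and port are placeholders for your own node):

    curl -XGET 'localhost:9200/_analyze?tokenizer=standard&pretty' -d 'The 2 QUICK Brown-Foxes.'

The text is split on word boundaries, producing the tokens The, 2, QUICK, Brown, and Foxes; the punctuation and the hyphen are dropped.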
The following settings can be set for a standard tokenizer type:
Setting | Description
---|---
max_token_length | The maximum token length. If a token is seen that exceeds this length then it is discarded. Defaults to 255.
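As a sketch, max_token_length could be configured through a custom tokenizer in the index settings (the index, tokenizer, and analyzer names here are illustrative):

    curl -XPUT 'localhost:9200/my_index' -d '{
      "settings": {
        "analysis": {
          "tokenizer": {
            "my_tokenizer": {
              "type": "standard",
              "max_token_length": 5
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "my_tokenizer"
            }
          }
        }
      }
    }'

With max_token_length set to 5, any token longer than five characters is discarded from the token stream.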