WARNING: Version 2.0 of Elasticsearch has passed its EOL date.

This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.

Classic Tokenizer

The `classic` tokenizer is a grammar-based tokenizer that works well for English-language documents. It uses heuristics to give special treatment to acronyms, company names, email addresses, and internet host names. However, these rules don’t always work, and the tokenizer doesn’t work well for most languages other than English.

The following settings can be configured for the `classic` tokenizer type:

Setting	Description
`max_token_length`	The maximum token length. If a token exceeds this length, it is discarded. Defaults to `255`.
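
As an illustration, the Python sketch below builds the JSON body for an index-creation request that registers a `classic` tokenizer with a non-default `max_token_length` and wraps it in a custom analyzer. The names `my_classic_tokenizer` and `my_classic_analyzer` are hypothetical, and the body layout (`settings.analysis.tokenizer` and `settings.analysis.analyzer`) assumes the usual Elasticsearch convention for custom analysis components. Check it against the index-settings documentation for your version before relying on it.

```python
import json

# Hypothetical names chosen for illustration; substitute your own.
TOKENIZER_NAME = "my_classic_tokenizer"
ANALYZER_NAME = "my_classic_analyzer"


def classic_index_settings(max_token_length: int = 255) -> dict:
    """Return an index-creation body that defines a `classic` tokenizer
    with the given max_token_length and a custom analyzer that uses it."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    TOKENIZER_NAME: {
                        "type": "classic",
                        "max_token_length": max_token_length,
                    }
                },
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        "tokenizer": TOKENIZER_NAME,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body to send with an index-creation request (PUT /<index>).
    print(json.dumps(classic_index_settings(max_token_length=5), indent=2))
```

With `max_token_length` set to 5, the setting description above implies that any token longer than five characters produced by the grammar would be discarded rather than indexed, so a small value like this is mainly useful for experimenting with the setting.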
