WARNING: Version 2.3 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Pattern Tokenizer
A tokenizer of type pattern that can flexibly separate text into terms via a regular expression. Accepts the following settings:
Setting | Description |
---|---|
pattern | The regular expression pattern, defaults to \W+. |
flags | The regular expression flags. |
group | Which group to extract into tokens. Defaults to -1 (split). |
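As a rough illustration of how these settings fit together, the sketch below builds an index settings body that registers a pattern tokenizer. The names my_tokenizer and my_analyzer and the comma pattern are hypothetical and are not part of the original page; the body is written as a Python dict only so that it can be serialized to JSON.

```python
import json

# Hypothetical index settings: "my_tokenizer" and "my_analyzer" are
# illustrative names, not defaults shipped with Elasticsearch.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_tokenizer": {
                    "type": "pattern",
                    "pattern": ",",  # matches the separator, not the token
                    "group": -1,     # split mode (the default)
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_tokenizer",
                }
            },
        }
    }
}

# JSON body that could be sent when creating an index.
body = json.dumps(index_settings, indent=2)
print(body)
```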
IMPORTANT: The regular expression should match the token separators, not the tokens themselves.
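To make the separator rule concrete, here is a minimal sketch that emulates split mode with Python's re module. It is only an approximation: the tokenizer itself runs on Java regular expressions, the helper name split_tokens is hypothetical, and the \W+ separator pattern and the dropping of empty tokens are assumptions of this sketch.

```python
import re

def split_tokens(text, pattern=r"\W+"):
    """Emulate split mode (group = -1): the pattern matches separators."""
    # Drop empty strings produced by leading, trailing, or adjacent separators.
    return [tok for tok in re.split(pattern, text) if tok]

print(split_tokens("foo, bar-baz!"))        # ['foo', 'bar', 'baz']
print(split_tokens("a,b,,c", pattern=","))  # ['a', 'b', 'c']
```

Note that the pattern describes what lies between tokens. A pattern such as \w+ would match the tokens themselves and leave only punctuation and whitespace in the output.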
Setting group to -1 (the default) is equivalent to "split". Using group >= 0 selects the matching group as the token. For example, if you have:
    pattern = '([^']+)'
    group   = 0
    input   = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
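The group example above can be reproduced with a short sketch in Python's re module. The helper group_tokens is hypothetical and only approximates the tokenizer, which uses Java regular expressions; for this particular pattern the two regex dialects behave the same way.

```python
import re

def group_tokens(text, pattern, group):
    """Emulate group >= 0: the chosen group of each match becomes a token."""
    # Skip matches where the chosen group is empty or did not participate.
    return [m.group(group) for m in re.finditer(pattern, text) if m.group(group)]

text = "aaa 'bbb' 'ccc'"
pattern = r"'([^']+)'"

print(group_tokens(text, pattern, 0))  # ["'bbb'", "'ccc'"]
print(group_tokens(text, pattern, 1))  # ['bbb', 'ccc']
```

Group 0 is the whole match, so the quote marks are kept. Group 1 is the first parenthesized sub-expression, so only the text between the quotes is returned. The unquoted aaa never matches and produces no token in either case.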