WARNING: Version 6.1 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Analyze
Performs the analysis process on a text and returns the token breakdown of the text.
It can be used without specifying an index, against one of the many built-in analyzers:
GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field.
GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
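The two requests above could be issued from a script along the following lines. This is a minimal sketch using only the Python standard library; the helper names `build_analyze_request` and `send_analyze` are hypothetical, and the node address in the usage note is an assumption, not something this page specifies.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard", index=None):
    """Return (path, body) for an _analyze call. Nothing is sent here."""
    # Without an index the request goes to the cluster-level endpoint.
    path = "/_analyze" if index is None else "/{}/_analyze".format(index)
    # `text` may be a single string or a list of strings (multi-valued field).
    body = {"analyzer": analyzer, "text": text}
    return path, body


def send_analyze(base_url, path, body):
    """Send the request to a running node (base_url is an assumption)."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


single = build_analyze_request("this is a test")
multi = build_analyze_request(["this is a test", "the second text"])
```

Assuming a node is reachable at `http://localhost:9200`, calling `send_analyze("http://localhost:9200", *single)` should return a JSON object whose `tokens` array holds one entry per emitted token.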
A custom transient analyzer can also be built from a tokenizer, token filters, and char filters. Token filters are specified with the shorter filter parameter name:
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
Deprecated in 5.0.0. Use filter/char_filter instead of filters/char_filters; token_filters has been removed.
Custom tokenizers, token filters, and character filters can be specified in the request body as follows:
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
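To make the order of operations in the request above concrete, here is a rough local emulation in Python. It is not Elasticsearch code: the function names are hypothetical, and it ignores the offsets, positions, and token types that the real API reports. It only shows how the tokenizer output flows through the filters in the order they are listed.

```python
def whitespace_tokenizer(text):
    # Split on runs of whitespace, as the "whitespace" tokenizer does.
    return text.split()


def lowercase_filter(tokens):
    # Lowercase every token.
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords):
    # Drop tokens that appear in the stopword list.
    stop = set(stopwords)
    return [t for t in tokens if t not in stop]


def analyze(text):
    # Filters run in the order given in the request: lowercase, then stop.
    tokens = whitespace_tokenizer(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens, ["a", "is", "this"])


result = analyze("this is a test")
print(result)  # ['test']
```

Because lowercasing happens before stopword removal in this sketch, an input such as "This IS a Test" also reduces to the single token `test`.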
It can also run against a specific index:
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
The above will run an analysis on the "this is a test" text, using the default index analyzer associated with the analyze_sample index. An analyzer can also be provided to use a different analyzer:
GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}
Also, the analyzer can be derived based on a field mapping, for example:
GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}
This causes the analysis to use the analyzer configured in the mapping for obj1.field1 (or, if none is configured, the default index analyzer).
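The fallback described above can be illustrated with a small Python sketch. The mapping contents, the function name `resolve_analyzer`, and the choice of `standard` as the index default are all assumptions made for illustration; the sketch only mirrors the rule stated here and is not how Elasticsearch implements it internally.

```python
def resolve_analyzer(properties, field_path, index_default="standard"):
    """Pick the analyzer for a dotted field path, else the index default."""
    node = None
    props = properties
    for part in field_path.split("."):
        node = props.get(part)
        if node is None:
            # Field not present in the mapping: use the index default.
            return index_default
        # Object fields nest their sub-fields under "properties".
        props = node.get("properties", {})
    return node.get("analyzer", index_default)


# Hypothetical mapping for the analyze_sample index.
mapping_properties = {
    "obj1": {
        "properties": {
            "field1": {"type": "text", "analyzer": "whitespace"},
            "field2": {"type": "text"},
        }
    }
}

configured = resolve_analyzer(mapping_properties, "obj1.field1")
fallback = resolve_analyzer(mapping_properties, "obj1.field2")
print(configured, fallback)  # whitespace standard
```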
A normalizer associated with the analyze_sample index can be provided for keyword fields:
GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}
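The request above presumes that my_normalizer is already defined in the settings of analyze_sample. This page does not show that definition, so the following Python sketch builds one plausible index-creation body. The filter chain, the mapping type name `doc`, and the field name `code` are assumptions for illustration only.

```python
import json

# Assumed index-creation body for analyze_sample.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "my_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase"],  # assumed filter chain
                }
            }
        }
    },
    "mappings": {
        "doc": {  # hypothetical mapping type name
            "properties": {
                # hypothetical keyword field that uses the normalizer
                "code": {"type": "keyword", "normalizer": "my_normalizer"}
            }
        }
    },
}

# Serialize the body, e.g. to send with PUT analyze_sample.
payload = json.dumps(index_body, indent=2)
print(payload)
```

With this assumed definition, the text "BaR" would be normalized to a single lowercase term.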
A custom transient normalizer can also be built from token filters and char filters:
GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}