Analyze
Performs the analysis process on a text and returns the token breakdown of the text.
It can be used without specifying an index, against one of the many built-in analyzers:
GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
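With the standard analyzer, the response contains one entry per token, with its text, character offsets, type, and position. For the request above it has roughly the following shape (abridged to the first and last tokens):

{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "test",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}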
If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field.
GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
Or by building a custom transient analyzer out of tokenizers, token filters and char filters. Token filters can use the shorter filter parameter name:
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
Deprecated in 5.0.0. Use filter/char_filter instead of filters/char_filters; token_filters has been removed.
Custom tokenizers, token filters, and character filters can be specified in the request body as follows:
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
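The analysis chain described above — char filters applied to the raw text, then a tokenizer, then token filters applied to the token stream — can be sketched in plain Python. This is a conceptual illustration of the pipeline order only, not Elasticsearch's actual implementation; the function names are invented for the sketch:

import re

def html_strip(text):
    # char filter: remove HTML tags from the raw text
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text):
    # tokenizer: split the (filtered) text on whitespace
    return text.split()

def lowercase(tokens):
    # token filter: lowercase every token
    return [t.lower() for t in tokens]

def stop(tokens, stopwords):
    # token filter: drop tokens that appear in the stopword list
    return [t for t in tokens if t not in stopwords]

def analyze(text, char_filters, tokenizer, token_filters):
    # char filters run first, on the raw text
    for cf in char_filters:
        text = cf(text)
    # the tokenizer turns the text into a token stream
    tokens = tokenizer(text)
    # token filters run last, on the token stream
    for tf in token_filters:
        tokens = tf(tokens)
    return tokens

tokens = analyze(
    "this is a <b>TEST</b>",
    char_filters=[html_strip],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, lambda ts: stop(ts, ["a", "is", "this"])],
)
# only "test" survives: HTML stripped, lowercased, stopwords removed

Reordering the stages would change the result — for example, running the stop filter before lowercase would fail to drop "THIS" — which is why the pipeline order is fixed.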
It can also run against a specific index:
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
The above will run an analysis on the "this is a test" text, using the default index analyzer associated with the analyze_sample index. A different analyzer can also be specified:
GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}
Also, the analyzer can be derived based on a field mapping, for example:
GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}
This will cause the analysis to happen based on the analyzer configured in the mapping for obj1.field1 (or, if none is configured, the default index analyzer).
A normalizer can be provided for a keyword field with a normalizer associated with the analyze_sample index.
GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}
Or by building a custom transient normalizer out of token filters and char filters.
GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}