Analyze

Performs the analysis process on a text and returns the token breakdown of the text.

It can be used without specifying an index, against one of the many built-in analyzers:

GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
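
As a rough sketch, the same request can also be sent from a script. The Python below uses only the standard library. The base URL http://localhost:9200 and the helper name build_analyze_request are assumptions made for illustration, and the body is sent with POST, which the _analyze endpoint accepts in place of GET with a body.

```python
import json
import urllib.request


def build_analyze_request(body, index=None, base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP request for the _analyze endpoint.

    base_url is an assumed local cluster address; index is optional.
    """
    path = f"/{index}/_analyze" if index else "/_analyze"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request({"analyzer": "standard", "text": "this is a test"})

# To actually send it (requires a reachable cluster):
# with urllib.request.urlopen(request) as response:
#     tokens = json.load(response)["tokens"]
```

Building the request separately from sending it keeps the sketch usable without a running cluster; passing an index name yields the index-scoped form of the endpoint.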

If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field.

GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}

A custom transient analyzer can also be built out of tokenizers, token filters, and char filters. Token filters are specified with the shorter filter parameter name:

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
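
To make the order of the stages concrete, here is a minimal pure-Python sketch of such a transient analyzer: char filters run on the raw text first, then the tokenizer, then the token filters. The functions are simplified stand-ins written for this illustration, not Elasticsearch's implementations; in particular, the html_strip stand-in only removes anything that looks like a tag.

```python
import re


def html_strip(text):
    # Crude stand-in for the html_strip char filter: drop anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", text)


def keyword_tokenizer(text):
    # The keyword tokenizer emits the whole input as a single token.
    return [text]


def lowercase(tokens):
    # Stand-in for the lowercase token filter.
    return [token.lower() for token in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    # Order of stages: char filters -> tokenizer -> token filters.
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("this is a <b>test</b>", [html_strip], keyword_tokenizer, [lowercase])
print(tokens)  # ['this is a test']
```

Without the char filter, the markup would survive into the single keyword token, which is why html_strip has to act before tokenization.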

Deprecated in 5.0.0.

Use filter and char_filter instead of filters and char_filters. The token_filters parameter has been removed.

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
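
The order of the entries in filter matters. The sketch below emulates the chain in plain Python, assuming the stop filter matches case-sensitively (its ignore_case option left at false); the helper names are hypothetical and the functions only approximate the real filters.

```python
def whitespace_tokenizer(text):
    # Split on runs of whitespace, as the whitespace tokenizer does.
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


def make_stop_filter(stopwords):
    # Build a case-sensitive stop filter from an inline stopword list.
    stopword_set = set(stopwords)

    def stop(tokens):
        return [token for token in tokens if token not in stopword_set]

    return stop


def run_chain(text, token_filters):
    # Apply the token filters in the order they are listed.
    tokens = whitespace_tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


stop = make_stop_filter(["a", "is", "this"])

lower_then_stop = run_chain("This IS a test", [lowercase, stop])
stop_then_lower = run_chain("This IS a test", [stop, lowercase])

print(lower_then_stop)  # ['test']
print(stop_then_lower)  # ['this', 'is', 'test']
```

Listing lowercase first, as in the request above, lets the stopword list stay in lower case while still catching capitalized input.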

The analysis can also be run against a specific index:

GET analyze_sample/_analyze
{
  "text" : "this is a test"
}

The above runs an analysis on the "this is a test" text, using the default index analyzer associated with the analyze_sample index. To use a different analyzer, specify it with the analyzer parameter:

GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

The analyzer can also be derived from a field mapping, for example:

GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}

This causes the analysis to use the analyzer configured in the mapping for obj1.field1 (or, if none is configured, the default index analyzer).
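
The fallback described here can be pictured as a small lookup. The sketch below is illustrative only: the function name, the field_analyzers dictionary, and the analyzer names are hypothetical. It also assumes that an explicit analyzer parameter takes precedence over field, and that the standard analyzer is used when nothing else applies; neither assumption is stated on this page.

```python
def resolve_analyzer(request, field_analyzers=None, index_default=None):
    """Pick the analyzer name for an _analyze request (illustrative only)."""
    field_analyzers = field_analyzers or {}
    # Assumption: an explicit analyzer in the request wins.
    if "analyzer" in request:
        return request["analyzer"]
    # Next, the analyzer configured in the mapping of the requested field.
    field = request.get("field")
    if field in field_analyzers:
        return field_analyzers[field]
    # Then the default index analyzer, if the index defines one.
    if index_default is not None:
        return index_default
    # Assumption: the final fallback is the standard analyzer.
    return "standard"


mapping_analyzers = {"obj1.field1": "my_field_analyzer"}  # hypothetical mapping

chosen = resolve_analyzer(
    {"field": "obj1.field1", "text": "this is a test"},
    field_analyzers=mapping_analyzers,
    index_default="my_default_analyzer",
)
print(chosen)  # my_field_analyzer
```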

A normalizer associated with the analyze_sample index can be provided for keyword fields:

GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}

A custom transient normalizer can also be built out of token filters and char filters:

GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}