Explain Analyze

If you want to get more advanced details, set explain to true (defaults to false). It will output all token attributes for each token. You can filter the token attributes you want to output by setting the attributes option.
The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"]
}
The request returns the following result:
{ "detail" : { "custom_analyzer" : true, "charfilters" : [ ], "tokenizer" : { "name" : "standard", "tokens" : [ { "token" : "detailed", "start_offset" : 0, "end_offset" : 8, "type" : "<ALPHANUM>", "position" : 0 }, { "token" : "output", "start_offset" : 9, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 1 } ] }, "tokenfilters" : [ { "name" : "snowball", "tokens" : [ { "token" : "detail", "start_offset" : 0, "end_offset" : 8, "type" : "<ALPHANUM>", "position" : 0, "keyword" : false }, { "token" : "output", "start_offset" : 9, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 1, "keyword" : false } ] } ] } }
Settings to prevent token explosion

Generating an excessive number of tokens may cause a node to run out of memory. The following setting allows you to limit the number of tokens that can be produced:
index.analyze.max_token_count
    The maximum number of tokens that can be produced using the _analyze API. The default value is 10000. If more tokens than this limit are generated, an error is thrown. The _analyze endpoint without a specified index always uses 10000 as the limit. This setting allows you to control the limit for a specific index:
PUT analyze_sample
{
  "settings" : {
    "index.analyze.max_token_count" : 20000
  }
}
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
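As a rough illustration, the sketch below creates the index with the raised limit and then analyzes text against it from the Python client. The index name matches the example above, but the deliberately long input and the error handling are illustrative assumptions; the exact exception class depends on the client version.

# Rough sketch, assuming the 8.x Python client; the long input is an
# illustrative assumption used to exceed the configured limit.
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ApiError  # base API error class in the 8.x client

client = Elasticsearch("http://localhost:9200")

# Create the index with a custom analyze token limit (20000 instead of
# the default 10000), matching the PUT analyze_sample request above.
client.indices.create(
    index="analyze_sample",
    settings={"index.analyze.max_token_count": 20000},
)

# Within the limit: this succeeds and returns the tokens.
ok = client.indices.analyze(index="analyze_sample", text="this is a test")
print([t["token"] for t in ok["tokens"]])

# Beyond the limit: an input that produces more than 20000 tokens makes
# the request fail with an error from the cluster.
too_long = " ".join("word" for _ in range(25000))
try:
    client.indices.analyze(index="analyze_sample", text=too_long)
except ApiError as err:
    print("analysis rejected:", err)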