Explain Analyze

If you want to get more advanced details, set explain to true (defaults to false). It will output all token attributes for each token. You can filter the token attributes you want to output by setting the attributes option.
The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"]
}
The request returns the following result:
{ "detail" : { "custom_analyzer" : true, "charfilters" : [ ], "tokenizer" : { "name" : "standard", "tokens" : [ { "token" : "detailed", "start_offset" : 0, "end_offset" : 8, "type" : "<ALPHANUM>", "position" : 0 }, { "token" : "output", "start_offset" : 9, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 1 } ] }, "tokenfilters" : [ { "name" : "snowball", "tokens" : [ { "token" : "detail", "start_offset" : 0, "end_offset" : 8, "type" : "<ALPHANUM>", "position" : 0, "keyword" : false }, { "token" : "output", "start_offset" : 9, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 1, "keyword" : false } ] } ] } }
Settings to prevent token explosion

Generating an excessive number of tokens may cause a node to run out of memory. The following setting allows you to limit the number of tokens that can be produced:
index.analyze.max_token_count
    The maximum number of tokens that can be produced using the _analyze API. The default value is 10000. If more tokens than this limit are generated, an error is thrown. The _analyze endpoint without a specified index always uses 10000 as the limit. This setting allows you to control the limit for a specific index:
PUT analyze_sample
{
  "settings" : {
    "index.analyze.max_token_count" : 20000
  }
}
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
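As a rough illustration, the sketch below creates the index with the raised limit and then analyzes text against it from the Python client. The index name matches the example above, but the deliberately long input and the error handling are illustrative assumptions; the exact exception class depends on the client version.

# Rough sketch, assuming the 8.x Python client; the long input is an
# illustrative assumption used to exceed the configured limit.
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ApiError  # base API error class in the 8.x client

client = Elasticsearch("http://localhost:9200")

# Create the index with a custom analyze token limit (20000 instead of
# the default 10000), matching the PUT analyze_sample request above.
client.indices.create(
    index="analyze_sample",
    settings={"index.analyze.max_token_count": 20000},
)

# Within the limit: this succeeds and returns the tokens.
ok = client.indices.analyze(index="analyze_sample", text="this is a test")
print([t["token"] for t in ok["tokens"]])

# Beyond the limit: an input that produces more than 20000 tokens makes
# the request fail with an error from the cluster.
too_long = " ".join("word" for _ in range(25000))
try:
    client.indices.analyze(index="analyze_sample", text=too_long)
except ApiError as err:
    print("analysis rejected:", err)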