Get tokens from text analysis
The analyze API performs analysis on a text string and returns the resulting tokens.
Body
- analyzer (string): The name of the analyzer to apply to the provided text. This can be a built-in analyzer or an analyzer that has been configured in the index.
- attributes (array[string]): Array of token attributes used to filter the output of the explain parameter (see the example after this list).
- char_filter (array): Array of character filters used to preprocess characters before the tokenizer.
- explain (boolean): If true, the response includes token attributes and additional details.
- field (string): Path to a field, or an array of paths. Some APIs support wildcards in the path to select multiple fields.
- filter (array): Array of token filters to apply after the tokenizer.
- normalizer (string): Normalizer to use to convert text into a single token.
- text (string | array[string]): Text to analyze.
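For example, setting explain to true makes the response include the detail object shown in the response example below, and attributes narrows that output to the named token attributes. The analyzer and attribute chosen here are illustrative, not requirements:

{
  "analyzer" : "standard",
  "explain" : true,
  "attributes" : ["keyword"],
  "text" : "Quick Brown Foxes!"
}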
GET /_analyze
curl \
 -X GET http://api.example.com/_analyze \
 -H "Content-Type: application/json" \
 -d '{"analyzer":"standard","text":"Quick Brown Foxes!"}'
Request example
{
"analyzer" : "standard",
"text" : "Quick Brown Foxes!"
}
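A transient analysis chain can also be assembled from char_filter and filter alone, without naming an analyzer. This sketch assumes html_strip and lowercase are available as built-in components:

{
  "char_filter" : ["html_strip"],
  "filter" : ["lowercase"],
  "text" : "<b>Quick</b> Brown FOXES!"
}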
Response examples (200)
{
  "detail": {
    "analyzer": {
      "name": "string",
      "tokens": [
        {
          "bytes": "string",
          "end_offset": 42,
          "keyword": true,
          "position": 42,
          "positionLength": 42,
          "start_offset": 42,
          "termFrequency": 42,
          "token": "string",
          "type": "string"
        }
      ]
    },
    "charfilters": [
      {
        "filtered_text": [
          "string"
        ],
        "name": "string"
      }
    ],
    "custom_analyzer": true,
    "tokenfilters": [
      {
        "name": "string",
        "tokens": [
          {
            "bytes": "string",
            "end_offset": 42,
            "keyword": true,
            "position": 42,
            "positionLength": 42,
            "start_offset": 42,
            "termFrequency": 42,
            "token": "string",
            "type": "string"
          }
        ]
      }
    ],
    "tokenizer": {
      "name": "string",
      "tokens": [
        {
          "bytes": "string",
          "end_offset": 42,
          "keyword": true,
          "position": 42,
          "positionLength": 42,
          "start_offset": 42,
          "termFrequency": 42,
          "token": "string",
          "type": "string"
        }
      ]
    }
  },
  "tokens": [
    {
      "end_offset": 42,
      "position": 42,
      "positionLength": 42,
      "start_offset": 42,
      "token": "string",
      "type": "string"
    }
  ]
}
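For the request example above (explain not set), only the top-level tokens array is returned. The values below follow from the standard analyzer lowercasing the input and splitting it on word boundaries; the <ALPHANUM> type string is what that analyzer reports for word tokens:

{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "brown",
      "start_offset": 6,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "foxes",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}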