Get tokens from text analysis
The analyze API performs analysis on a text string and returns the resulting tokens.

Generating an excessive number of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The _analyze endpoint without a specified index always uses 10000 as its limit.
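If you need a different limit for a particular index, you can set index.analyze.max_token_count in that index's settings. A minimal sketch using the update index settings API (my-index and the 20000 value are placeholders, and this assumes the setting can be updated dynamically on your version):

curl \
  --request PUT 'http://api.example.com/my-index/_settings' \
  --header "Authorization: $API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "index.analyze.max_token_count": 20000
  }'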
Path parameters
- index (string, Required): Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Query parameters
- index (string): Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Body
- analyzer (string): The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer or an analyzer that's been configured in the index.
- attributes (array[string]): Array of token attributes used to filter the output of the explain parameter.
- char_filter (array): Array of character filters used to preprocess characters before the tokenizer.
- explain (boolean): If true, the response includes token attributes and additional details.
- field (string): Path to a field or array of paths. Some APIs support wildcards in the path to select multiple fields.
- filter (array): Array of token filters to apply after the tokenizer.
- normalizer (string): Normalizer to use to convert text into a single token.
- text (string | array[string]): Text to analyze. If an array of strings is provided, it is analyzed as a multi-value field.
curl \
  --request GET 'http://api.example.com/{index}/_analyze' \
  --header "Authorization: $API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "analyzer": "standard",
    "text": "this is a test"
  }'
{
"analyzer": "standard",
"text": "this is a test"
}
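If the text parameter is provided as an array of strings, it is analyzed as a multi-value field: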
{
"analyzer": "standard",
"text": [
"this is a test",
"the second text"
]
}
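You can also build a transient custom analyzer from a tokenizer, token filters, and character filters defined in the request: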
{
"tokenizer": "keyword",
"filter": [
"lowercase"
],
"char_filter": [
"html_strip"
],
"text": "this is a <b>test</b>"
}
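Custom tokenizers, token filters, and character filters can be specified in the request body as well. For example, this stop filter uses a non-default list of stop words: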
{
"tokenizer": "whitespace",
"filter": [
"lowercase",
{
"type": "stop",
"stopwords": [
"a",
"is",
"this"
]
}
],
"text": "this is a test"
}
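The analyzer can be derived from a field mapping by providing the field parameter (an index must be specified in the path):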
{
"field": "obj1.field1",
"text": "this is a test"
}
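A normalizer can be provided for a keyword field (here my_normalizer refers to a normalizer configured on the index specified in the path):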
{
"normalizer": "my_normalizer",
"text": "BaR"
}
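If you want more advanced details, set explain to true. You can filter which token attributes are returned with the attributes option; this request only returns the keyword attribute: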
{
"tokenizer": "standard",
"filter": [
"snowball"
],
"text": "detailed output",
"explain": true,
"attributes": [
"keyword"
]
}
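The request above returns detailed output for each stage of the analysis (character filters, tokenizer, and token filters):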
{
"detail": {
"custom_analyzer": true,
"charfilters": [],
"tokenizer": {
"name": "standard",
"tokens": [
{
"token": "detailed",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1
}
]
},
"tokenfilters": [
{
"name": "snowball",
"tokens": [
{
"token": "detail",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0,
"keyword": false
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1,
"keyword": false
}
]
}
]
}
}