WARNING: Version 2.2 of Elasticsearch has passed its EOL date.

This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.

Token count datatype

A field of type `token_count` is really an `integer` field which accepts string values, analyzes them, then indexes the number of tokens in the string.

For instance:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { 
          "type": "string",
          "fields": {
            "length": { 
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "name": "John Smith" }

PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}

	The `name` field is an analyzed string field which uses the default `standard` analyzer.
	The `name.length` field is a `token_count` multi-field which will index the number of tokens in the `name` field.
	This query matches only the document containing `Rachel Alice Williams`, as it contains three tokens.
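
The effect of the example can be modelled outside Elasticsearch. The Python sketch below is only an approximation: it assumes that splitting on runs of word characters is close enough to the `standard` analyzer for these two names, and `approx_token_count`, `docs`, `lengths` and `matches` are hypothetical names introduced for illustration.

```python
import re

def approx_token_count(text):
    """Rough stand-in for analyzing a string and counting its tokens."""
    # Assumption: runs of word characters approximate `standard` analyzer tokens.
    return len(re.findall(r"\w+", text.lower()))

# The two documents indexed in the example above, keyed by document ID.
docs = {1: "John Smith", 2: "Rachel Alice Williams"}

# The value the `name.length` sub-field would hold for each document.
lengths = {doc_id: approx_token_count(name) for doc_id, name in docs.items()}

# Equivalent of the term query `"name.length": 3`.
matches = [doc_id for doc_id, count in lengths.items() if count == 3]

print(lengths)
print(matches)
```

Because the count is indexed as a number, the same sub-field could also be targeted by numeric queries such as `range`, not only by an exact `term` match.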

Technically, the `token_count` type sums position increments rather than counting tokens. As a result, stop words that the analyzer filters out are still included in the count.
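
The difference between counting surviving tokens and summing position increments can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Elasticsearch implementation: the stop list, the regex tokenizer and the function name `position_increments` are hypothetical, and counting a gap left by trailing stop words is an assumption of this model.

```python
import re

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}

def position_increments(text, stop_words=STOP_WORDS):
    """Return (kept_tokens, increments, trailing_gap) for a stop-filtered text."""
    kept, increments = [], []
    gap = 0  # stop words skipped since the last kept token
    for token in re.findall(r"\w+", text.lower()):
        if token in stop_words:
            gap += 1  # a removed token leaves a hole in the positions
            continue
        kept.append(token)
        increments.append(gap + 1)  # the hole is folded into the next increment
        gap = 0
    return kept, increments, gap

kept, increments, trailing = position_increments("The Lord of the Rings")

surviving_tokens = len(kept)                     # 2: "lord", "rings"
summed_increments = sum(increments) + trailing   # 5: every original word counted

print(kept, increments)
print(surviving_tokens, summed_increments)
```

In this model the two numbers agree only when the analyzer removes nothing, which is why a query on the count may need to account for stop words in the original text.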

Parameters for `token_count` fields

The following parameters are accepted by `token_count` fields:

`analyzer`	The analyzer which should be used to analyze the string value. Required. For best performance, use an analyzer without token filters.
`boost`	Field-level index time boosting. Accepts a floating point number, defaults to `1.0`.
`doc_values`	Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts `true` (default) or `false`.
`index`	Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
`include_in_all`	Whether or not the field value should be included in the `_all` field. Accepts `true` or `false`. Defaults to `false`. Note: if `true`, it is the string value that is added to `_all`, not the calculated token count.
`null_value`	Accepts a numeric value of the same `type` as the field which is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing.
`precision_step`	Controls the number of extra terms that are indexed to make `range` queries faster. Defaults to `32`.
`store`	Whether the field value should be stored and retrievable separately from the `_source` field. Accepts `true` or `false` (default).
