Keyword datatype

A field to index structured content such as IDs, email addresses, hostnames, status codes, zip codes or tags.

Keyword fields are typically used for filtering (for example, finding all blog posts whose status is published), for sorting, and for aggregations. They are searchable only by their exact value.

If you need to index full text content such as email bodies or product descriptions, you should probably use a text field instead.

Below is an example of a mapping for a keyword field:

PUT my_index
{
  "mappings": {
    "properties": {
      "tags": {
        "type":  "keyword"
      }
    }
  }
}
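
As a minimal sketch of exact-value matching, the requests below index one document into the index mapped above and then filter on it with a term query. The document ID and the tag value published are illustrative assumptions, not part of the original example.

```console
# Index a document with a single tag (hypothetical ID and value)
PUT my_index/_doc/1
{
  "tags": "published"
}

# Match documents whose tags field is exactly "published"
GET my_index/_search
{
  "query": {
    "term": {
      "tags": "published"
    }
  }
}
```

Because no normalizer is configured, a term query for Published or publish would not be expected to match this document.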

Mapping numeric identifiers

Not all numeric data should be mapped as a numeric field datatype. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better for term and other term-level queries.

Identifiers, such as an ISBN or a product ID, are rarely used in range queries. However, they are often retrieved using term-level queries.

Consider mapping a numeric identifier as a keyword if:

  • You don’t plan to search for the identifier data using range queries.
  • Fast retrieval is important. Searches that use the term query on keyword fields are often faster than term searches on numeric fields.

If you’re unsure which to use, you can use a multi-field to map the data as both a keyword and a numeric datatype.
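
One way to set up such a multi-field is sketched below. The field name product_id and the sub-field name numeric are hypothetical; long is used here only as an example numeric type.

```console
# product_id and its "numeric" sub-field are hypothetical names
PUT my_index
{
  "mappings": {
    "properties": {
      "product_id": {
        "type": "keyword",
        "fields": {
          "numeric": {
            "type": "long"
          }
        }
      }
    }
  }
}
```

With this mapping, term-level queries can target product_id, while range queries can target product_id.numeric.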

Parameters for keyword fields

The following parameters are accepted by keyword fields:

boost

Mapping field-level query time boosting. Accepts a floating point number, defaults to 1.0.

doc_values

Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.

eager_global_ordinals

Should global ordinals be loaded eagerly on refresh? Accepts true or false (default). Enabling this is a good idea on fields that are frequently used for terms aggregations.
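
As a rough illustration, eager loading could be enabled on an existing keyword field through the update mapping API. The index and field names reuse the earlier example and are otherwise assumptions.

```console
# Enable eager global ordinals on the existing tags field
PUT my_index/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

This moves the cost of building global ordinals from the first aggregation request to refresh time.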

fields

Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.

ignore_above

Do not index any string longer than this value. Defaults to 2147483647, so all values are accepted. Note, however, that the default dynamic mapping rules create a keyword sub-field that overrides this default by setting ignore_above: 256.
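
The following sketch shows the effect of a small ignore_above limit. The field name message, the limit of 20, and the sample values are assumptions chosen for illustration.

```console
# message is a hypothetical field; values longer than 20 characters are not indexed
PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}

# 12 characters: indexed
PUT my_index/_doc/1
{
  "message": "Syntax error"
}

# 38 characters: not indexed
PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}
```

Both documents are still stored and returned in full from _source, but only the first message is available to term queries and aggregations on this field.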

index

Should the field be searchable? Accepts true (default) or false.

index_options

Which information should be stored in the index for scoring purposes. Defaults to docs, but can also be set to freqs to take term frequency into account when computing scores.

norms

Whether field length should be taken into account when scoring queries. Accepts true or false (default).

null_value

Accepts a string value which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.
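
A minimal sketch of null_value follows, assuming a hypothetical status_code field and the placeholder string NULL.

```console
# status_code and the "NULL" placeholder are hypothetical choices
PUT my_index
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

# Explicit null: indexed as "NULL"
PUT my_index/_doc/1
{
  "status_code": null
}

# Empty array: contains no explicit null, so nothing is substituted
PUT my_index/_doc/2
{
  "status_code": []
}

GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}
```

The search should return document 1 but not document 2. The substitution affects only what is indexed; the _source of document 1 still contains null.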

store

Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).

similarity

Which scoring algorithm or similarity should be used. Defaults to BM25.

normalizer

How to pre-process the keyword prior to indexing. Defaults to null, meaning the keyword is kept as-is.
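
As an illustration, a custom normalizer can be defined in the index settings and referenced from the mapping. The names my_normalizer and foo are hypothetical, and the lowercase and asciifolding filters are just one possible choice.

```console
# my_normalizer and foo are hypothetical names
PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

With this mapping, a value such as BÀR would be indexed as bar, so a term query for bar on foo would be expected to match it.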

split_queries_on_whitespace

Whether full text queries should split the input on whitespace when building a query for this field. Accepts true or false (default).

meta

Metadata about the field.

Indexes imported from 2.x do not support keyword. Instead, they attempt to downgrade keyword into string. This allows you to merge modern mappings with legacy mappings. Long-lived indexes have to be recreated before upgrading to 6.x, but the mapping downgrade lets you recreate them on your own schedule.