ignore_above

Strings longer than the ignore_above setting will not be processed by the analyzer and will not be indexed. This is mainly useful for not_analyzed string fields, which are typically used for filtering, aggregations, and sorting. These are structured fields and it doesn’t usually make sense to allow very long terms to be indexed in these fields.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 20 
        }
      }
    }
  }
}

PUT my_index/my_type/1 
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET _search 
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

In the mapping, the message field will ignore any string longer than 20 characters.

Document 1 is indexed successfully.

Document 2 will be indexed, but without indexing the message field.

The search returns both documents, but only the first is present in the terms aggregation.
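
The length check behind this example can be modelled in a few lines of Python. This is only an illustrative sketch of the rule described above, not Elasticsearch's implementation. The helper name `indexed_terms` is made up for the illustration, and Python's `len` counts Unicode code points, which may differ from the count Elasticsearch uses for characters outside plain ASCII.

```python
IGNORE_ABOVE = 20  # same limit as in the example mapping


def indexed_terms(values, ignore_above=IGNORE_ABOVE):
    """Keep only the values that are not longer than the limit."""
    return [v for v in values if len(v) <= ignore_above]


messages = [
    "Syntax error",                            # 12 characters: kept
    "Syntax error with some long stacktrace",  # 38 characters: ignored
]

kept = indexed_terms(messages)
print(kept)
```

Only the first message survives the check, which matches the single bucket returned by the terms aggregation.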

Fields of the same name in the same index may have different ignore_above settings. The value can be updated on existing fields using the PUT mapping API.

This option is also useful for protecting against Lucene’s term byte-length limit of 32766.

The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.
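
The gap between character count and byte count can be checked with a short Python sketch. The helper names `utf8_length` and `safe_ignore_above` are hypothetical and exist only for this illustration. The idea is to divide Lucene's byte limit by the largest number of bytes a single character can take in your data.

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted above


def utf8_length(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def safe_ignore_above(max_bytes_per_char):
    """Largest character count guaranteed to fit within the byte limit."""
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


ascii_term = "a" * 20          # 1 byte per character
accented_term = "\u00e9" * 20  # e with acute accent, 2 bytes per character
cjk_term = "\u65e5" * 20       # CJK ideograph, 3 bytes per character

for term in (ascii_term, accented_term, cjk_term):
    print(len(term), "characters ->", utf8_length(term), "bytes")

print(safe_ignore_above(3))
print(safe_ignore_above(4))
```

All three terms have the same character count, so they are treated identically by ignore_above, yet their encoded sizes differ by a factor of up to three.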