ignore_above
Strings longer than the ignore_above setting will not be indexed or stored. For arrays of strings, ignore_above is applied to each array element separately, and string elements longer than ignore_above will not be indexed or stored.
All strings and array elements will still be present in the _source field if it is enabled, which is the default in Elasticsearch.
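The per-element behaviour described above can be modelled with a short Python sketch. This is only an illustration, not Elasticsearch code: the helper name indexed_values is hypothetical, and Python's len() is used as a stand-in for the character count that ignore_above compares against.

```python
def indexed_values(value, ignore_above):
    """Return the strings that would be indexed for a keyword field.

    Illustrative model only: a single string is treated like a
    one-element array, and each element is checked on its own.
    """
    elements = value if isinstance(value, list) else [value]
    # Elements longer than the limit are skipped; the rest are kept.
    return [s for s in elements if len(s) <= ignore_above]


# A document whose field holds an array with one short and one long string.
source = {"message": ["Syntax error", "Syntax error with some long stacktrace"]}

indexed = indexed_values(source["message"], ignore_above=20)
print(indexed)            # only the short element is indexed
print(source["message"])  # the original document still holds both strings
```

The long element is dropped from the indexed values only; the source dictionary, standing in for _source, is left unchanged.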
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "message": "Syntax error"
}

PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}

GET _search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
The message field will ignore any string longer than 20 characters.
Document 1 is indexed successfully.
Document 2 will be indexed, but without indexing the message field.
The search returns both documents, but only the first is present in the terms aggregation.
The ignore_above setting may have different values for fields of the same name in the same index. Its value can be updated on existing fields using the PUT mapping API.
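As a rough sketch of such an update, the snippet below builds (but does not send) a PUT mapping request for the message field from the example. Several details are assumptions rather than anything stated here: the node URL, the new limit of 256, the helper name build_update_request, and the typed my_index/_mapping/_doc endpoint, which matches the _doc mapping style used in the example and may differ in other Elasticsearch versions.

```python
import json
import urllib.request

# Assumptions: a local node at this URL and 256 as an arbitrary new limit.
ES_URL = "http://localhost:9200"
NEW_LIMIT = 256


def build_update_request(index, field, ignore_above):
    """Build a PUT mapping request that changes ignore_above on one field."""
    body = {
        "properties": {
            field: {"type": "keyword", "ignore_above": ignore_above}
        }
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_mapping/_doc",  # assumed typed endpoint
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_update_request("my_index", "message", NEW_LIMIT)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a running node:
# urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch runnable without a cluster and makes the request body easy to inspect.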
This option is also useful for protecting against Lucene’s term byte-length limit of 32766.
The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 (rounded down), since UTF-8 characters may occupy at most 4 bytes.
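The arithmetic behind that recommendation can be checked with a few lines of Python. This is a sketch only: the constant and function names are hypothetical, and Python's len() on a string is used as an approximation of the character count.

```python
LUCENE_MAX_TERM_BYTES = 32766   # term byte-length limit quoted above
MAX_UTF8_BYTES_PER_CHAR = 4     # worst case for a single UTF-8 character


def safe_ignore_above(max_term_bytes=LUCENE_MAX_TERM_BYTES,
                      bytes_per_char=MAX_UTF8_BYTES_PER_CHAR):
    """Largest character count that cannot exceed the byte limit."""
    return max_term_bytes // bytes_per_char  # integer division rounds down


limit = safe_ignore_above()
print(limit)

# Character count and byte count diverge for non-ASCII text.
for sample in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(repr(sample), len(sample), len(sample.encode("utf-8")))

# Worst case: a string of `limit` four-byte characters still fits.
worst_case = "\U0001F600" * limit
assert len(worst_case.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES
```

For mostly ASCII data a higher limit would also stay within the byte budget; dividing by 4 is the conservative choice that holds for any UTF-8 input.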