IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
The normalizer property of keyword fields is similar to analyzer except that it guarantees that the analysis chain produces a single token.
The normalizer is applied prior to indexing the keyword, as well as at search time when the keyword field is searched via a query parser such as the match query or via a term-level query such as the term query.
A simple normalizer called lowercase ships with Elasticsearch and can be referenced by name without any further configuration.
Custom normalizers can be defined as part of analysis settings as follows.
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

PUT index/_doc/1
{
  "foo": "BÀR"
}

PUT index/_doc/2
{
  "foo": "bar"
}

PUT index/_doc/3
{
  "foo": "baz"
}

POST index/_refresh

GET index/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}

GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}
The above queries match documents 1 and 2 since BÀR is converted to bar at both index and query time.
{
  "took": $body.took,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.4700036,
    "hits": [
      {
        "_index": "index",
        "_id": "1",
        "_score": 0.4700036,
        "_source": {
          "foo": "BÀR"
        }
      },
      {
        "_index": "index",
        "_id": "2",
        "_score": 0.4700036,
        "_source": {
          "foo": "bar"
        }
      }
    ]
  }
}
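The matching behavior above can be approximated outside Elasticsearch. The following Python sketch is only an illustration: the normalize helper is a hypothetical name, and the Unicode decomposition it uses is an assumed stand-in for the asciifolding filter, not Elasticsearch's actual implementation. It shows that applying the same normalization to the stored values and to the query text makes BAR match both BÀR and bar.

```python
import unicodedata


def normalize(value: str) -> str:
    """Rough stand-in for my_normalizer (lowercase, then asciifolding).

    Assumption: stripping combining marks after NFKD decomposition is
    close enough to asciifolding for simple accented Latin letters.
    """
    lowered = value.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Index time: each keyword value is normalized before it becomes a term.
docs = {1: "BÀR", 2: "bar", 3: "baz"}
indexed_terms = {doc_id: normalize(value) for doc_id, value in docs.items()}

# Search time: the query text goes through the same normalizer.
query_term = normalize("BAR")
matching_ids = sorted(
    doc_id for doc_id, term in indexed_terms.items() if term == query_term
)
print(matching_ids)  # [1, 2]
```

Note that the original _source values are left untouched in this sketch, just as the search response still returns BÀR for document 1.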
Because keywords are converted prior to indexing, aggregations also return normalized values:
GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}
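The bucket keys in this response follow from the fact that a terms aggregation counts the indexed (normalized) terms rather than the original source values. A minimal Python sketch of that idea, assuming a hypothetical normalize helper that approximates the lowercase and asciifolding filters with Unicode decomposition (not Elasticsearch's own code), could look like this:

```python
from collections import Counter
import unicodedata


def normalize(value: str) -> str:
    """Hypothetical approximation of lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The values of the foo field in documents 1, 2, and 3.
source_values = ["BÀR", "bar", "baz"]

# Count documents per normalized term, as a terms aggregation would see them.
buckets = Counter(normalize(value) for value in source_values)

for key, doc_count in buckets.most_common():
    print(key, doc_count)
# bar 2
# baz 1
```

Because BÀR and bar collapse to the same term before counting, they share one bucket with a doc_count of 2.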