ICU normalization token filter
Normalizes characters according to the Unicode normalization forms described in the ICU documentation. It registers itself as the icu_normalizer token filter, which is available to all indices without any further configuration. The type of normalization can be specified with the name parameter, which accepts nfc, nfkc, and nfkc_cf (default).
Which letters are normalized can be controlled by specifying the unicode_set_filter parameter, which accepts a UnicodeSet.
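As a rough sketch of the unicode_set_filter parameter, the request below defines a token filter that applies nfkc_cf normalization to everything except a handful of Swedish letters. The index name icu_sample_filtered, the analyzer name nfkc_cf_except_swedish, and the filter name nfkc_cf_filtered are hypothetical names chosen for illustration, and the UnicodeSet pattern is only one possible choice.

```console
PUT icu_sample_filtered
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nfkc_cf_except_swedish": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "nfkc_cf_filtered"
            ]
          }
        },
        "filter": {
          "nfkc_cf_filtered": {
            "type": "icu_normalizer",
            "name": "nfkc_cf",
            "unicode_set_filter": "[^åäöÅÄÖ]"
          }
        }
      }
    }
  }
}
```

The pattern is a negated set, so only characters outside åäöÅÄÖ should be passed to the normalizer. The listed precomposed letters would be expected to pass through unchanged.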
In most cases, you should prefer the ICU normalization character filter to this token filter.
Here are two examples, the default usage and a customised token filter:
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nfkc_cf_normalized": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "icu_normalizer"
            ]
          },
          "nfc_normalized": {
            "tokenizer": "icu_tokenizer",
            "filter": [
              "nfc_normalizer"
            ]
          }
        },
        "filter": {
          "nfc_normalizer": {
            "type": "icu_normalizer",
            "name": "nfc"
          }
        }
      }
    }
  }
}
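One way to check what the two analyzers do is to run the same text through each with the analyze API. The request below is a minimal sketch that assumes the icu_sample index above has been created; the sample text is arbitrary and contains full-width letters and the ﬁ ligature.

```console
GET icu_sample/_analyze
{
  "analyzer": "nfkc_cf_normalized",
  "text": "Ｅｌａｓｔｉｃ ﬁlter"
}
```

With nfkc_cf, the full-width letters and the ligature would be expected to come back as plain lowercase ASCII. Changing the analyzer to nfc_normalized should leave those compatibility characters as they are, because NFC does not apply compatibility mappings or case folding.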