ICU normalization character filter
Normalizes characters as explained in the ICU normalization documentation.
It registers itself as the icu_normalizer character filter, which is available to all indices without any further configuration. The type of normalization can be specified with the name parameter, which accepts nfc, nfkc, and nfkc_cf (default). Set the mode parameter to decompose to convert nfc to nfd or nfkc to nfkd, respectively.
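To make the effect of the name and mode parameters concrete, the following is a minimal sketch that imitates the normalization forms with Python's standard unicodedata module. It is not the plugin's implementation: the function icu_like_normalize is a hypothetical helper, and nfkc_cf is only approximated here as NFKC plus case folding, which can differ from ICU's NFKC_Casefold for some characters. The sketch also assumes that mode has no effect on nfkc_cf, since the text above only describes decompose for nfc and nfkc.

```python
import unicodedata


def icu_like_normalize(text, name="nfkc_cf", mode="compose"):
    """Approximate the forms selected by `name` and `mode` (hypothetical helper)."""
    if name == "nfc":
        # mode=decompose turns nfc into nfd
        form = "NFD" if mode == "decompose" else "NFC"
        return unicodedata.normalize(form, text)
    if name == "nfkc":
        # mode=decompose turns nfkc into nfkd
        form = "NFKD" if mode == "decompose" else "NFKC"
        return unicodedata.normalize(form, text)
    if name == "nfkc_cf":
        # Approximation only: NFKC, then case folding, then NFKC again.
        # Assumption: `mode` is ignored for this form in this sketch.
        folded = unicodedata.normalize("NFKC", text).casefold()
        return unicodedata.normalize("NFKC", folded)
    raise ValueError(f"unsupported name: {name!r}")


# U+00E9 (precomposed e-acute) decomposes to "e" + U+0301 under nfd
decomposed = icu_like_normalize("\u00e9", name="nfc", mode="decompose")
# U+FB01 (fi ligature) is replaced by "fi" under the compatibility forms
ligature = icu_like_normalize("\ufb01", name="nfkc")
# U+FF21 (fullwidth A) becomes a plain lowercase "a" under the default form
folded = icu_like_normalize("\uff21")
```

The compatibility forms (nfkc, nfkc_cf) change more characters than nfc does, which is why the default is well suited to search but may be too aggressive when the original spelling must be preserved.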
Which letters are normalized can be controlled by specifying the unicode_set_filter parameter, which accepts a UnicodeSet.
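The idea behind unicode_set_filter can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code: normalize_filtered and the in_set predicate are hypothetical names, the predicate stands in for a UnicodeSet pattern such as [^\uFB01], and the sketch assumes the filter is applied to whole runs of matching characters while everything outside the set passes through unchanged.

```python
import unicodedata
from itertools import groupby


def normalize_filtered(text, in_set, form="NFKC"):
    """Normalize only runs of characters for which in_set(ch) is true."""
    parts = []
    # groupby splits the text into consecutive runs that share the same
    # in_set result, so combining sequences inside a run stay together.
    for matches, run in groupby(text, key=in_set):
        chunk = "".join(run)
        parts.append(unicodedata.normalize(form, chunk) if matches else chunk)
    return "".join(parts)


def not_fi_ligature(ch):
    # Hypothetical filter: every character except U+FB01 (fi ligature)
    return ch != "\ufb01"


# The ligature is left alone, while the fullwidth A (U+FF21) is normalized
result = normalize_filtered("\ufb01 \uff21", not_fi_ligature)
```

Excluding characters this way is useful when a language treats certain letters as distinct and they should not be merged with their base forms.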
Here are two examples, the default usage and a customised character filter:
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nfkc_cf_normalized": {
            "tokenizer": "icu_tokenizer",
            "char_filter": [
              "icu_normalizer"
            ]
          },
          "nfd_normalized": {
            "tokenizer": "icu_tokenizer",
            "char_filter": [
              "nfd_normalizer"
            ]
          }
        },
        "char_filter": {
          "nfd_normalizer": {
            "type": "icu_normalizer",
            "name": "nfc",
            "mode": "decompose"
          }
        }
      }
    }
  }
}