ICU Collation Keyword Field
Collations are used for sorting documents in a language-specific word order.
The icu_collation_keyword field type is available to all indices and will encode the terms directly as bytes in a doc values field and a single indexed token, just like a standard Keyword Field.
It defaults to using DUCET collation, which is a best-effort attempt at language-neutral sorting.
Below is an example of how to set up a field for sorting German names in “phonebook” order:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "sort": {
              "type": "icu_collation_keyword",
              "index": false,
              "language": "de",
              "country": "DE",
              "variant": "@collation=phonebook"
            }
          }
        }
      }
    }
  }
}

GET _search
{
  "query": {
    "match": {
      "name": "Fritz"
    }
  },
  "sort": "name.sort"
}
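The name field is a text field, so it supports full-text queries.
The name.sort field is an icu_collation_keyword field that is not searchable ("index": false) and sorts names in German “phonebook” order.
An example query which searches the name field and sorts on the name.sort field.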
Parameters for ICU Collation Keyword Fields
The following parameters are accepted by icu_collation_keyword fields:
doc_values
    Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.
index
    Should the field be searchable? Accepts true (default) or false.
null_value
    Accepts a string value which is substituted for any explicit null values. Defaults to null, which means the field is treated as missing.
store
    Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).
fields
    Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.
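As a rough illustration of how these parameters fit together, the sketch below maps a field directly as icu_collation_keyword and spells out doc_values, index, null_value and store explicitly. The index name my_params_index, the field name surname, and the placeholder value "N/A" are hypothetical, and the mapping follows the typed (_doc) layout used in the example above; treat it as a sketch under those assumptions rather than a recommended configuration.

```console
# Hypothetical index name, field name and null_value placeholder.
PUT my_params_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "surname": {
          "type": "icu_collation_keyword",
          "language": "de",
          "doc_values": true,
          "index": true,
          "null_value": "N/A",
          "store": false
        }
      }
    }
  }
}
```

With a mapping along these lines, a document whose surname is an explicit null would be handled as if it contained the placeholder string.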
Collation options
strength
    The strength property determines the minimum level of difference considered significant during comparison. Possible values are: primary, secondary, tertiary, quaternary or identical. See the ICU Collation documentation for a more detailed explanation of each value. Defaults to tertiary unless otherwise specified in the collation.
decomposition
    Possible values: no (default, but collation-dependent) or canonical. Setting this decomposition property to canonical allows the Collator to handle unnormalized text properly, producing the same results as if the text were normalized. If no is set, it is the user’s responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior. Since a great many of the world’s languages do not require text normalization, most locales set no as the default decomposition mode.
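To make the strength and decomposition options concrete, here is a minimal sketch of a sort sub-field that compares at primary strength and asks the collator to handle unnormalized input. The index name my_strength_index and the field name title are hypothetical, and the typed (_doc) mapping layout is assumed to match the earlier example. In ICU, primary strength generally compares base letters only, so case and accent differences are not treated as significant; the ICU Collation documentation describes the exact behavior for a given locale.

```console
# Hypothetical names; "strength" and "decomposition" are the options described above.
PUT my_strength_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "sort": {
              "type": "icu_collation_keyword",
              "index": false,
              "strength": "primary",
              "decomposition": "canonical"
            }
          }
        }
      }
    }
  }
}

GET my_strength_index/_search
{
  "sort": "title.sort"
}
```

Choosing canonical decomposition trades some speed for more complete handling of unnormalized text, as noted in the option description.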
The following options are expert only:
alternate
    Possible values: shifted or non-ignorable. Sets the alternate handling for strength quaternary to be either shifted or non-ignorable, which boils down to ignoring punctuation and whitespace.
case_level
    Possible values: true or false (default). Whether case level sorting is required. When strength is set to primary this will ignore accent differences.
case_first
    Possible values: lower or upper. Useful to control which case is sorted first when case is not ignored for strength tertiary. The default depends on the collation.
numeric
    Possible values: true or false (default). Whether digits are sorted according to their numeric representation. For example, the value egg-9 is sorted before the value egg-21.
variable_top
    Single character or contraction. Controls what is variable for alternate.
hiragana_quaternary_mode
    Possible values: true or false. Distinguishes between Katakana and Hiragana characters in quaternary strength.
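The numeric option is perhaps the easiest of these to try out. The following sketch, which assumes a hypothetical index my_numeric_index with a hypothetical label field and the same typed (_doc) layout as the earlier example, indexes the two values mentioned above and sorts on them.

```console
# Hypothetical index and field names; "numeric" is the option described above.
PUT my_numeric_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "label": {
          "type": "icu_collation_keyword",
          "numeric": true
        }
      }
    }
  }
}

PUT my_numeric_index/_doc/1?refresh
{
  "label": "egg-21"
}

PUT my_numeric_index/_doc/2?refresh
{
  "label": "egg-9"
}

GET my_numeric_index/_search
{
  "sort": "label"
}
```

Going by the description of numeric, the search should list egg-9 before egg-21. With numeric left at its default, a plain character-by-character comparison of the digits would be expected to put egg-21 first.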