ASCII folding token filter
Converts alphabetic, numeric, and symbolic characters that are not in the Basic
Latin Unicode block (the first 128 ASCII characters) to their ASCII equivalent, if
one exists. For example, the filter changes à to a.
This filter uses Lucene’s ASCIIFoldingFilter.
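To make the folding idea concrete, here is a minimal Ruby sketch that approximates it with Unicode decomposition. It is not Lucene's implementation: the real filter relies on Lucene's own mapping table, which this sketch does not reproduce, and the helper name `fold_ascii` is hypothetical. Characters that have no canonical decomposition pass through unchanged here.

```ruby
# Rough approximation of ASCII folding (assumption: NOT Lucene's mapping table).
# NFD decomposition splits an accented letter into a base letter plus
# combining marks; removing the marks (\p{Mn}) leaves the base letter.
def fold_ascii(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts fold_ascii('à')               # prints: a
puts fold_ascii('açaí à la carte') # prints: acai a la carte
```

The sketch only shows why à can become a; use the asciifolding filter itself for indexing and search.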
Example
The following analyze API request uses the asciifolding filter to drop the
diacritical marks in açaí à la carte:
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['asciifolding'],
    text: 'açaí à la carte'
  }
)
puts response
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["asciifolding"],
  "text": "açaí à la carte"
}
The filter produces the following tokens:
[ acai, a, la, carte ]
Add to an analyzer
The following create index API request uses the asciifolding filter to
configure a new custom analyzer.
response = client.indices.create(
  index: 'asciifold_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          standard_asciifolding: {
            tokenizer: 'standard',
            filter: ['asciifolding']
          }
        }
      }
    }
  }
)
puts response
PUT /asciifold_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_asciifolding": {
          "tokenizer": "standard",
          "filter": ["asciifolding"]
        }
      }
    }
  }
}
Configurable parameters
preserve_original
    (Optional, Boolean) If true, emit both original tokens and folded tokens.
    Defaults to false.
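The effect of preserve_original can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the filter's actual code: `fold_ascii` and `fold_tokens` are hypothetical helpers, folding is approximated with Unicode decomposition, and both the order of the emitted tokens and the choice to emit the original only when folding changed it are assumptions.

```ruby
# Rough approximation of ASCII folding (assumption: NOT Lucene's mapping table).
def fold_ascii(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Hypothetical helper that mimics the preserve_original parameter.
def fold_tokens(tokens, preserve_original: false)
  tokens.flat_map do |token|
    folded = fold_ascii(token)
    # Keep the original next to the folded form only when folding changed it.
    preserve_original && folded != token ? [folded, token] : [folded]
  end
end

tokens = %w[açaí à la carte]
default_tokens   = fold_tokens(tokens)
# => ["acai", "a", "la", "carte"]
preserved_tokens = fold_tokens(tokens, preserve_original: true)
# => ["acai", "açaí", "a", "à", "la", "carte"]
```

Keeping the original token lets a query match both the accented and the unaccented spelling, at the cost of a larger index.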
Customize
To customize the asciifolding filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.
For example, the following request creates a custom asciifolding filter with
preserve_original set to true:
PUT /asciifold_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_asciifolding": {
          "tokenizer": "standard",
          "filter": ["my_ascii_folding"]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  }
}
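If the same settings are needed from Ruby, one possible approach is to build the request body as a hash. The sketch below only constructs and prints the body; the helper name `asciifolding_settings` is hypothetical, and the commented call assumes a `client` object like the one used in the Ruby examples earlier on this page.

```ruby
require 'json'

# Hypothetical helper: builds the index settings from the request above,
# with preserve_original exposed as a keyword argument.
def asciifolding_settings(preserve_original: false)
  {
    settings: {
      analysis: {
        analyzer: {
          standard_asciifolding: {
            tokenizer: 'standard',
            filter: ['my_ascii_folding']
          }
        },
        filter: {
          my_ascii_folding: {
            type: 'asciifolding',
            preserve_original: preserve_original
          }
        }
      }
    }
  }
end

body = asciifolding_settings(preserve_original: true)
puts JSON.pretty_generate(body)

# Assumption: with a configured client, the hash could be sent as
#   client.indices.create(index: 'asciifold_example', body: body)
```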