IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

ASCII folding token filter

Converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 128 Unicode code points, which correspond to ASCII) to their ASCII equivalents, where one exists. For example, the filter changes à to a.

This filter uses Lucene’s ASCIIFoldingFilter.
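
To give a feel for what folding does, here is a minimal Ruby sketch that approximates it for accented Latin letters. It is not how Lucene's ASCIIFoldingFilter works internally: this sketch decomposes each character with Unicode NFD normalization and drops the combining marks, so it handles diacritics only, not characters without a canonical decomposition. The helper name fold_approx is hypothetical, and whitespace splitting stands in for a real tokenizer.

```ruby
# Approximate ASCII folding: decompose each character (NFD), then remove the
# combining marks (Unicode category Mn) that carried the diacritics.
# Assumption: this is an illustration only, not Lucene's mapping.
def fold_approx(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Whitespace splitting stands in for the standard tokenizer here.
folded_tokens = 'açaí à la carte'.split.map { |token| fold_approx(token) }
p folded_tokens # prints ["acai", "a", "la", "carte"]
```

Characters such as ø or ß have no canonical decomposition, so this approximation leaves them unchanged.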

Example

The following analyze API request uses the asciifolding filter to drop the diacritical marks in açaí à la carte:

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'asciifolding'
    ],
    text: 'açaí à la carte'
  }
)
puts response

GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["asciifolding"],
  "text" : "açaí à la carte"
}

The filter produces the following tokens:

[ acai, a, la, carte ]

Add to an analyzer

The following create index API request uses the asciifolding filter to configure a new custom analyzer.

response = client.indices.create(
  index: 'asciifold_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          standard_asciifolding: {
            tokenizer: 'standard',
            filter: [
              'asciifolding'
            ]
          }
        }
      }
    }
  }
)
puts response

PUT /asciifold_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_asciifolding": {
          "tokenizer": "standard",
          "filter": [ "asciifolding" ]
        }
      }
    }
  }
}

Configurable parameters

preserve_original: (Optional, Boolean) If true, emit both original tokens and folded tokens. Defaults to false.
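
The effect of preserve_original can be sketched in plain Ruby without a cluster. The sketch below rests on two assumptions that the parameter description does not spell out: the folded form is emitted before the original, and the original is added only when folding actually changed the token. The helpers fold_approx and emit_tokens are hypothetical, and the folding itself is an NFD-based approximation rather than Lucene's mapping.

```ruby
# Hypothetical helper: NFD-based approximation of folding (diacritics only).
def fold_approx(token)
  token.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Returns the tokens emitted for one input token.
# Assumption: folded form first, original added only when folding changed it.
def emit_tokens(token, preserve_original: false)
  folded = fold_approx(token)
  return [folded] unless preserve_original && folded != token

  [folded, token]
end

input = %w[açaí à la carte]

default_tokens = input.flat_map { |t| emit_tokens(t) }
# => ["acai", "a", "la", "carte"]

preserved_tokens = input.flat_map { |t| emit_tokens(t, preserve_original: true) }
# => ["acai", "açaí", "a", "à", "la", "carte"]

p default_tokens
p preserved_tokens
```

Keeping the original alongside the folded form lets a query match either spelling of the term.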

Customize

To customize the asciifolding filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following request creates a custom asciifolding filter with preserve_original set to true:

PUT /asciifold_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_asciifolding": {
          "tokenizer": "standard",
          "filter": [ "my_ascii_folding" ]
        }
      },
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  }
}
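
For readers using the Ruby client, the same request body can be written as a Ruby hash. This is a sketch that only builds and prints the body. The client call appears in a comment, since it needs a configured client and a running cluster, neither of which is set up here.

```ruby
require 'json'

# Request body mirroring the console example: the custom filter
# my_ascii_folding (type asciifolding) is defined under "filter" and
# referenced by name from the analyzer.
body = {
  settings: {
    analysis: {
      analyzer: {
        standard_asciifolding: {
          tokenizer: 'standard',
          filter: ['my_ascii_folding']
        }
      },
      filter: {
        my_ascii_folding: {
          type: 'asciifolding',
          preserve_original: true
        }
      }
    }
  }
}

# With a configured client (assumed, not created here), the call would take
# the same shape as the create index example earlier on this page:
#   client.indices.create(index: 'asciifold_example', body: body)
puts JSON.pretty_generate(body)
```

The name listed in the analyzer's filter array must match the key under filter, otherwise the analyzer refers to a filter that is not defined.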
