WARNING: Version 2.0 of Elasticsearch has passed its EOL date.

This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.

CJK Bigram Token Filter

The cjk_bigram token filter forms bigrams out of the CJK terms that are generated by the standard tokenizer or the icu_tokenizer (see ICU Analysis Plugin).

By default, when a CJK character has no adjacent characters to form a bigram, it is output in unigram form. If you always want to output both unigrams and bigrams, set the output_unigrams flag to true. This enables a combined unigram+bigram approach, in which every character is indexed on its own as well as in bigrams.
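The following Python sketch gives a rough illustration of the unigram and bigram rules described above. It models what happens to a single run of adjacent CJK characters. It is a simplified model, not Lucene's implementation. The function name cjk_bigrams is hypothetical. The sketch assumes the characters are already known to be adjacent CJK characters, and it does not reproduce the token order or positions that the real filter emits.

```python
from typing import List


def cjk_bigrams(run: str, output_unigrams: bool = False) -> List[str]:
    """Hypothetical, simplified model of cjk_bigram on one run of adjacent CJK characters.

    Assumption: `run` contains only CJK characters that were adjacent in the
    source text. Token order here is illustrative and need not match Lucene's.
    """
    # A lone character has no neighbour to pair with, so it is kept as a unigram.
    if len(run) == 1:
        return [run]
    # Overlapping pairs of neighbouring characters: "ABC" -> "AB", "BC".
    bigrams = [run[i:i + 2] for i in range(len(run) - 1)]
    if not output_unigrams:
        return bigrams
    # With output_unigrams, every single character is emitted as well as the bigrams.
    return list(run) + bigrams
```

For example, cjk_bigrams("東京都") returns ["東京", "京都"]. With output_unigrams=True, the result also includes the three single characters 東, 京 and 都.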

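Bigrams are generated for characters in the han, hiragana, katakana and hangul scripts, but bigrams can be disabled for particular scripts with the ignored_scripts parameter. All non-CJK input is passed through unmodified.

For example, the following index settings define a han_bigrams analyzer. It forms bigrams only for Han characters (Hiragana, Katakana and Hangul are ignored) and also outputs unigrams:
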
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "han_bigrams" : {
                    "tokenizer" : "standard",
                    "filter" : ["han_bigrams_filter"]
                }
            },
            "filter" : {
                "han_bigrams_filter" : {
                    "type" : "cjk_bigram",
                    "ignored_scripts": [
                        "hiragana",
                        "katakana",
                        "hangul"
                    ],
                    "output_unigrams" : true
                }
            }
        }
    }
}
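The following Python sketch approximates how ignored_scripts changes the filter's output for settings like the ones above. Several parts of it are assumptions:

- cjk_script and apply_cjk_bigram are hypothetical illustrative names.
- Script detection relies on Unicode character names rather than the Unicode script property that real tokenizers use.
- Consecutive tokens are assumed to be adjacent in the source text.
- output_unigrams is not modelled.
- Bigrams may span a boundary between two non-ignored CJK scripts, which may not match Lucene exactly.

```python
import unicodedata
from typing import List, Optional, Sequence


def cjk_script(ch: str) -> Optional[str]:
    """Hypothetical helper: guess a character's CJK script from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "han"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("HANGUL"):
        return "hangul"
    return None  # not a CJK script this sketch recognises


def apply_cjk_bigram(tokens: Sequence[str], ignored_scripts: Sequence[str] = ()) -> List[str]:
    """Hypothetical model of cjk_bigram with ignored_scripts (output_unigrams not modelled)."""
    ignored = set(ignored_scripts)
    out: List[str] = []
    run = ""  # characters buffered from consecutive bigram-eligible tokens

    def flush() -> None:
        # Emit bigrams for the buffered run; a lone character is kept as a unigram.
        nonlocal run
        if len(run) == 1:
            out.append(run)
        else:
            out.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""

    for tok in tokens:
        scripts = {cjk_script(ch) for ch in tok}
        # Eligible only if every character is CJK and none is in an ignored script.
        eligible = None not in scripts and not scripts & ignored
        if eligible:
            run += tok
        else:
            flush()
            out.append(tok)  # non-CJK or ignored-script tokens pass through unchanged
    flush()
    return out
```

Take the text 東京は大都市, assumed here to be tokenized as one token per character: ["東", "京", "は", "大", "都", "市"]. With ignored_scripts=["hiragana", "katakana", "hangul"], the sketch returns ["東京", "は", "大都", "都市"]. The Hiragana particle は passes through unchanged and splits the Han characters into two runs.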
