IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.



katakana_uppercase token filter


The `katakana_uppercase` token filter normalizes small katakana letters (捨て仮名, sutegana) into their standard-size equivalents, for example ッ to ツ and ォ to オ. This is useful for searching old-style Japanese text, such as patents, legal documents, and contract policies.

For example:

PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "katakana_uppercase"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "ストップウォッチ"
}

This request returns:

{
  "tokens": [
    {
      "token": "ストツプウオツチ",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
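
The character substitution visible in this response can be approximated outside Elasticsearch. The following Python sketch is only an illustration of the idea: the mapping table is a hand-picked subset of small katakana letters and is an assumption, not the plugin's actual table, and `uppercase_katakana` is a hypothetical helper name rather than part of any Elasticsearch or Lucene API.

```python
# Illustrative subset of small-to-standard katakana pairs.
# ASSUMPTION: this is not the plugin's real mapping table, which may
# cover more characters.
SMALL_TO_STANDARD = str.maketrans({
    "ァ": "ア", "ィ": "イ", "ゥ": "ウ", "ェ": "エ", "ォ": "オ",
    "ッ": "ツ", "ャ": "ヤ", "ュ": "ユ", "ョ": "ヨ", "ヮ": "ワ",
})


def uppercase_katakana(text: str) -> str:
    """Replace each small katakana letter with its standard-size form.

    Hypothetical helper for illustration only; characters outside the
    table are returned unchanged.
    """
    return text.translate(SMALL_TO_STANDARD)


print(uppercase_katakana("ストップウォッチ"))  # ストツプウオツチ
```

Because the substitution is one character for one character, the token length stays the same, which is consistent with the unchanged offsets in the response above.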
