phonetic token filter

edit

The phonetic token filter takes the following settings:

encoder
Which phonetic encoder to use. Accepts metaphone (default), doublemetaphone, soundex, refinedsoundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beidermorse.
replace
Whether or not the original token should be replaced by the phonetic token. Accepts true (default) and false. Not supported by beidermorse encoding.
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs 

Returns: J, joe, BLKS, bloggs

Double metaphone settings
edit

If the double_metaphone encoder is used, then this additional setting is supported:

max_code_len
The maximum length of the emitted metaphone token. Defaults to 4.
Beider Morse settings
edit

If the beider_morse encoder is used, then these additional settings are supported:

rule_type
Whether matching should be exact or approx (default).
name_type
Whether names are ashkenazi, sephardic, or generic (default).
languageset
An array of languages to check. If not specified, then the language will be guessed. Accepts: any, comomon, cyrillic, english, french, german, hebrew, hungarian, polish, romanian, russian, spanish.