Phonetic token filter

The phonetic token filter takes the following settings:

encoder
Which phonetic encoder to use. Accepts metaphone (default), double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, daitch_mokotoff.
replace
Whether or not the original token should be replaced by the phonetic token. Accepts true (default) and false. Not supported by beider_morse encoding.
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" 
}

Returns: J, joe, BLKS, bloggs. Because replace is set to false, each original token is kept alongside its phonetic form.
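For contrast, a minimal sketch that leaves replace at its default of true (the index name phonetic_replace_sample is a placeholder, not part of the original example):

PUT phonetic_replace_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  }
}

With this configuration the same _analyze request would be expected to emit only the phonetic forms, J and BLKS, since the original tokens are replaced rather than kept.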

Double metaphone settings

If the double_metaphone encoder is used, then this additional setting is supported:

max_code_len
The maximum length of the emitted metaphone token. Defaults to 4.
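For example, a minimal sketch that raises the code length to 6 (the index name dbl_metaphone_sample and the filter name my_double_metaphone are placeholders):

PUT dbl_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}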
Beider Morse settings

If the beider_morse encoder is used, then these additional settings are supported:

rule_type
Whether matching should be exact or approx (default).
name_type
Whether names are ashkenazi, sephardic, or generic (default).
languageset
An array of languages to check. If not specified, then the language will be guessed. Accepts: any, common, cyrillic, english, french, german, hebrew, hungarian, polish, romanian, russian, spanish.
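For example, a minimal sketch that configures the beider_morse encoder for approximate matching of generic names, restricted to English and German (the index name beider_morse_sample and the filter name my_bm are placeholders):

PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_bm"
            ]
          }
        },
        "filter": {
          "my_bm": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "approx",
            "name_type": "generic",
            "languageset": [ "english", "german" ]
          }
        }
      }
    }
  }
}

Restricting languageset avoids the language-guessing step and keeps the number of emitted phonetic variants smaller.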