Conditional Token Filter

The conditional token filter takes a predicate script and a list of subfilters, and only applies the subfilters to the current token if it matches the predicate.
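The behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch of the semantics, not how Elasticsearch implements the filter: the names `conditional_filter`, `predicate` and `subfilters` are hypothetical, and tokens are treated as plain strings rather than token objects carrying offsets and positions.

```python
from typing import Callable, Iterable, List


def conditional_filter(
    tokens: Iterable[str],
    predicate: Callable[[str], bool],
    subfilters: List[Callable[[str], str]],
) -> List[str]:
    """Apply each subfilter, in order, only to tokens that match the predicate."""
    result = []
    for token in tokens:
        # The predicate is checked once, against the incoming token.
        if predicate(token):
            for subfilter in subfilters:
                token = subfilter(token)
        # Tokens that do not match pass through unchanged.
        result.append(token)
    return result


# Hypothetical usage: uppercase only the tokens that contain a digit.
tokens = ["abc1", "plain", "x9y"]
filtered = conditional_filter(
    tokens,
    lambda t: any(c.isdigit() for c in t),
    [str.upper],
)
print(filtered)
```

In this model each subfilter maps one token to one token. Real token filters may also drop or emit extra tokens, which this sketch does not attempt to cover.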

Options

filter

A chain of token filters to apply to the current token if the predicate matches. These can be any token filters defined elsewhere in the index settings.

script

A predicate script that determines whether the filters are applied to the current token. Only inline scripts are supported.

Settings example

The filter can be configured as follows:

PUT /condition_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : [ "my_condition" ]
                }
            },
            "filter" : {
                "my_condition" : {
                    "type" : "condition",
                    "filter" : [ "lowercase" ],
                    "script" : {
                        "source" : "token.getTerm().length() < 5"  
                    }
                }
            }
        }
    }
}

The script applies the lowercase filter only to terms shorter than 5 characters.

Test it with the analyze API:

POST /condition_example/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "What Flapdoodle"
}

The response is:

{
  "tokens": [
    {
      "token": "what",              
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "Flapdoodle",        
      "start_offset": 5,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The term What has been lowercased because it is only 4 characters long.

The term Flapdoodle keeps its original case because it does not match the predicate.