Conditional token filter

Applies a set of token filters to tokens that match conditions in a provided predicate script.

This filter uses Lucene’s ConditionalTokenFilter.

Example

The following analyze API request uses the condition filter to match tokens with fewer than 5 characters in THE QUICK BROWN FOX. It then applies the lowercase filter to those matching tokens, converting them to lowercase.

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        {
            "type": "condition",
            "filter": [
                "lowercase"
            ],
            "script": {
                "source": "token.getTerm().length() < 5"
            }
        }
    ],
    text="THE QUICK BROWN FOX",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      {
        type: 'condition',
        filter: [
          'lowercase'
        ],
        script: {
          source: 'token.getTerm().length() < 5'
        }
      }
    ],
    text: 'THE QUICK BROWN FOX'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "condition",
      filter: ["lowercase"],
      script: {
        source: "token.getTerm().length() < 5",
      },
    },
  ],
  text: "THE QUICK BROWN FOX",
});
console.log(response);

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "condition",
      "filter": [ "lowercase" ],
      "script": {
        "source": "token.getTerm().length() < 5"
      }
    }
  ],
  "text": "THE QUICK BROWN FOX"
}

The filter produces the following tokens:

[ the, QUICK, BROWN, fox ]
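
The selection logic can be imitated locally to see why only THE and FOX change. The following is a minimal Python sketch, not Elasticsearch's implementation: `apply_condition_filter` is a hypothetical helper, `str.split` stands in for the standard tokenizer, and `str.lower` stands in for the lowercase filter.

```python
from typing import Callable, Iterable, List


def apply_condition_filter(
    tokens: Iterable[str],
    predicate: Callable[[str], bool],
    filters: List[Callable[[str], str]],
) -> List[str]:
    """Apply `filters` in order to each token for which `predicate` is true."""
    result = []
    for term in tokens:
        if predicate(term):
            # Matching tokens pass through every wrapped filter, in order.
            for token_filter in filters:
                term = token_filter(term)
        # Non-matching tokens are passed through unchanged.
        result.append(term)
    return result


tokens = "THE QUICK BROWN FOX".split()  # stand-in for the standard tokenizer
filtered = apply_condition_filter(tokens, lambda term: len(term) < 5, [str.lower])
print(filtered)  # ['the', 'QUICK', 'BROWN', 'fox']
```

QUICK and BROWN have exactly 5 characters, so they do not satisfy the "fewer than 5" predicate and keep their original case.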

Configurable parameters

filter

(Required, array of token filters) Array of token filters. If a token matches the predicate script in the script parameter, these filters are applied to the token in the order provided.

These filters can include custom token filters defined in the index settings.
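
Because the wrapped filters run in the order listed, a chain can be expressed as a multi-element array. The sketch below builds such a filter definition as a plain Python dictionary. `make_condition_filter` is a hypothetical helper, and the choice of `lowercase` followed by `reverse` is only an illustration.

```python
import json


def make_condition_filter(filters, source):
    """Build a condition filter definition (hypothetical helper)."""
    return {
        "type": "condition",
        "filter": list(filters),  # applied to matching tokens in this order
        "script": {"source": source},
    }


condition = make_condition_filter(
    ["lowercase", "reverse"],
    "token.getTerm().length() < 5",
)
print(json.dumps(condition, indent=2))
```

The resulting dictionary could be passed as an element of the `filter` argument in the analyze request from the example above.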

script

(Required, script object) Predicate script used to apply token filters. If a token matches this script, the filters in the filter parameter are applied to the token.

For valid parameters, see How to write scripts. Only inline scripts are supported. Painless scripts are executed in the analysis predicate context and require a token property.
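
As an illustration of an inline predicate, the sketch below writes the script object with an explicit `lang` and combines the two `token` accessors used on this page. The variable names are hypothetical, and the combined condition is an assumption about what a reader might want rather than an example taken from the Elasticsearch documentation.

```python
# Inline Painless predicate: short tokens that are not first in the stream.
predicate_script = {
    "lang": "painless",
    "source": "token.getTerm().length() < 5 && token.getPosition() > 0",
}

# Condition filter definition that wraps the lowercase filter.
short_non_leading_lowercase = {
    "type": "condition",
    "filter": ["lowercase"],
    "script": predicate_script,
}
print(short_non_leading_lowercase)
```

Applied to THE QUICK BROWN FOX, this predicate would be expected to match only FOX, since THE is short but sits at position 0.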

Customize and add to an analyzer

To customize the condition filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following create index API request uses a custom condition filter to configure a new custom analyzer. The custom condition filter matches the first token in a stream. It then reverses that matching token using the reverse filter.

resp = client.indices.create(
    index="palindrome_list",
    settings={
        "analysis": {
            "analyzer": {
                "whitespace_reverse_first_token": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "reverse_first_token"
                    ]
                }
            },
            "filter": {
                "reverse_first_token": {
                    "type": "condition",
                    "filter": [
                        "reverse"
                    ],
                    "script": {
                        "source": "token.getPosition() === 0"
                    }
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'palindrome_list',
  body: {
    settings: {
      analysis: {
        analyzer: {
          whitespace_reverse_first_token: {
            tokenizer: 'whitespace',
            filter: [
              'reverse_first_token'
            ]
          }
        },
        filter: {
          reverse_first_token: {
            type: 'condition',
            filter: [
              'reverse'
            ],
            script: {
              source: 'token.getPosition() === 0'
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "palindrome_list",
  settings: {
    analysis: {
      analyzer: {
        whitespace_reverse_first_token: {
          tokenizer: "whitespace",
          filter: ["reverse_first_token"],
        },
      },
      filter: {
        reverse_first_token: {
          type: "condition",
          filter: ["reverse"],
          script: {
            source: "token.getPosition() === 0",
          },
        },
      },
    },
  },
});
console.log(response);

PUT /palindrome_list
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_reverse_first_token": {
          "tokenizer": "whitespace",
          "filter": [ "reverse_first_token" ]
        }
      },
      "filter": {
        "reverse_first_token": {
          "type": "condition",
          "filter": [ "reverse" ],
          "script": {
            "source": "token.getPosition() === 0"
          }
        }
      }
    }
  }
}
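
Once the index exists, the analyzer can be exercised through the analyze API by naming the index and the analyzer. The sketch below only prepares the keyword arguments for `client.indices.analyze` and imitates the expected effect locally, so it runs without a cluster. `simulate_reverse_first_token` is a hypothetical stand-in for the analyzer, and the sample text is an assumption.

```python
def simulate_reverse_first_token(text):
    """Local stand-in: split on whitespace, reverse only the token at position 0."""
    tokens = text.split()
    return [
        term[::-1] if position == 0 else term
        for position, term in enumerate(tokens)
    ]


# Keyword arguments one might pass as client.indices.analyze(**analyze_kwargs)
# against a running cluster (no request is sent here).
analyze_kwargs = {
    "index": "palindrome_list",
    "analyzer": "whitespace_reverse_first_token",
    "text": "drawer level kayak",
}

print(simulate_reverse_first_token(analyze_kwargs["text"]))
# ['reward', 'level', 'kayak']
```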