Keep types token filter


Keeps or removes tokens of a specific type. For example, you can use this filter to change "3 quick foxes" to "quick foxes" by keeping only <ALPHANUM> (alphanumeric) tokens.

Token types

Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.

For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.

Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.

Some tokenizers, such as the keyword, simple_pattern, and simple_pattern_split tokenizers, don’t support setting the token type attribute and therefore don’t support this token filter.

This filter uses Lucene’s TypeTokenFilter.
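
To make the idea of token types concrete, here is a minimal Python sketch of a tokenizer that attaches a type to every token. It is a toy model, not the standard tokenizer: it splits on whitespace and only distinguishes digit-only tokens from everything else. The function name tag_tokens is hypothetical.

```python
def tag_tokens(text):
    """Toy tokenizer: split on whitespace and attach a type to each token.

    Assumption: digit-only tokens are tagged <NUM>, everything else
    <ALPHANUM>. The real standard tokenizer uses more elaborate rules
    and produces more types.
    """
    tagged = []
    for token in text.split():
        token_type = "<NUM>" if token.isdigit() else "<ALPHANUM>"
        tagged.append((token, token_type))
    return tagged


# Print each token next to the type the toy tokenizer assigned to it.
for token, token_type in tag_tokens("3 quick foxes"):
    print(f"{token}\t{token_type}")
```

A type-based filter only ever sees the second element of each pair, which is why tokenizers that do not set a type cannot be combined with it.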

Include example


The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.

Python:
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[{"type": "keep_types", "types": ["<NUM>"]}],
    text="1 quick fox 2 lazy dogs",
)
print(resp)

Ruby:
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [{ type: 'keep_types', types: ['<NUM>'] }],
    text: '1 quick fox 2 lazy dogs'
  }
)
puts response

JavaScript:
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "keep_types",
      types: ["<NUM>"],
    },
  ],
  text: "1 quick fox 2 lazy dogs",
});
console.log(response);

Console:
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

The filter produces the following tokens:

[ 1, 2 ]
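
The [ 1, 2 ] listing above is shorthand. The analyze API returns one object per token, including its type and position. The Python sketch below pulls the token text and type out of such a response. The sample_response dictionary is hand-written to mirror the response shape for this request rather than captured from a live cluster, and token_summary is a hypothetical helper. With the Python client you would pass the response from the request instead, assuming it supports dictionary-style access.

```python
# Hand-written sample shaped like an analyze API response (an assumption,
# not output captured from a live cluster).
sample_response = {
    "tokens": [
        {"token": "1", "start_offset": 0, "end_offset": 1,
         "type": "<NUM>", "position": 0},
        {"token": "2", "start_offset": 12, "end_offset": 13,
         "type": "<NUM>", "position": 3},
    ]
}


def token_summary(response):
    """Return (token, type) pairs from an analyze-style response."""
    return [(entry["token"], entry["type"]) for entry in response["tokens"]]


print(token_summary(sample_response))
```

Checking the type field this way is also a quick method for finding out which token types a given tokenizer emits before you configure the filter.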

Exclude example


The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode parameter is set to exclude.

Python:
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[{"type": "keep_types", "types": ["<NUM>"], "mode": "exclude"}],
    text="1 quick fox 2 lazy dogs",
)
print(resp)

Ruby:
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [{ type: 'keep_types', types: ['<NUM>'], mode: 'exclude' }],
    text: '1 quick fox 2 lazy dogs'
  }
)
puts response

JavaScript:
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "keep_types",
      types: ["<NUM>"],
      mode: "exclude",
    },
  ],
  text: "1 quick fox 2 lazy dogs",
});
console.log(response);

Console:
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

The filter produces the following tokens:

[ quick, fox, lazy, dogs ]

Configurable parameters

types
(Required, array of strings) List of token types to keep or remove.

mode
(Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:

include
(Default) Keep only the specified token types.
exclude
Remove the specified token types.
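
The two modes can be modeled in a few lines of Python. This is a sketch of the behavior described above, not Lucene's TypeTokenFilter. The keep_types function and the hand-typed token list are illustrative assumptions.

```python
def keep_types(tokens, types, mode="include"):
    """Model of the filter: tokens is a list of (text, type) pairs."""
    wanted = set(types)
    if mode == "include":
        return [text for text, token_type in tokens if token_type in wanted]
    if mode == "exclude":
        return [text for text, token_type in tokens if token_type not in wanted]
    raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")


# Types assigned by hand to mirror the examples on this page.
tokens = [
    ("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
    ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>"),
]

print(keep_types(tokens, ["<NUM>"]))                  # ['1', '2']
print(keep_types(tokens, ["<NUM>"], mode="exclude"))  # ['quick', 'fox', 'lazy', 'dogs']
```

Because the default is to keep the listed types, omitting the mode parameter and listing the wrong types silently drops every other token, so it is worth checking the output on a short sample text.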

Customize and add to an analyzer


To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.

Python:
resp = client.indices.create(
    index="keep_types_example",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["extract_alpha"]
                }
            },
            "filter": {
                "extract_alpha": {
                    "type": "keep_types",
                    "types": ["<ALPHANUM>"]
                }
            }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'keep_types_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            filter: ['extract_alpha']
          }
        },
        filter: {
          extract_alpha: {
            type: 'keep_types',
            types: ['<ALPHANUM>']
          }
        }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "keep_types_example",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          filter: ["extract_alpha"],
        },
      },
      filter: {
        extract_alpha: {
          type: "keep_types",
          types: ["<ALPHANUM>"],
        },
      },
    },
  },
});
console.log(response);

Console:
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
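
If you define several analyzers of this kind, the settings can be generated instead of written by hand. The Python sketch below builds the same structure as the request above. The helper name keep_types_settings and its parameters are hypothetical, and the sketch assumes the standard tokenizer and a single filter per analyzer.

```python
def keep_types_settings(analyzer_name, filter_name, types, mode="include"):
    """Build index settings for a custom analyzer with one keep_types filter."""
    return {
        "analysis": {
            "analyzer": {
                analyzer_name: {
                    "tokenizer": "standard",
                    "filter": [filter_name],
                }
            },
            "filter": {
                filter_name: {
                    "type": "keep_types",
                    "types": list(types),
                    "mode": mode,
                }
            },
        }
    }


settings = keep_types_settings("my_analyzer", "extract_alpha", ["<ALPHANUM>"])
print(settings["analysis"]["filter"]["extract_alpha"])
```

The resulting dictionary is intended to be passed as the settings of a create index request for keep_types_example. Unlike the request above, the sketch always writes the mode parameter explicitly, which makes the chosen behavior visible in the index settings.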