KStem token filter

Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.

The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.

The kstem filter is equivalent to the stemmer filter’s light_english variant.

This filter uses Lucene’s KStemFilter.
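To see how kstem compares with porter_stem on your own text, you can run the same input through both filters with the analyze API. The following is a minimal sketch, not part of the official examples: the helper name compare_english_stemmers is hypothetical, and client is assumed to be an already connected Python Elasticsearch client like the one used in the examples on this page.

```python
from typing import Any, Dict, List


def compare_english_stemmers(client: Any, text: str) -> Dict[str, List[str]]:
    """Run the same text through kstem and porter_stem and collect the tokens.

    `client` is assumed to be a connected Elasticsearch Python client.
    The function name is illustrative and not part of any library.
    """
    results: Dict[str, List[str]] = {}
    for stemmer in ("kstem", "porter_stem"):
        resp = client.indices.analyze(
            tokenizer="standard",
            # Lowercase first so both stemmers see the same normalized input.
            filter=["lowercase", stemmer],
            text=text,
        )
        # Each entry in "tokens" describes one emitted token.
        results[stemmer] = [tok["token"] for tok in resp["tokens"]]
    return results
```

Calling compare_english_stemmers(client, "the foxes jumping quickly") returns a dictionary keyed by filter name, so the two token lists can be compared side by side. The actual tokens depend on your cluster and its version.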

Example

The following analyze API request uses the kstem filter to stem "the foxes jumping quickly" to "the fox jump quick":

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "kstem"
    ],
    text="the foxes jumping quickly",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'kstem'
    ],
    text: 'the foxes jumping quickly'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["kstem"],
  text: "the foxes jumping quickly",
});
console.log(response);

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}

The filter produces the following tokens:

[ the, fox, jump, quick ]
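The analyze API returns each token as an object inside a tokens array, so reducing the response to the short list shown above takes one small step. The sketch below assumes that response shape; token_texts is a hypothetical helper, and sample_response is a hand-written, abbreviated stand-in for a real response (a real one carries more fields per token, such as offsets and type).

```python
from typing import Any, List, Mapping


def token_texts(analyze_response: Mapping[str, Any]) -> List[str]:
    """Return only the token strings from an analyze API response body."""
    return [entry["token"] for entry in analyze_response["tokens"]]


# Hand-written, abbreviated sample that mirrors the token list in this section.
sample_response = {
    "tokens": [
        {"token": "the", "position": 0},
        {"token": "fox", "position": 1},
        {"token": "jump", "position": 2},
        {"token": "quick", "position": 3},
    ]
}

print(token_texts(sample_response))  # ['the', 'fox', 'jump', 'quick']
```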

Add to an analyzer

The following create index API request uses the kstem filter to configure a new custom analyzer.

To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "kstem"
                    ]
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: [
              'lowercase',
              'kstem'
            ]
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "kstem"],
        },
      },
    },
  },
});
console.log(response);

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
}
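Once the index exists, you can check the custom analyzer by passing the index and analyzer names to the analyze API. This is a minimal sketch under a few assumptions: client is a connected Python Elasticsearch client, the index and analyzer names match the request above, and analyze_with_index_analyzer is a hypothetical helper name.

```python
from typing import Any, List


def analyze_with_index_analyzer(
    client: Any,
    text: str,
    index: str = "my-index-000001",
    analyzer: str = "my_analyzer",
) -> List[str]:
    """Analyze text with an analyzer defined in the index settings.

    `client` is assumed to be a connected Elasticsearch Python client.
    The default names match the create index request shown above.
    """
    resp = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    return [tok["token"] for tok in resp["tokens"]]
```

Passing mixed-case input, for example analyze_with_index_analyzer(client, "The Foxes Jumping Quickly"), is a quick way to confirm that the lowercase filter runs before kstem in the configured chain.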
