KStem token filter
Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.

The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.

The kstem filter is equivalent to the stemmer filter's light_english variant.

This filter uses Lucene's KStemFilter.
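Because kstem is equivalent to the stemmer filter's light_english variant, an index could use either one. The following Python sketch builds create-index settings that declare a custom stemmer filter with language set to light_english, next to an analyzer that uses kstem directly. The names my_light_english, kstem_analyzer and light_english_analyzer are hypothetical; the returned dictionary could be passed as the settings argument of client.indices.create. The lowercase filter is listed first in both analyzers because kstem requires lowercase tokens.

```python
import json


def build_settings() -> dict:
    """Return index settings defining two analyzers that should stem alike.

    The names my_light_english, kstem_analyzer and light_english_analyzer
    are hypothetical and used only for illustration.
    """
    return {
        "analysis": {
            "filter": {
                # A custom stemmer filter using the light_english variant.
                "my_light_english": {
                    "type": "stemmer",
                    "language": "light_english",
                }
            },
            "analyzer": {
                # Uses the built-in kstem filter directly.
                "kstem_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "kstem"],
                },
                # Uses the custom stemmer filter defined above.
                "light_english_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_light_english"],
                },
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_settings(), indent=2))
```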
Example
The following analyze API request uses the kstem filter to stem "the foxes jumping quickly" to "the fox jump quick":
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "kstem"
    ],
    text="the foxes jumping quickly",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'kstem'
    ],
    text: 'the foxes jumping quickly'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["kstem"],
  text: "the foxes jumping quickly",
});
console.log(response);

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quick ]
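The analyze API returns each token as an object inside a tokens array. The following Python sketch pulls the token strings out of such a response so they can be compared with the list above. The helper token_strings is hypothetical, and sample_response is a hand-written illustration of the response shape with offsets computed from the input text; it is not output captured from a cluster.

```python
from typing import Any, Mapping


def token_strings(response: Mapping[str, Any]) -> list[str]:
    """Return the token text from an analyze API response, in order."""
    # Each entry in "tokens" carries the token text plus offsets, type and position.
    return [entry["token"] for entry in response["tokens"]]


# Hand-written illustration of the response shape for
# "the foxes jumping quickly"; not captured from a live cluster.
sample_response = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 4, "end_offset": 9, "type": "<ALPHANUM>", "position": 1},
        {"token": "jump", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2},
        {"token": "quick", "start_offset": 18, "end_offset": 25, "type": "<ALPHANUM>", "position": 3},
    ]
}

if __name__ == "__main__":
    print(token_strings(sample_response))  # ['the', 'fox', 'jump', 'quick']
```

The offsets still point at the original words (for example, fox spans characters 4 to 9, the full word "foxes"), since stemming changes only the token text.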
Add to an analyzer
The following create index API request uses the kstem filter to configure a new custom analyzer.
To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "kstem"
                    ]
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: [
              'lowercase',
              'kstem'
            ]
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "kstem"],
        },
      },
    },
  },
});
console.log(response);

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
}
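Because the kstem filter expects lowercase input, it can help to check analyzer definitions before creating an index. The following Python sketch is a hypothetical helper, not part of any Elasticsearch client, that reports custom analyzers whose filter list contains kstem without an earlier lowercase filter. It only inspects a settings dictionary and assumes filters are referenced by their built-in names, as in the request above; a lowercasing filter defined under a custom name is not detected.

```python
from typing import Any, Mapping


def analyzers_missing_lowercase(settings: Mapping[str, Any]) -> list[str]:
    """Return names of analyzers that apply kstem without a prior lowercase filter.

    Hypothetical helper: it only checks built-in filter names listed in each
    analyzer's "filter" array.
    """
    analyzers = settings.get("analysis", {}).get("analyzer", {})
    problems = []
    for name, definition in analyzers.items():
        filters = definition.get("filter", [])
        if "kstem" not in filters:
            continue
        # Only filters listed before the first kstem run on its input.
        before_kstem = filters[: filters.index("kstem")]
        if "lowercase" not in before_kstem:
            problems.append(name)
    return problems


if __name__ == "__main__":
    settings = {
        "analysis": {
            "analyzer": {
                "my_analyzer": {"tokenizer": "whitespace", "filter": ["lowercase", "kstem"]},
                # Hypothetical misconfigured analyzer: lowercase runs too late.
                "bad_order": {"tokenizer": "whitespace", "filter": ["kstem", "lowercase"]},
            }
        }
    }
    print(analyzers_missing_lowercase(settings))  # ['bad_order']
```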