Porter stem token filter
Provides algorithmic stemming for the English language, based on the Porter stemming algorithm.
This filter tends to stem more aggressively than other English stemmer filters, such as the kstem filter.
The porter_stem filter is equivalent to the stemmer filter’s english variant.
The porter_stem filter uses Lucene’s PorterStemFilter.
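The equivalence with the stemmer filter's english variant can be illustrated by building two analyze API request bodies that should yield the same tokens. This is a minimal sketch: the helper names porter_stem_request and stemmer_english_request are hypothetical, and the code only constructs the request arguments without contacting a cluster.

```python
from typing import Any, Dict


def porter_stem_request(text: str) -> Dict[str, Any]:
    """Analyze arguments that reference the built-in porter_stem filter by name."""
    return {"tokenizer": "standard", "filter": ["porter_stem"], "text": text}


def stemmer_english_request(text: str) -> Dict[str, Any]:
    """Analyze arguments that define a stemmer filter inline with language=english."""
    return {
        "tokenizer": "standard",
        "filter": [{"type": "stemmer", "language": "english"}],
        "text": text,
    }


sample_text = "the foxes jumping quickly"
by_name = porter_stem_request(sample_text)
inline = stemmer_english_request(sample_text)

# The two requests differ only in how the filter is specified.
print(by_name["filter"])
print(inline["filter"])
```

Assuming a configured Python client named client, each dictionary could be passed as keyword arguments, for example client.indices.analyze(**by_name), and the returned tokens compared.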
Example
The following analyze API request uses the porter_stem filter to stem "the foxes jumping quickly" to "the fox jump quickli":
Python:
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["porter_stem"],
    text="the foxes jumping quickly",
)
print(resp)

Ruby:
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['porter_stem'],
    text: 'the foxes jumping quickly'
  }
)
puts response

JavaScript:
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["porter_stem"],
  text: "the foxes jumping quickly",
});
console.log(response);

Console:
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["porter_stem"],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quickli ]
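The token list above is a condensed view of the analyze API response, which reports each token as an object with its text, offsets, type, and position. The sketch below shows one way to reduce such a response to plain token strings. The helper name token_strings is hypothetical, and sample_response is hand-written to mirror the response shape rather than captured from a live cluster.

```python
from typing import Any, List, Mapping


def token_strings(analyze_response: Mapping[str, Any]) -> List[str]:
    """Return the text of each token in an analyze API response."""
    return [entry["token"] for entry in analyze_response.get("tokens", [])]


# Hand-written sample in the shape of an analyze response (illustrative only).
sample_response = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 4, "end_offset": 9, "type": "<ALPHANUM>", "position": 1},
        {"token": "jump", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2},
        {"token": "quickli", "start_offset": 18, "end_offset": 25, "type": "<ALPHANUM>", "position": 3},
    ]
}

print(token_strings(sample_response))  # ['the', 'fox', 'jump', 'quickli']
```

The offsets refer to positions in the original text, so a stemmed token such as fox still spans the five characters of foxes.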
Add to an analyzer
The following create index API request uses the porter_stem filter to configure a new custom analyzer.
To work properly, the porter_stem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the porter_stem filter in the analyzer configuration.
Python:
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: ['lowercase', 'porter_stem']
          }
        }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "porter_stem"],
        },
      },
    },
  },
});
console.log(response);

Console:
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "porter_stem"]
        }
      }
    }
  }
}
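The ordering requirement described above (lowercase before porter_stem) can be checked before an index is created. The following is a rough sketch of such a check. The function lowercase_precedes_porter_stem is a hypothetical helper, not part of any Elasticsearch client, and it assumes filters are referenced by their built-in names as plain strings. It does not recognize custom-named or inline filter definitions, nor tokenizers that lowercase on their own.

```python
from typing import Sequence


def lowercase_precedes_porter_stem(filters: Sequence[str]) -> bool:
    """Return True if porter_stem is absent, or if lowercase appears before it."""
    names = list(filters)
    if "porter_stem" not in names:
        return True  # nothing to check
    stem_index = names.index("porter_stem")
    # Only filters listed before the first porter_stem entry run ahead of it.
    return "lowercase" in names[:stem_index]


# Same analysis settings as in the create index request above.
settings = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "whitespace",
                "filter": ["lowercase", "porter_stem"],
            }
        }
    }
}

filter_chain = settings["analysis"]["analyzer"]["my_analyzer"]["filter"]
print(lowercase_precedes_porter_stem(filter_chain))  # True
```

Swapping the two filter names in the list makes the check return False, because filters in an analyzer run in the order they are listed.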