nori_part_of_speech token filter
The nori_part_of_speech token filter removes tokens that match a set of part-of-speech tags. The supported tags and their meanings are listed in Part of speech tags.
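To make the behavior concrete, here is a minimal Python model of what the filter does: each token carries a part-of-speech tag, and any token whose tag is in the stop set is dropped, while the remaining tokens keep their original positions. This is only a sketch, not the plugin's implementation (the real filter runs inside Lucene). The `Token` class, the `filter_by_pos` function, and the tags assigned to the sample tokens are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token: surface form, assumed POS tag, and position."""
    text: str
    pos_tag: str
    position: int


def filter_by_pos(tokens: list[Token], stoptags: set[str]) -> list[Token]:
    """Drop tokens whose POS tag is in stoptags.

    Surviving tokens keep their original positions, so a removed token
    leaves a gap instead of shifting later tokens down.
    """
    return [t for t in tokens if t.pos_tag not in stoptags]


# Assumed tagging of "여섯 용이": numeral (NR), common noun (NNG), particle (J).
tokens = [
    Token("여섯", "NR", 0),
    Token("용", "NNG", 1),
    Token("이", "J", 2),
]

kept = filter_by_pos(tokens, {"NR"})
print([(t.text, t.position) for t in kept])  # [('용', 1), ('이', 2)]
```

With `{"NR"}` as the stop set, the model yields the same tokens and positions as the `_analyze` response later on this page.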
It accepts the following setting:
- stoptags: An array of part-of-speech tags that should be removed. Defaults to:
  "stoptags": [ "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO", "SC", "SE", "XPN", "XSA", "XSN", "XSV", "UNA", "NA", "VSV" ]
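A custom `stoptags` array is assumed here to replace the default list rather than extend it. Under that assumption, removing numerals (NR) on top of the defaults means spelling out the full list. The Python helper below is a hypothetical sketch that builds such index settings. `DEFAULT_STOPTAGS` is copied from the default above, while `build_settings`, `my_analyzer`, and `my_posfilter` are illustrative names.

```python
import json

# Default stop tags, copied from the documented default list.
DEFAULT_STOPTAGS = [
    "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO",
    "SC", "SE", "XPN", "XSA", "XSN", "XSV", "UNA", "NA", "VSV",
]


def build_settings(extra_tags: list[str]) -> dict:
    """Build hypothetical index settings whose POS filter removes the
    default tags plus extra_tags.

    Assumes a custom stoptags list replaces the defaults, so the defaults
    are repeated explicitly.
    """
    # dict.fromkeys keeps the first occurrence of each tag, in order.
    stoptags = list(dict.fromkeys(DEFAULT_STOPTAGS + extra_tags))
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "my_analyzer": {
                            "tokenizer": "nori_tokenizer",
                            "filter": ["my_posfilter"],
                        }
                    },
                    "filter": {
                        "my_posfilter": {
                            "type": "nori_part_of_speech",
                            "stoptags": stoptags,
                        }
                    },
                }
            }
        }
    }


# Print a body that could be sent with a PUT <index> request.
print(json.dumps(build_settings(["NR"]), ensure_ascii=False, indent=2))
```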
For example:
PUT nori_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "nori_tokenizer",
            "filter": [ "my_posfilter" ]
          }
        },
        "filter": {
          "my_posfilter": {
            "type": "nori_part_of_speech",
            "stoptags": [ "NR" ]
          }
        }
      }
    }
  }
}

GET nori_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "여섯 용이"
}
The numeral 여섯 ("six", tag NR) is removed, so the request responds with only 용 and 이:
{
  "tokens" : [
    {
      "token" : "용",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "이",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "word",
      "position" : 2
    }
  ]
}
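The response can also be checked programmatically. The sketch below parses the JSON response above with Python's standard `json` module and lists the surviving tokens. Because the first remaining token is at position 1, it also shows that the removed numeral left an empty position 0 rather than shifting the later tokens down. The helper names `surviving_tokens` and `position_gaps` are illustrative.

```python
import json

# The _analyze response from the example, copied verbatim.
RESPONSE = """
{ "tokens" : [
  { "token" : "용", "start_offset" : 3, "end_offset" : 4, "type" : "word", "position" : 1 },
  { "token" : "이", "start_offset" : 4, "end_offset" : 5, "type" : "word", "position" : 2 }
] }
"""


def surviving_tokens(body: str) -> list[str]:
    """Return the token strings from an _analyze response body."""
    return [t["token"] for t in json.loads(body)["tokens"]]


def position_gaps(body: str) -> list[int]:
    """Return positions from 0 to the last used position that hold no token.

    In this example the gap at 0 is where the numeral 여섯 was removed.
    """
    positions = {t["position"] for t in json.loads(body)["tokens"]}
    if not positions:
        return []
    return [p for p in range(max(positions) + 1) if p not in positions]


print(surviving_tokens(RESPONSE))  # ['용', '이']
print(position_gaps(RESPONSE))     # [0]
```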