kuromoji_part_of_speech token filter
The kuromoji_part_of_speech token filter removes tokens that match a set of
part-of-speech tags. It accepts the following setting:
stoptags
    An array of part-of-speech tags that should be removed. It defaults to the
    stoptags.txt file embedded in the lucene-analyzer-kuromoji.jar.
For example:
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_posfilter"
            ]
          }
        },
        "filter": {
          "my_posfilter": {
            "type": "kuromoji_part_of_speech",
            "stoptags": [
              "助詞-格助詞-一般",
              "助詞-終助詞"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "寿司がおいしいね"
}
Which responds with:
{
  "tokens" : [
    {
      "token" : "寿司",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "おいしい",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "word",
      "position" : 2
    }
  ]
}
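The response keeps position 2 for おいしい even though only two tokens remain, which suggests the filter drops matching tokens without renumbering the ones that survive. The following Python sketch imitates that behavior outside Elasticsearch. It is an illustration only: the token list and its part-of-speech tags are hand-labelled assumptions rather than Kuromoji output, and filter_by_pos is a hypothetical helper, not part of any Elasticsearch or Lucene API.

```python
# Hand-labelled tokens for "寿司がおいしいね" as (text, start, end, tag).
# The tags are assumed for illustration; Kuromoji is not run here.
TOKENS = [
    ("寿司", 0, 2, "名詞-一般"),
    ("が", 2, 3, "助詞-格助詞-一般"),
    ("おいしい", 3, 7, "形容詞-自立"),
    ("ね", 7, 8, "助詞-終助詞"),
]

# Same tags as the "stoptags" setting in the example above.
STOPTAGS = {"助詞-格助詞-一般", "助詞-終助詞"}


def filter_by_pos(tokens, stoptags):
    """Drop tokens whose tag is in stoptags, keeping original positions."""
    kept = []
    for position, (text, start, end, tag) in enumerate(tokens):
        if tag in stoptags:
            # A removed token leaves a gap in the position sequence.
            continue
        kept.append(
            {
                "token": text,
                "start_offset": start,
                "end_offset": end,
                "position": position,
            }
        )
    return kept


result = filter_by_pos(TOKENS, STOPTAGS)
for tok in result:
    print(tok)
```

Under these assumptions the two particles が and ね are removed, and the surviving tokens keep the offsets and positions shown in the response above.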