NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.
Shingle Token Filter
Shingles are generally used to help speed up phrase queries. Rather
than building filter chains by hand, you may find it easier to use the
index_phrases option on a text field.
A token filter of type shingle that constructs shingles (token
n-grams) from a token stream. In other words, it creates combinations of
tokens as a single token. For example, the sentence "please divide this
sentence into shingles" might be tokenized into shingles "please
divide", "divide this", "this sentence", "sentence into", and "into
shingles".
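The example above can be reproduced with a minimal Python sketch. It is not the filter's implementation: it assumes plain whitespace tokenization and a fixed shingle size of two, and the function name bigram_shingles is made up for illustration.

```python
def bigram_shingles(text):
    """Join every pair of adjacent whitespace-separated tokens with a space."""
    tokens = text.split()
    # Slide a window of two tokens across the stream.
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


shingles = bigram_shingles("please divide this sentence into shingles")
print(shingles)
```

Each token except the last starts exactly one shingle, so six input tokens yield five two-token shingles.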
This filter handles position increments > 1 by inserting filler tokens (tokens with termtext "_"). It does not handle a position increment of 0.
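To illustrate the filler behavior, here is a simplified Python model. It assumes the token stream is a list of (term, position increment) pairs, which is a representation invented for this example. A gap, such as one left by a removed stop word, is padded with "_" before two-token shingles are built. The helper names are hypothetical, and rejecting an increment below 1 is a choice of this sketch, not a description of what the real filter does.

```python
def expand_gaps(stream, filler_token="_"):
    """Insert one filler per skipped position in a (term, increment) stream."""
    expanded = []
    for term, increment in stream:
        if increment < 1:
            # Sketch-only choice: the real filter does not handle increment 0.
            raise ValueError("position increment must be at least 1")
        expanded.extend([filler_token] * (increment - 1))
        expanded.append(term)
    return expanded


def two_token_shingles(terms, separator=" "):
    """Join each pair of adjacent terms into one shingle."""
    return [separator.join(terms[i:i + 2]) for i in range(len(terms) - 1)]


# "jump over the fence" with "the" removed: "fence" has position increment 2.
stream = [("jump", 1), ("over", 1), ("fence", 2)]
padded = expand_gaps(stream)
print(padded)
print(two_token_shingles(padded))
```

In this model the removed word still occupies a slot, so "over" and "fence" are never joined directly into a single shingle.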
The following are settings that can be set for a shingle token filter type:
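Setting | Description |
---|---|
max_shingle_size | The maximum shingle size. Defaults to 2. |
min_shingle_size | The minimum shingle size. Defaults to 2. |
output_unigrams | If true, the output contains the input tokens (unigrams) as well as the shingles. Defaults to true. |
output_unigrams_if_no_shingles | If output_unigrams is false, the output contains the input tokens (unigrams) when no shingles are available. Has no effect if output_unigrams is true. Defaults to false. |
token_separator | The string to use when joining adjacent tokens to form a shingle. Defaults to " " (a single space). |
filler_token | The string to use as a replacement for each position at which there is no actual token in the stream. For instance, this string is used if the position increment is greater than one when a stop filter is used together with the shingle filter. Defaults to "_". |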
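The interaction of these settings can be sketched in Python. This is a simplified model rather than the filter's actual code: it takes an in-memory list of tokens, ignores position increments and filler tokens, and uses keyword arguments named after the settings in the Elasticsearch reference (min_shingle_size, max_shingle_size, output_unigrams, output_unigrams_if_no_shingles, token_separator). The function name shingle_filter is hypothetical.

```python
def shingle_filter(tokens, min_shingle_size=2, max_shingle_size=2,
                   output_unigrams=True,
                   output_unigrams_if_no_shingles=False,
                   token_separator=" "):
    """Emit unigrams and shingles for a list of tokens (simplified model)."""
    output = []
    shingle_count = 0
    for start in range(len(tokens)):
        if output_unigrams:
            output.append(tokens[start])
        # Emit every shingle size that still fits at this start position.
        for size in range(min_shingle_size, max_shingle_size + 1):
            end = start + size
            if end > len(tokens):
                break
            output.append(token_separator.join(tokens[start:end]))
            shingle_count += 1
    # Fall back to unigrams only when unigrams are off and nothing was built.
    if (not output_unigrams and output_unigrams_if_no_shingles
            and shingle_count == 0):
        output = list(tokens)
    return output


result = shingle_filter(["please", "divide", "this"],
                        min_shingle_size=2, max_shingle_size=3)
print(result)
```

With output_unigrams disabled, a one-token input produces nothing in this model unless output_unigrams_if_no_shingles is enabled, which is the case that setting exists for.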
The index level setting index.max_shingle_diff controls the maximum allowed
difference between max_shingle_size and min_shingle_size.
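As a rough illustration of this constraint, the following Python sketch builds an index settings body as a plain dictionary and checks the size difference before the body would be sent anywhere. The names my_shingles and my_shingle_analyzer and the helper shingle_diff_ok are placeholders, and the fallback limit of 3 is assumed to match the documented default of index.max_shingle_diff for your release.

```python
import json


def shingle_diff_ok(min_shingle_size, max_shingle_size, max_shingle_diff=3):
    """Return True if the size difference stays within the allowed limit."""
    return (max_shingle_size - min_shingle_size) <= max_shingle_diff


# Hypothetical index settings body; filter and analyzer names are placeholders.
index_body = {
    "settings": {
        "index": {"max_shingle_diff": 4},
        "analysis": {
            "filter": {
                "my_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 6,
                }
            },
            "analyzer": {
                "my_shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_shingles"],
                }
            },
        },
    }
}

shingle_settings = index_body["settings"]["analysis"]["filter"]["my_shingles"]
limit = index_body["settings"]["index"]["max_shingle_diff"]
print(shingle_diff_ok(shingle_settings["min_shingle_size"],
                      shingle_settings["max_shingle_size"], limit))
print(json.dumps(index_body, indent=2))
```

Sizes 2 and 6 differ by 4, so this body passes only because the limit is raised to 4. Under the assumed default of 3 the same filter definition would fail the check.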