IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Minhash Token Filter
A token filter of type min_hash
hashes each token of the token stream and divides
the resulting hashes into buckets, keeping the lowest-valued hashes per
bucket. It then returns these hashes as tokens.
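
The hash-and-bucket step described above can be illustrated with a minimal Python sketch. This is not the filter's actual implementation: the hash function (SHA-256 truncated to 64 bits), the way each hash function is seeded, the modulo-based bucket assignment, and the name `min_hash_tokens` are all assumptions chosen for readability, and filling of empty buckets is not modeled.

```python
import hashlib


def min_hash_tokens(tokens, hash_count=1, bucket_count=512, hash_set_size=1):
    """Illustrative min-hash sketch; parameter names mirror the filter settings."""
    # One table of buckets per hash function; each bucket holds a set of hash values.
    tables = [[set() for _ in range(bucket_count)] for _ in range(hash_count)]
    for token in tokens:
        for i in range(hash_count):
            # Assumption: derive the i-th hash function by prefixing its index.
            digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
            value = int.from_bytes(digest[:8], "big")
            # Assumption: assign the hash to a bucket by taking it modulo bucket_count.
            bucket = tables[i][value % bucket_count]
            bucket.add(value)
            # Keep only the lowest-valued hashes in this bucket.
            if len(bucket) > hash_set_size:
                bucket.remove(max(bucket))
    # Return the retained hashes as output tokens (hex strings in this sketch).
    return [
        format(value, "016x")
        for table in tables
        for bucket in table
        for value in sorted(bucket)
    ]


if __name__ == "__main__":
    print(min_hash_tokens("the quick brown fox".split(), bucket_count=4))
```

Because only the lowest hashes per bucket survive, two token streams that share most of their tokens tend to produce many of the same output hashes, which is what makes the output useful for similarity matching.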
A min_hash token filter accepts the following settings.
Setting | Description |
---|---|
hash_count | The number of hashes to hash the token stream with. Defaults to 1. |
bucket_count | The number of buckets to divide the minhashes into. Defaults to 512. |
hash_set_size | The number of minhashes to keep per bucket. Defaults to 1. |
with_rotation | Whether or not to fill empty buckets with the value of the first non-empty bucket to its circular right. Only takes effect if hash_set_size is equal to one. Defaults to true if bucket_count is greater than one, else false. |
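
As a rough illustration of how these settings fit together, the sketch below builds an index-settings body as a Python dictionary and prints it as JSON. The setting names hash_count, bucket_count, hash_set_size, and with_rotation are the ones this filter exposes in Elasticsearch as far as I know; the filter name `my_minhash`, the analyzer name `my_minhash_analyzer`, the choice of the standard tokenizer, and all values are hypothetical and only for demonstration.

```python
import json

# Hypothetical index settings that define a min_hash filter and an analyzer using it.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_minhash": {  # hypothetical filter name
                    "type": "min_hash",
                    "hash_count": 1,
                    "bucket_count": 512,
                    "hash_set_size": 1,
                    # Rotation only takes effect when hash_set_size is 1.
                    "with_rotation": True,
                }
            },
            "analyzer": {
                "my_minhash_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["my_minhash"],
                }
            },
        }
    }
}

# Serialize to the JSON body that would be sent when creating an index.
request_body = json.dumps(index_settings, indent=2)
print(request_body)
```

The printed JSON could be used as the body of an index-creation request; check the reference for your Elasticsearch version before relying on any particular value.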