WARNING: Version 6.0 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Minhash Token Filter
A token filter of type min_hash
hashes each token of the token stream and divides
the resulting hashes into buckets, keeping the lowest-valued hashes per
bucket. It then returns these hashes as tokens.
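The description above can be made concrete with a small sketch. It is a toy illustration only, not Lucene's implementation. The hash function (BLAKE2b with a per-function salt), the modulo-based bucket assignment, and the function name min_hash_tokens are all assumptions chosen for this example.

```python
import hashlib
import heapq


def min_hash_tokens(tokens, hash_count=1, bucket_count=4, hash_set_size=1):
    """Toy min-hash over a token stream (illustrative, not Lucene's algorithm)."""
    # buckets[h][b] collects every hash that hash function h placed in bucket b.
    buckets = [[set() for _ in range(bucket_count)] for _ in range(hash_count)]
    for token in tokens:
        for h in range(hash_count):
            # Assumption: salting with the function index stands in for
            # hash_count independent hash functions.
            digest = hashlib.blake2b(
                token.encode("utf-8"),
                digest_size=8,
                salt=h.to_bytes(8, "little"),
            ).digest()
            value = int.from_bytes(digest, "big")
            # Assumption: the bucket is chosen by a simple modulo.
            buckets[h][value % bucket_count].add(value)
    # Keep only the hash_set_size lowest hashes in each bucket.
    return [
        [heapq.nsmallest(hash_set_size, bucket) for bucket in per_hash]
        for per_hash in buckets
    ]


signature = min_hash_tokens(["quick", "brown", "fox"], hash_count=2)
print(signature)
```

Because only the lowest hashes per bucket survive, two token streams that share most of their tokens tend to produce many identical values, which is what makes the output useful for similarity comparisons.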
A min_hash token filter accepts the following settings:
Setting | Description |
---|---|
`hash_count` | The number of hashes to hash the token stream with. Defaults to `1`. |
`bucket_count` | The number of buckets to divide the minhashes into. Defaults to `512`. |
`hash_set_size` | The number of minhashes to keep per bucket. Defaults to `1`. |
`with_rotation` | Whether or not to fill empty buckets with the value of the first non-empty bucket to its circular right. Only takes effect if `hash_set_size` is equal to one. Defaults to `true` if `bucket_count` is greater than one, else `false`. |
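As a hedged illustration of how these settings might be used, the sketch below does two things. First, it builds an index-settings body that defines a min_hash filter inside a custom analyzer. The names my_minhash and my_analyzer are hypothetical, and the keys hash_count, bucket_count, and with_rotation are taken from the Elasticsearch reference, so check them against your version. Second, it mimics the rotation rule on a plain list, where None stands for an empty bucket. This is a toy model of the described behavior, not the filter's actual code.

```python
import json


def min_hash_index_settings(hash_count, bucket_count, hash_set_size, with_rotation):
    """Build an index-settings body containing a min_hash token filter."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_minhash": {  # hypothetical filter name
                        "type": "min_hash",
                        "hash_count": hash_count,
                        "bucket_count": bucket_count,
                        "hash_set_size": hash_set_size,
                        "with_rotation": with_rotation,
                    }
                },
                "analyzer": {
                    "my_analyzer": {  # hypothetical analyzer name
                        "tokenizer": "standard",
                        "filter": ["my_minhash"],
                    }
                },
            }
        }
    }


def fill_empty_buckets(buckets):
    """Fill each empty bucket (None) from the first non-empty bucket to its
    circular right, wrapping around the end of the list."""
    n = len(buckets)
    filled = list(buckets)
    for i in range(n):
        if buckets[i] is None:
            for step in range(1, n):
                candidate = buckets[(i + step) % n]
                if candidate is not None:
                    filled[i] = candidate
                    break
    return filled


print(json.dumps(min_hash_index_settings(1, 64, 1, True), indent=2))
print(fill_empty_buckets([None, 7, None, None, 3]))  # [7, 7, 3, 3, 3]
```

The printed JSON could be sent as the body of an index-creation request. With rotation, every bucket position yields a value whenever at least one bucket is non-empty, so short inputs still produce a full set of output tokens.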