WARNING: Version 5.0 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Fingerprint Token Filter
The fingerprint token filter emits a single token which is useful for fingerprinting a body of text, and/or providing a token that can be clustered on. It does this by sorting the tokens, deduplicating and then concatenating them back into a single token.
For example, the tokens ["the", "quick", "quick", "brown", "fox", "was", "very", "brown"] will be transformed into a single token: "brown fox quick the very was". Notice how the tokens were sorted alphabetically, and there is only one "quick".
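The sort, deduplicate, and concatenate steps can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the filter's actual implementation; the function name fingerprint and its separator parameter are hypothetical names chosen for this sketch.

```python
def fingerprint(tokens, separator=" "):
    """Illustrative sketch: sort, deduplicate, and join tokens into one string."""
    # set() removes duplicates; sorted() orders the survivors alphabetically.
    unique_sorted = sorted(set(tokens))
    # Concatenate everything back into a single output token.
    return separator.join(unique_sorted)


tokens = ["the", "quick", "quick", "brown", "fox", "was", "very", "brown"]
result = fingerprint(tokens)
print(result)  # brown fox quick the very was
```

Because the output does not depend on token order or repetition, two texts containing the same set of words produce the same fingerprint, which is what makes the token usable for clustering.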
The following settings can be set for a fingerprint token filter type:
Setting | Description |
---|---|
separator | Defaults to a space. |
max_output_size | Defaults to 255. |
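As a rough sketch of how these settings might be supplied, the Python snippet below builds an index settings body that defines a custom fingerprint filter and prints it as JSON. The names my_fingerprint and my_fingerprint_analyzer, the "+" separator, and the limit of 100 are illustrative assumptions, not values taken from this page. Pairing the filter with the standard tokenizer and the lowercase filter is likewise just one plausible arrangement.

```python
import json

# Hypothetical index settings; filter and analyzer names are placeholders.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_fingerprint": {
                    "type": "fingerprint",
                    "separator": "+",        # join tokens with "+" instead of a space
                    "max_output_size": 100,  # emit nothing if the fingerprint exceeds this
                }
            },
            "analyzer": {
                "my_fingerprint_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_fingerprint"],
                }
            },
        }
    }
}

# Serialize to the JSON body that would be sent when creating an index.
body = json.dumps(index_settings, indent=2)
print(body)
```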
Maximum token size
Because a field may have many unique tokens, it is important to set a cutoff so that fields do not grow too large. The max_output_size setting controls this behavior. If the concatenated fingerprint grows larger than max_output_size, the token filter will exit and will not emit a token (i.e. the field will be empty).
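The cutoff can be pictured with a small Python sketch. It assumes the size is measured as the character length of the concatenated string, separators included; the text above does not spell out how the filter measures size, so treat that detail as an assumption. The function name fingerprint_with_limit is hypothetical.

```python
def fingerprint_with_limit(tokens, max_output_size, separator=" "):
    """Illustrative sketch: return the fingerprint, or None when it is too large."""
    joined = separator.join(sorted(set(tokens)))
    # Assumption: size is the character length of the joined string.
    if len(joined) > max_output_size:
        return None  # no token is emitted, so the field ends up empty
    return joined


words = ["quick", "brown", "fox"]
print(fingerprint_with_limit(words, max_output_size=20))  # brown fox quick
print(fingerprint_with_limit(words, max_output_size=10))  # None
```

Note that an oversized fingerprint is dropped entirely rather than truncated, so a limit that is too low silently yields empty fields.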