Keyword tokenizer
The keyword tokenizer is a “noop” tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. It can be combined with token filters to normalise output, e.g. lower-casing email addresses.
Example output
resp = client.indices.analyze(
    tokenizer="keyword",
    text="New York",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'keyword',
    text: 'New York'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "keyword",
  text: "New York",
});
console.log(response);
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
The above sentence would produce the following term:
[ New York ]
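The “noop” behaviour can be sketched in plain Python (a conceptual model only, not the Elasticsearch implementation), contrasting it with a tokenizer that splits on whitespace:

```python
def keyword_tokenize(text):
    # keyword tokenizer: the entire input becomes a single term
    return [text]

def whitespace_tokenize(text):
    # for contrast: a whitespace-style tokenizer splits on spaces
    return text.split()

print(keyword_tokenize("New York"))     # ['New York']
print(whitespace_tokenize("New York"))  # ['New', 'York']
```

The keyword tokenizer keeps multi-word input like "New York" intact as one term, which is why it suits structured values that should never be split.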
Combine with token filters
You can combine the keyword tokenizer with token filters to normalise structured data, such as product IDs or email addresses.
For example, the following analyze API request uses the keyword tokenizer and lowercase filter to convert an email address to lowercase.
resp = client.indices.analyze(
    tokenizer="keyword",
    filter=[
        "lowercase"
    ],
    text="john.SMITH@example.COM",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'keyword',
    filter: [
      'lowercase'
    ],
    text: 'john.SMITH@example.COM'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "keyword",
  filter: ["lowercase"],
  text: "john.SMITH@example.COM",
});
console.log(response);
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "text": "john.SMITH@example.COM"
}
The request produces the following token:
[ john.smith@example.com ]
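The tokenizer-then-filter pipeline can be modelled in a few lines of plain Python (again a conceptual sketch, not the actual Elasticsearch code): the tokenizer runs first and produces terms, then each token filter transforms the resulting token stream.

```python
def keyword_tokenize(text):
    # keyword tokenizer: whole input as a single term
    return [text]

def lowercase_filter(tokens):
    # lowercase token filter: normalises each term
    return [token.lower() for token in tokens]

def analyze(text):
    # tokenizer first, then the token-filter chain
    return lowercase_filter(keyword_tokenize(text))

print(analyze("john.SMITH@example.COM"))  # ['john.smith@example.com']
```

Because the keyword tokenizer emits exactly one term, the filter chain effectively normalises the whole input value in place.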
Configuration
The keyword tokenizer accepts the following parameters:
buffer_size
    The number of characters read into the term buffer in a single pass. Defaults to 256.
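As an illustration, the tokenizer could be configured as part of a custom analyzer in the index settings; the index, tokenizer, and analyzer names below (my-index-000001, my_tokenizer, my_analyzer) are hypothetical placeholders:

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}

Leaving buffer_size at its default is usually fine; the buffer only controls how many characters are read per pass, not the maximum term length.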