IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Whitespace analyzer
The whitespace analyzer breaks text into terms whenever it encounters a whitespace character.
Example output
resp = client.indices.analyze(
    analyzer="whitespace",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

response = client.indices.analyze(
  body: {
    analyzer: 'whitespace',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.analyze({
  analyzer: "whitespace",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
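The term list above can be approximated locally without a cluster. The sketch below is an illustration only, not how Elasticsearch implements the analyzer: `whitespace_terms` is a hypothetical helper, and it assumes that Python's `str.split()` treats whitespace closely enough to the analyzer for plain ASCII text like this sentence. It shows that case and punctuation pass through unchanged.

```python
from typing import List


def whitespace_terms(text: str) -> List[str]:
    """Hypothetical helper: approximate the whitespace analyzer locally."""
    # str.split() with no arguments splits on runs of whitespace and drops
    # empty strings; it does not lowercase or strip punctuation.
    return text.split()


terms = whitespace_terms("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(terms)
```

Note that `QUICK` keeps its case, `Brown-Foxes` stays one term, and the final term keeps its trailing period, matching the list above.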
Configuration
The whitespace analyzer is not configurable.
Definition
It consists of:
- Tokenizer: whitespace tokenizer
If you need to customize the whitespace analyzer, recreate it as a custom analyzer and modify it, usually by adding token filters. The following recreates the built-in whitespace analyzer, and you can use it as a starting point for further customization:
resp = client.indices.create(
    index="whitespace_example",
    settings={
        "analysis": {
            "analyzer": {
                "rebuilt_whitespace": {
                    "tokenizer": "whitespace",
                    "filter": []
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'whitespace_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          rebuilt_whitespace: {
            tokenizer: 'whitespace',
            filter: []
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "whitespace_example",
  settings: {
    analysis: {
      analyzer: {
        rebuilt_whitespace: {
          tokenizer: "whitespace",
          filter: [],
        },
      },
    },
  },
});
console.log(response);
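As one possible starting point for customization, the sketch below builds the same settings with token filters added. `rebuilt_whitespace_settings` is a hypothetical helper introduced here for illustration. The example assumes you want the built-in `lowercase` token filter, which is only one of many choices.

```python
from typing import Dict, List, Optional


def rebuilt_whitespace_settings(filters: Optional[List[str]] = None) -> Dict:
    """Hypothetical helper: settings for a custom whitespace-based analyzer."""
    return {
        "analysis": {
            "analyzer": {
                "rebuilt_whitespace": {
                    "tokenizer": "whitespace",
                    # Token filters are applied in the order listed.
                    "filter": list(filters or []),
                }
            }
        }
    }


settings = rebuilt_whitespace_settings(["lowercase"])
print(settings)
```

The resulting dictionary can be passed as the `settings` argument of `client.indices.create`, as in the Python example above. Calling the helper with no arguments yields the same empty `filter` list as the rebuilt analyzer.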