Simple analyzer
The simple analyzer breaks text into tokens at any non-letter character, such as numbers, spaces, hyphens, and apostrophes, discards the non-letter characters, and changes uppercase to lowercase.
Example
resp = client.indices.analyze(
    analyzer="simple",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

response = client.indices.analyze(
  body: {
    analyzer: 'simple',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.analyze({
  analyzer: "simple",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The simple analyzer parses the sentence and produces the following tokens:
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
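To make the splitting rule concrete, the following is a minimal Python sketch that approximates the behavior described above without contacting a cluster. The function name simple_analyze is hypothetical and not part of any Elasticsearch client. It assumes that "letter" corresponds to Python's str.isalpha, and it does not model offsets, positions, or any other detail of the real implementation.

```python
from itertools import groupby


def simple_analyze(text: str) -> list[str]:
    """Approximate the simple analyzer: split at non-letters, then lowercase.

    Hypothetical helper for illustration only. It treats any character for
    which str.isalpha() is true as a letter (an assumption).
    """
    # groupby clusters consecutive characters by whether they are letters;
    # only the letter runs are kept, and each run becomes one lowercase token.
    return [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


tokens = simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(tokens)
```

For the example sentence, this sketch yields the same eleven tokens as the list above: the digit 2 is dropped, Brown-Foxes splits at the hyphen, and dog's splits at the apostrophe into dog and s.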
Customize
To customize the simple analyzer, duplicate it to create the basis for a custom analyzer. This custom analyzer can be modified as required, usually by adding token filters.
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": []
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_custom_simple_analyzer: {
            tokenizer: 'lowercase',
            filter: []
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_custom_simple_analyzer: {
          tokenizer: "lowercase",
          filter: [],
        },
      },
    },
  },
});
console.log(response);
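As an illustration of "adding token filters", the sketch below builds the same settings body with a non-empty filter list. The helper name custom_simple_settings is hypothetical, and the choice of the built-in stop token filter is only an example; any token filters appropriate for the index could be listed instead.

```python
def custom_simple_settings(filters=None):
    """Build index settings for a custom analyzer based on the simple analyzer.

    Hypothetical helper for illustration: it keeps the lowercase tokenizer
    and places the given token filter names in the analyzer's filter list.
    """
    return {
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    # Token filters are applied in the order listed here.
                    "filter": list(filters or []),
                }
            }
        }
    }


# Example: add the built-in "stop" token filter (an illustrative choice).
settings = custom_simple_settings(["stop"])
print(settings)
```

The resulting dictionary could then be passed as the settings argument of client.indices.create, in the same way as the Python example above.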