Analyzers

Analyzers are composed of a single Tokenizer and zero or more TokenFilters. The tokenizer may be preceded by one or more CharFilters. The analysis module allows you to register Analyzers under logical names which can then be referenced either in mapping definitions or in certain APIs.

Elasticsearch comes with a number of prebuilt analyzers that are ready to use. Alternatively, you can combine the built-in character filters, tokenizers and token filters to create custom analyzers.
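
A custom analyzer is declared under the analysis settings with type set to custom and references its components by name. The following is a minimal sketch; the analyzer name my_custom is illustrative, while html_strip, standard, lowercase and stop are built-in components:

index :
  analysis :
    analyzer :
      my_custom :
        type : custom
        char_filter : [html_strip]
        tokenizer : standard
        filter : [lowercase, stop]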

Default Analyzers

An analyzer is registered under a logical name and can then be referenced from mapping definitions or certain APIs. When no analyzer is specified, a default is used, and you can configure which analyzers act as those defaults.

The default logical name configures the analyzer used both for indexing and for the search APIs. The default_search logical name configures a default analyzer used only when searching (the default analyzer is still used for indexing).

For instance, the following settings could be used to perform exact matching only by default:

index :
  analysis :
    analyzer :
      default :
        tokenizer : keyword
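
To use a different analyzer only at search time, register it under the default_search name alongside default. The following is a minimal sketch with illustrative choices (standard and simple are built-in analyzer types):

index :
  analysis :
    analyzer :
      default :
        type : standard
      default_search :
        type : simple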

Aliasing Analyzers

Analyzers can be aliased so that several registered lookup names refer to the same analyzer. For example, the following allows the standard analyzer to also be referenced by the names alias1 and alias2.

index :
  analysis :
    analyzer :
      standard :
        alias: [alias1, alias2]
        type : standard
        stopwords : [test1, test2, test3]
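
Once registered, either alias can be used wherever an analyzer name is expected, such as in a mapping definition. The following is a minimal sketch, assuming a type named type1 with a string field called title:

{
  "type1" : {
    "properties" : {
      "title" : { "type" : "string", "analyzer" : "alias1" }
    }
  }
}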

Below is a list of the built-in analyzers.