Elision token filter
Removes specified elisions from the beginning of tokens. For example, you can use this filter to change l'avion to avion.
When not customized, the filter removes the following French elisions by default:
l', m', t', qu', n', s', j', d', c', jusqu', quoiqu', lorsqu', puisqu'
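The default behavior described above can be approximated in a few lines of plain Ruby. This is only an illustrative sketch, not the filter's actual implementation: the names `DEFAULT_ARTICLES` and `strip_elision` are hypothetical, and treating both the straight (') and curly (’) apostrophe as an apostrophe is an assumption chosen to match the example text used on this page.

```ruby
# Default French elisions listed above, written without the trailing apostrophe.
DEFAULT_ARTICLES = %w[l m t qu n s j d c jusqu quoiqu lorsqu puisqu].freeze

# Hypothetical helper: strips a leading elision from a single token.
def strip_elision(token, articles = DEFAULT_ARTICLES)
  # Find the first straight (') or curly (’) apostrophe in the token.
  index = token.index(/['’]/)
  return token if index.nil?

  prefix = token[0...index]
  # Drop the prefix and the apostrophe only if the prefix is a listed elision.
  articles.include?(prefix) ? token[(index + 1)..-1] : token
end

puts strip_elision("l'avion")     # avion
puts strip_elision("j’examine")   # examine
puts strip_elision("aujourd'hui") # aujourd'hui ("aujourd" is not a listed elision)
```

Tokens without an apostrophe, or whose prefix is not in the list, pass through unchanged.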
Customized versions of this filter are included in several of Elasticsearch’s built-in language analyzers.
This filter uses Lucene’s ElisionFilter.
Example
The following analyze API request uses the elision filter to remove j' from j’examine près du wharf:
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['elision'],
    text: 'j’examine près du wharf'
  }
)
puts response

GET _analyze
{
  "tokenizer": "standard",
  "filter": ["elision"],
  "text": "j’examine près du wharf"
}
The filter produces the following tokens:
[ examine, près, du, wharf ]
Add to an analyzer
The following create index API request uses the elision filter to configure a new custom analyzer.
response = client.indices.create(
  index: 'elision_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          whitespace_elision: {
            tokenizer: 'whitespace',
            filter: ['elision']
          }
        }
      }
    }
  }
)
puts response

PUT /elision_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_elision": {
          "tokenizer": "whitespace",
          "filter": ["elision"]
        }
      }
    }
  }
}
Configurable parameters
articles
(Required*, array of strings) List of elisions to remove.
To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.
For custom elision filters, either this parameter or articles_path must be specified.
articles_path
(Required*, string) Path to a file that contains a list of elisions to remove.
This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each elision in the file must be separated by a line break.
To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.
For custom elision filters, either this parameter or articles must be specified.
articles_case
(Optional, Boolean) If true, elision matching is case insensitive. If false, elision matching is case sensitive. Defaults to false.
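To make the articles_path file format more concrete, the following sketch writes a small UTF-8 file with one elision per line and reads it back. It is an illustration only: `read_articles` and the sample file name are hypothetical, entries are written without the apostrophe to mirror the articles values used elsewhere on this page, and trimming whitespace and skipping blank lines are assumptions of this sketch rather than documented behavior.

```ruby
require 'tempfile'

# Hypothetical helper: reads an elision list stored one entry per line.
# Trimming whitespace and skipping blank lines are assumptions of this sketch.
def read_articles(path)
  File.readlines(path, chomp: true, encoding: 'UTF-8')
      .map(&:strip)
      .reject(&:empty?)
end

# Write a small sample file; the file name is made up for this example.
Tempfile.create(['elision_articles', '.txt']) do |file|
  file.write("l\nm\nqu\njusqu\n")
  file.flush
  p read_articles(file.path) # ["l", "m", "qu", "jusqu"]
end
```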
Customize
To customize the elision filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following request creates a custom case-insensitive elision filter that removes the l', m', t', qu', n', s', and j' elisions:
response = client.indices.create(
  index: 'elision_case_insensitive_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          default: {
            tokenizer: 'whitespace',
            filter: ['elision_case_insensitive']
          }
        },
        filter: {
          elision_case_insensitive: {
            type: 'elision',
            articles: ['l', 'm', 't', 'qu', 'n', 's', 'j'],
            articles_case: true
          }
        }
      }
    }
  }
)
puts response

PUT /elision_case_insensitive_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": ["elision_case_insensitive"]
        }
      },
      "filter": {
        "elision_case_insensitive": {
          "type": "elision",
          "articles": ["l", "m", "t", "qu", "n", "s", "j"],
          "articles_case": true
        }
      }
    }
  }
}
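As a rough local approximation of what this custom analyzer should do, the sketch below splits text on whitespace and applies the seven configured elisions with and without case-insensitive matching. It does not call Elasticsearch: `ARTICLES` and `strip_custom_elision` are hypothetical names, `String#split` merely stands in for the whitespace tokenizer, and accepting both the straight and curly apostrophe is an assumption of this sketch.

```ruby
# Articles taken from the request above.
ARTICLES = %w[l m t qu n s j].freeze

# Hypothetical helper approximating the custom filter on a single token.
def strip_custom_elision(token, articles: ARTICLES, ignore_case: true)
  index = token.index(/['’]/)
  return token if index.nil?

  prefix = token[0...index]
  if ignore_case
    # Compare lowercased forms on both sides, mimicking articles_case: true.
    prefix = prefix.downcase
    articles = articles.map(&:downcase)
  end
  articles.include?(prefix) ? token[(index + 1)..-1] : token
end

# String#split with no arguments splits on whitespace.
tokens = "L'avion QU'IL d'abord".split

p tokens.map { |t| strip_custom_elision(t) }
# ["avion", "IL", "d'abord"]
p tokens.map { |t| strip_custom_elision(t, ignore_case: false) }
# ["L'avion", "QU'IL", "d'abord"]
```

In this sketch, d'abord is left intact in both runs because d is not among the configured articles, while the uppercase prefixes are removed only when matching ignores case.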