WARNING: Version 5.0 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
HTML Strip Char Filter
The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).
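The following is a rough Python sketch of these two behaviours, built on the standard library's html.parser. It is an approximation, not Elasticsearch's implementation, which is backed by Lucene's HTMLStripCharFilter. The names BLOCK_TAGS, HtmlStripSketch and html_strip are hypothetical. The tag set is an assumed, illustrative subset. The rule that block-level tags become newlines is inferred from the example output that follows.

```python
from html.parser import HTMLParser

# ASSUMPTION: an illustrative subset of block-level tags that are replaced
# by a newline. Lucene's HTMLStripCharFilter uses its own, longer list.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "tr", "table"}


class HtmlStripSketch(HTMLParser):
    """Drops tags, decodes entities, and emits a newline for block-level tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> produce a single newline.
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text between tags is kept as-is (entities already decoded).
        self.parts.append(data)


def html_strip(text):
    parser = HtmlStripSketch()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(repr(html_strip("<p>I'm so <b>happy</b>!</p>")))  # "\nI'm so happy!\n"
print(html_strip("Fish &amp; chips"))                    # Fish & chips
```

HTMLParser lowercases tag names and, with convert_charrefs=True, decodes entities itself. For that reason the sketch needs no entity table of its own.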
Example output
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The keyword tokenizer emits the entire input as a single term, so the above example returns the term:
[ \nI'm so happy!\n ]
The same example with the standard tokenizer would return the following terms:
[ I'm, so, happy ]
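The two results differ because the character filter rewrites the text first, and only then does the tokenizer split it. The Python sketch below starts from the filtered string shown above. It approximates the standard tokenizer with a regular expression. This is an assumption made for brevity: the real standard tokenizer follows Unicode Text Segmentation (UAX #29) and handles many cases that this regex does not.

```python
import re

# Output of the html_strip character filter for the example text.
filtered = "\nI'm so happy!\n"

# keyword tokenizer: the whole (filtered) input becomes a single term.
keyword_terms = [filtered]

# ASSUMPTION: a crude stand-in for the standard tokenizer. A term is a run of
# word characters, optionally joined by internal apostrophes, so "I'm" stays
# whole while whitespace, newlines and "!" act as separators.
WORD = re.compile(r"\w+(?:'\w+)*")
standard_terms = WORD.findall(filtered)

print(keyword_terms)   # ["\nI'm so happy!\n"]
print(standard_terms)  # ["I'm", 'so', 'happy']
```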
Configuration
The html_strip character filter accepts the following parameter:
escaped_tags: An array of HTML tags which should not be stripped from the original text.
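The Python sketch below illustrates what escaped_tags means in practice. Tags named in the list are copied through unchanged, while all other tags are removed. It is not Elasticsearch's code. The names EscapedTagsStripper and strip_html and the BLOCK_TAGS subset are hypothetical. Rebuilding end tags in lower case is also a simplification.

```python
from html.parser import HTMLParser


class EscapedTagsStripper(HTMLParser):
    """Strips tags except those named in escaped_tags, which are kept."""

    # ASSUMPTION: illustrative subset of block-level tags replaced by "\n".
    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "tr"}

    def __init__(self, escaped_tags):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self.escaped = {t.lower() for t in escaped_tags}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.escaped:
            # Re-emit the original start tag text, attributes included.
            self.parts.append(self.get_starttag_text())
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep verbatim or emit one newline.
        if tag in self.escaped:
            self.parts.append(self.get_starttag_text())
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.escaped:
            # Simplification: the end tag is rebuilt in lower case.
            self.parts.append(f"</{tag}>")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text, escaped_tags=()):
    parser = EscapedTagsStripper(escaped_tags)
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

For example, strip_html("<p>I'm so <b>happy</b>!</p>", ["b"]) returns "\nI'm so <b>happy</b>!\n". This matches the term produced by the configured analyzer in the example configuration.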
Example configuration
In this example, we configure the html_strip character filter to leave <b> tags in place:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The above example produces the following term:
[ \nI'm so <b>happy</b>!\n ]