IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
HTML Strip Char Filter
The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).
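To make the behavior concrete, here is a minimal Python sketch of the same idea using the standard library's html.parser. It is not Elasticsearch's implementation (html_strip is built on Lucene's HTMLStripCharFilter); the strip_html function and the BLOCK_TAGS set are hypothetical simplifications. Inline tags such as <b> are dropped, block-level tags such as <p> become newlines, and entities are decoded.

```python
from html.parser import HTMLParser

# Hypothetical, simplified set of block-level tags that are replaced by a
# newline; Lucene's real list is longer.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _Stripper(HTMLParser):
    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; and &apos; before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags become newlines; all other tags are dropped.
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text content (already entity-decoded) is kept as-is.
        self.parts.append(data)


def strip_html(text):
    """Remove HTML tags and decode entities (simplified sketch)."""
    parser = _Stripper()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(repr(strip_html("<p>I&apos;m so <b>happy</b>!</p>")))
```

For that input the sketch yields "\nI'm so happy!\n", matching the term shown in the example output.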
Example output
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
The keyword tokenizer returns the entire input as a single term.
The above example returns the term:
[ \nI'm so happy!\n ]
The same example with the standard tokenizer would return the following terms:
[ I'm, so, happy ]
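The difference between the two outputs comes from the tokenizer, not the character filter. The Python sketch below contrasts the two on the already-stripped text. The function names are hypothetical, and the regular expression is only a rough stand-in: the real standard tokenizer follows the Unicode Text Segmentation rules, which, among other things, keep an apostrophe between letters inside a word.

```python
import re


def keyword_tokenize(text):
    # Like the keyword tokenizer: the whole input is emitted as one term.
    return [text]


# Rough approximation of the standard tokenizer: runs of word characters,
# allowing an apostrophe between them as in "I'm". Punctuation and
# whitespace (including the newlines from <p>) separate terms.
_WORD = re.compile(r"\w+(?:'\w+)*")


def standard_like_tokenize(text):
    return _WORD.findall(text)


stripped = "\nI'm so happy!\n"  # output of html_strip for the example
print(keyword_tokenize(stripped))
print(standard_like_tokenize(stripped))
```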
Configuration
The html_strip character filter accepts the following parameter:

escaped_tags: An array of HTML tags that should not be stripped from the original text.
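The effect of escaped_tags can be pictured with the following Python sketch, which extends a simple html.parser-based stripper so that tags named in escaped_tags are written back verbatim instead of being removed. This is a simplified model under stated assumptions, not the Lucene HTMLStripCharFilter code that html_strip uses; the class, the strip_html function, and the BLOCK_TAGS set are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical, simplified set of block-level tags that become newlines.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}


class EscapingStripper(HTMLParser):
    """Strips tags like html_strip, but keeps tags listed in escaped_tags."""

    def __init__(self, escaped_tags=()):
        super().__init__(convert_charrefs=True)  # decode entities in text
        # html.parser reports tag names in lowercase.
        self.escaped = {t.lower() for t in escaped_tags}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.escaped:
            # Re-emit the original start tag text, attributes included.
            self.parts.append(self.get_starttag_text())
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.escaped:
            self.parts.append(f"</{tag}>")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text, escaped_tags=()):
    parser = EscapingStripper(escaped_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(repr(strip_html("<p>I&apos;m so <b>happy</b>!</p>", escaped_tags=["b"])))
```

With escaped_tags=["b"] the <b> tags survive while <p> is still replaced, the same shape as the term in the example configuration below.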
Example configuration
In this example, we configure the html_strip character filter to leave <b> tags in place:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I&apos;m so <b>happy</b>!</p>"
}
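The two console requests in this example can also be issued from code. The Python sketch below only builds them with the standard library's urllib.request; it assumes a cluster reachable at a hypothetical BASE_URL of http://localhost:9200, and the json_request helper is illustrative, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: hypothetical address of a running Elasticsearch node.
BASE_URL = "http://localhost:9200"


def json_request(method, path, body):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Same index settings as the PUT my_index request above.
create_index = json_request("PUT", "/my_index", {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "keyword",
                    "char_filter": ["my_char_filter"],
                }
            },
            "char_filter": {
                "my_char_filter": {"type": "html_strip", "escaped_tags": ["b"]}
            },
        }
    }
})

# Same body as the POST my_index/_analyze request above.
analyze = json_request("POST", "/my_index/_analyze", {
    "analyzer": "my_analyzer",
    "text": "<p>I&apos;m so <b>happy</b>!</p>",
})

# Sending requires a running cluster at BASE_URL:
# with urllib.request.urlopen(create_index) as resp:
#     print(resp.status)
# with urllib.request.urlopen(analyze) as resp:
#     print(json.load(resp))
```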
The above example produces the following term:
[ \nI'm so <b>happy</b>!\n ]