Full-text search with synonyms
Synonyms are words or phrases that have the same or similar meaning. They are an important aspect of search, as they can improve the search experience and increase the scope of search results.
Synonyms allow you to:
- Improve search relevance by finding relevant documents that use different terms to express the same concept.
- Make domain-specific vocabulary more user-friendly, allowing users to use search terms they are more familiar with.
- Define common misspellings and typos to transparently handle common mistakes.
Synonyms are grouped together using synonyms sets. You can have as many synonyms sets as you need.
In order to use synonyms sets in Elasticsearch, you need to:
- Store your synonyms set
- Configure synonyms token filters and analyzers
Store your synonyms set
Your synonyms sets need to be stored in Elasticsearch so your analyzers can refer to them. There are two ways to store your synonyms sets:
Synonyms API
You can use the synonyms APIs to manage synonyms sets. This is the most flexible approach, as it allows you to define and modify synonyms sets dynamically.
Changes to your synonyms sets automatically reload the associated analyzers.
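As a minimal sketch of storing a set through the synonyms API, the request below creates or replaces a synonyms set with a PUT to the _synonyms endpoint. The set ID my-synonyms-set, the rule ID pc-rule, and the rules themselves are illustrative assumptions. ES_URL and API_KEY are assumed to be set in your environment; when ES_URL is unset, the script only prints the request body.

```shell
# Request body for PUT _synonyms/<set-id>. Each rule is a string in the same
# format as inline synonyms; the "id" field on a rule is optional.
SYNONYMS_SET_BODY='{
  "synonyms_set": [
    { "id": "pc-rule", "synonyms": "pc => personal computer" },
    { "synonyms": "computer, pc, laptop" }
  ]
}'

# "my-synonyms-set" is a hypothetical set ID chosen for this sketch.
if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/_synonyms/my-synonyms-set?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${SYNONYMS_SET_BODY}"
else
  # Dry run: no cluster configured, so just show what would be sent.
  printf '%s\n' "${SYNONYMS_SET_BODY}"
fi
```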
Inline
You can test your synonyms by adding them directly inline in your token filter definition.
Inline synonyms are not recommended for production use. A large number of inline synonyms increases cluster size unnecessarily and can lead to performance issues.
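For quick experiments, an inline definition might look like the sketch below, which creates a throwaway index whose token filter lists its rules directly under the synonyms parameter. The index name my-test-index, the filter name my_inline_synonyms, and the analyzer name my_test_analyzer are hypothetical. ES_URL and API_KEY are assumed to be set in your environment; when ES_URL is unset, the script only prints the request body.

```shell
# Index settings with synonym rules written inline in the filter definition.
INLINE_INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "my_inline_synonyms": {
          "type": "synonym_graph",
          "synonyms": ["pc => personal computer", "computer, pc, laptop"]
        }
      },
      "analyzer": {
        "my_test_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_inline_synonyms"]
        }
      }
    }
  }
}'

# "my-test-index" is a hypothetical index name used only for testing.
if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/my-test-index?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${INLINE_INDEX_BODY}"
else
  # Dry run: no cluster configured, so just show what would be sent.
  printf '%s\n' "${INLINE_INDEX_BODY}"
fi
```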
Configure synonyms token filters and analyzers
Once your synonyms sets are created, you can start configuring your token filters and analyzers to use them.
Elasticsearch uses synonyms as part of the analysis process. You can use two types of token filter:
- Synonym graph token filter: Recommended, because it correctly handles multi-word synonyms (for example, "hurriedly" and "in a hurry").
- Synonym token filter: Not recommended if you need to use multi-word synonyms.
Check each synonym token filter documentation for configuration details and instructions on adding it to an analyzer.
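As a rough illustration of how a stored set can be wired into an analyzer, the sketch below creates an index whose synonym graph filter points at a synonyms set through the synonyms_set parameter. It assumes a set with the hypothetical ID my-synonyms-set already exists; the filter name my_synonyms_filter is also an assumption. ES_URL and API_KEY are assumed to be set in your environment; when ES_URL is unset, the script only prints the request body.

```shell
# Index settings: a synonym_graph filter backed by a stored synonyms set,
# plus a custom analyzer that applies it after lowercasing.
INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms_filter": {
          "type": "synonym_graph",
          "synonyms_set": "my-synonyms-set",
          "updateable": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms_filter"]
        }
      }
    }
  }
}'

if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/my-index?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${INDEX_BODY}"
else
  # Dry run: no cluster configured, so just show what would be sent.
  printf '%s\n' "${INDEX_BODY}"
fi
```

Because the filter is marked as updateable, an analyzer built on it is meant to be used as a search analyzer rather than at index time.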
Test your analyzer
You can test an analyzer configuration without modifying your index settings. Use the analyze API to test your analyzer chain:
curl "${ES_URL}/my-index/_analyze?pretty" \
  -H "Authorization: ApiKey ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text": "Check how PC synonyms work"
}'
Apply synonyms at index or search time
Analyzers can be applied at index time or search time.
You need to decide when to apply your synonyms:
- Index time: Synonyms are applied when the documents are indexed into Elasticsearch. This is a less flexible alternative, as changes to your synonyms require reindexing.
- Search time: Synonyms are applied when a search is executed. This is a more flexible approach, which doesn’t require reindexing. If token filters are configured with "updateable": true, search analyzers can be reloaded when you make changes to your synonyms.
Synonyms sets created using the synonyms API can only be used at search time.
You can specify the analyzer that contains your synonyms set as a search time analyzer or as an index time analyzer.
The following example adds my_analyzer as a search analyzer to the title field in an index mapping:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}
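With a search analyzer set on the title field, a match query against that field is analyzed with my_analyzer, so synonyms are expanded when the query runs rather than when documents are indexed. The sketch below shows such a query; the search term "pc" is an arbitrary example. ES_URL and API_KEY are assumed to be set in your environment; when ES_URL is unset, the script only prints the request body.

```shell
# A match query on "title". The query text is analyzed with the field's
# search analyzer, which is where the synonyms filter is applied.
SEARCH_BODY='{
  "query": {
    "match": { "title": "pc" }
  }
}'

if [ -n "${ES_URL:-}" ]; then
  curl -X POST "${ES_URL}/my-index/_search?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${SEARCH_BODY}"
else
  # Dry run: no cluster configured, so just show what would be sent.
  printf '%s\n' "${SEARCH_BODY}"
fi
```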