Full-text search with synonyms

Synonyms are words or phrases that have the same or similar meaning. They are an important aspect of search, as they can improve the search experience and increase the scope of search results.

Synonyms allow you to:

  • Improve search relevance by finding relevant documents that use different terms to express the same concept.
  • Make domain-specific vocabulary more user-friendly, allowing users to use search terms they are more familiar with.
  • Define common misspellings and typos to transparently handle common mistakes.

Synonyms are grouped together using synonyms sets. You can have as many synonyms sets as you need.

In order to use synonyms sets in Elasticsearch, you need to:

  • Store your synonyms set
  • Configure synonyms token filters and analyzers

Store your synonyms set

Your synonyms sets need to be stored in Elasticsearch so your analyzers can refer to them. There are two ways to store your synonyms sets:

Synonyms API

You can use the synonyms APIs to manage synonyms sets. This is the most flexible approach, as it lets you define and modify synonyms sets dynamically.

Changes in your synonyms sets will automatically reload the associated analyzers.
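
As a minimal sketch, a synonyms set could be created with a PUT request to the synonyms API. The set name my-synonyms-set and the rule ids rule-1 and rule-2 are placeholders chosen for illustration, and the request is only sent when ES_URL is set:

```shell
# Hypothetical names: "my-synonyms-set", "rule-1" and "rule-2" are placeholders.
SYNONYMS_SET_BODY='{
  "synonyms_set": [
    { "id": "rule-1", "synonyms": "pc => personal computer" },
    { "id": "rule-2", "synonyms": "computer, pc, laptop" }
  ]
}'

# Create or replace the synonyms set (skipped when ES_URL is not set).
if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/_synonyms/my-synonyms-set?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${SYNONYMS_SET_BODY}"
fi
```

Each entry in synonyms_set is one synonym rule. Sending the same request again with a different list replaces the contents of the set.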

Inline

You can test your synonyms by adding them directly inline in your token filter definition.

Inline synonyms are not recommended for production usage. A large number of inline synonyms increases cluster size unnecessarily and can lead to performance issues.
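
For quick experiments, inline synonyms might look like the following sketch, which creates a throwaway index whose token filter carries the rules directly. The index name my-test-index, the filter name my_inline_synonyms and the analyzer name my_test_analyzer are hypothetical, and the request is only sent when ES_URL is set:

```shell
# Hypothetical names: "my-test-index", "my_inline_synonyms", "my_test_analyzer".
INLINE_INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "my_inline_synonyms": {
          "type": "synonym_graph",
          "synonyms": ["pc => personal computer", "computer, pc, laptop"]
        }
      },
      "analyzer": {
        "my_test_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_inline_synonyms"]
        }
      }
    }
  }
}'

# Create the test index with the inline synonym rules (skipped when ES_URL is not set).
if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/my-test-index?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${INLINE_INDEX_BODY}"
fi
```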

Configure synonyms token filters and analyzers

Once your synonyms sets are created, you can start configuring your token filters and analyzers to use them.

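Elasticsearch uses synonyms as part of the analysis process. You can use two types of token filter:

  • Synonym graph token filter (synonym_graph): the recommended option, as it handles multi-word synonyms correctly.
  • Synonym token filter (synonym): not recommended if you need multi-word synonyms.

Check each synonym token filter documentation for configuration details and instructions on adding it to an analyzer.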
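
The following sketch shows one way a token filter and analyzer could reference a stored synonyms set. It assumes a set named my-synonyms-set already exists and uses the hypothetical names my-index, my_synonyms_filter and my_analyzer. The request is only sent when ES_URL is set:

```shell
# Assumption: a synonyms set named "my-synonyms-set" has already been stored.
# "my-index", "my_synonyms_filter" and "my_analyzer" are placeholder names.
ANALYSIS_INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms_filter": {
          "type": "synonym_graph",
          "synonyms_set": "my-synonyms-set",
          "updateable": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms_filter"]
        }
      }
    }
  }
}'

# Create the index with the analyzer definition (skipped when ES_URL is not set).
if [ -n "${ES_URL:-}" ]; then
  curl -X PUT "${ES_URL}/my-index?pretty" \
    -H "Authorization: ApiKey ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "${ANALYSIS_INDEX_BODY}"
fi
```

Because the filter is marked updateable, an analyzer that contains it is intended for use as a search analyzer rather than an index analyzer.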

Test your analyzer

You can test an analyzer configuration without modifying your index settings. Use the analyze API to test your analyzer chain:

curl "${ES_URL}/my-index/_analyze?pretty" \
-H "Authorization: ApiKey ${API_KEY}" \
-H "Content-Type: application/json" \
-d'
{
  "tokenizer": "standard",
  "filter" : [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text" : "Check how PC synonyms work"
}
'

Apply synonyms at index or search time

Analyzers can be applied at index time or search time.

You need to decide when to apply your synonyms:

  • Index time: Synonyms are applied when the documents are indexed into Elasticsearch. This is a less flexible alternative, as changes to your synonyms require reindexing.
  • Search time: Synonyms are applied when a search is executed. This is a more flexible approach, which doesn’t require reindexing. If token filters are configured with "updateable": true, search analyzers can be reloaded when you make changes to your synonyms.
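
As a rough sketch of the search-time option, a manual reload could be triggered with the reload search analyzers API. This assumes an index named my-index (a placeholder) whose search analyzer uses a token filter marked "updateable": true. The request is only sent when ES_URL is set:

```shell
# Hypothetical index name; replace "my-index" with your own index.
RELOAD_ENDPOINT="/my-index/_reload_search_analyzers"

# Ask Elasticsearch to reload the search analyzers (skipped when ES_URL is not set).
if [ -n "${ES_URL:-}" ]; then
  curl -X POST "${ES_URL}${RELOAD_ENDPOINT}?pretty" \
    -H "Authorization: ApiKey ${API_KEY}"
fi
```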

Synonyms sets created using the synonyms API can only be used at search time.

You can specify the analyzer that contains your synonyms set as a search time analyzer or as an index time analyzer.

The following example adds my_analyzer as a search analyzer to the title field in an index mapping:

  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "search_analyzer": "my_analyzer"
      }
    }
  }