Elasticsearch: The Definitive Guide
WARNING: The 1.x versions of Elasticsearch have passed their EOL dates. If you are running a 1.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Fuzzy Query
The fuzzy query is the fuzzy equivalent of the term query. You will seldom use it directly yourself, but understanding how it works will help you to use fuzziness in the higher-level match query.
To understand how it works, we will first index some documents:
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "text": "Surprise me!"}
{ "index": { "_id": 2 }}
{ "text": "That was surprising."}
{ "index": { "_id": 3 }}
{ "text": "I wasn't surprised."}
Now we can run a fuzzy query for the term surprize:
GET /my_index/my_type/_search
{
  "query": {
    "fuzzy": {
      "text": "surprize"
    }
  }
}
The fuzzy query is a term-level query, so it doesn't do any analysis. It takes a single term and finds all terms in the term dictionary that are within the specified fuzziness. The default fuzziness is AUTO.
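As a rough illustration of what AUTO implies, the allowed edit distance is derived from the length of the query term. The sketch below assumes the thresholds documented for the fuzziness parameter (0 edits for one or two characters, 1 edit for three to five, 2 edits for longer terms). The helper name auto_fuzziness is hypothetical, and the thresholds should be checked against the documentation for your version.

```python
def auto_fuzziness(term: str) -> int:
    """Return the maximum edit distance that AUTO is assumed to allow.

    Assumed thresholds (not taken from this page):
      1-2 characters  -> 0 edits (exact match only)
      3-5 characters  -> 1 edit
      6+ characters   -> 2 edits
    """
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


for term in ["me", "was", "surprize"]:
    print(term, "->", auto_fuzziness(term))
```

Under these assumptions, an eight-character term such as surprize is searched with an edit distance of 2 when no explicit fuzziness is given.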
In our example, surprize is within an edit distance of 2 from both surprise and surprised, so documents 1 and 3 match. We could reduce the matches to just surprise with the following query:
GET /my_index/my_type/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "surprize",
        "fuzziness": 1
      }
    }
  }
}
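To see why these two queries match different documents, it can help to compute the edit distances by hand. The following is a minimal sketch using a plain Levenshtein distance; the list of indexed terms is an assumption about what the default analyzer would produce for the three example documents, and the function is an illustration, not the code Elasticsearch runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Assumed terms from documents 1, 2, and 3 after lowercasing.
terms = ["surprise", "surprising", "surprised"]
distances = {term: levenshtein("surprize", term) for term in terms}

within_2 = [t for t, d in distances.items() if d <= 2]
within_1 = [t for t, d in distances.items() if d <= 1]

print(distances)
print("fuzziness 2:", within_2)
print("fuzziness 1:", within_1)
```

With a fuzziness of 2, surprise and surprised fall inside the limit; with a fuzziness of 1, only surprise remains.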
Improving Performance
The fuzzy query works by taking the original term and building a Levenshtein automaton, like a big graph representing all the strings that are within the specified edit distance of the original string.
The fuzzy query then uses the automaton to step efficiently through all of the terms in the term dictionary to see if they match. Once it has collected all of the matching terms that exist in the term dictionary, it can compute the list of matching documents.
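The idea of walking the term dictionary while discarding whole branches that can no longer match can be sketched with a simpler stand-in. The example below is not a Levenshtein automaton and is not how Lucene implements it; it stores the terms in a trie and carries one row of the edit-distance table per trie node, abandoning a branch as soon as every entry in the row exceeds the limit. The term list and the names build_trie and fuzzy_terms are hypothetical.

```python
def build_trie(terms):
    """Store terms in a nested-dict trie; the None key marks a complete term."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[None] = term
    return root


def fuzzy_terms(root, query, max_edits):
    """Return (term, distance) pairs for terms within max_edits of query."""
    matches = []

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for character ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,
                           prev_row[i] + 1,
                           prev_row[i - 1] + cost))
        if None in node and row[-1] <= max_edits:
            matches.append((node[None], row[-1]))
        # Prune: if no cell is within the limit, no longer term can match.
        if min(row) <= max_edits:
            for next_ch, child in node.items():
                if next_ch is not None:
                    walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.items():
        if ch is not None:
            walk(child, ch, first_row)
    return sorted(matches)


# Assumed term dictionary for the three example documents.
trie = build_trie(["surprise", "me", "that", "was",
                   "surprising", "i", "wasn't", "surprised"])
print(fuzzy_terms(trie, "surprize", 2))
```

The pruning step is what keeps the walk cheap: branches such as those starting with t or w are dropped after a few characters instead of being compared term by term.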
Of course, depending on the type of data stored in the index, a fuzzy query with an edit distance of 2 can match a very large number of terms and perform very badly. Two parameters can be used to limit the performance impact:
- prefix_length: The number of initial characters that will not be "fuzzified." Most spelling errors occur toward the end of the word, not toward the beginning. By using a prefix_length of 3, for example, you can significantly reduce the number of matching terms.
- max_expansions: If a fuzzy query expands to three or four fuzzy options, the new options may be meaningful. If it produces 1,000 options, they are essentially meaningless. Use max_expansions to limit the total number of options that will be produced. The fuzzy query will collect matching terms until it runs out of terms or reaches the max_expansions limit.
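The effect of the two parameters can be approximated with a small simulation. This is a sketch under stated assumptions: the term dictionary is invented for illustration, the expand function is hypothetical, and terms are collected in sorted order until the limit is reached, following the description above. The request body at the end uses the index, type, and field names from the earlier examples.

```python
import json


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def expand(query, term_dictionary, fuzziness, prefix_length, max_expansions):
    """Collect matching terms, honoring prefix_length and max_expansions."""
    prefix = query[:prefix_length]
    expansions = []
    for term in sorted(term_dictionary):
        if not term.startswith(prefix):
            continue  # the first prefix_length characters must match exactly
        if levenshtein(query, term) <= fuzziness:
            expansions.append(term)
            if len(expansions) >= max_expansions:
                break  # stop collecting once the limit is reached
    return expansions


# Hypothetical term dictionary, including one misspelled term ("suprise").
terms = ["sunrise", "suprise", "surmise", "surplice",
         "surprise", "surprised", "surprises"]

no_limits = expand("surprize", terms, 2, prefix_length=0, max_expansions=50)
with_prefix = expand("surprize", terms, 2, prefix_length=3, max_expansions=50)
capped = expand("surprize", terms, 2, prefix_length=3, max_expansions=2)
print(no_limits)
print(with_prefix)
print(capped)

# Request body for GET /my_index/my_type/_search with both parameters set.
body = {
    "query": {
        "fuzzy": {
            "text": {
                "value": "surprize",
                "fuzziness": 2,
                "prefix_length": 3,
                "max_expansions": 2,
            }
        }
    }
}
print(json.dumps(body, indent=2))
```

In this invented dictionary, a prefix_length of 3 drops suprise because it does not start with sur, and a max_expansions of 2 stops the expansion after two terms. Which terms survive a real max_expansions cutoff depends on the engine's own ordering, so the sorted order used here is only an assumption.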