Elasticsearch - The Definitive Guide
WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Term-Based Versus Full-Text
While all queries perform some sort of relevance calculation, not all queries have an analysis phase. Besides specialized queries like the bool or function_score queries, which don't operate on text at all, textual queries can be broken down into two families:
- Term-based queries
  Queries like the term or fuzzy queries are low-level queries that have no analysis phase. They operate on a single term. A term query for the term "Foo" looks for that exact term in the inverted index and calculates the TF/IDF relevance _score for each document that contains the term.
  It is important to remember that the term query looks in the inverted index for the exact term only; it won't match any variants like "foo" or "FOO". It doesn't matter how the term came to be in the index, just that it is. If you were to index ["Foo","Bar"] into an exact-value not_analyzed field, or "Foo Bar" into an analyzed field with the whitespace analyzer, both would result in the two terms "Foo" and "Bar" being in the inverted index.
- Full-text queries
  Queries like the match or query_string queries are high-level queries that understand the mapping of a field:
    - If you use them to query a date or integer field, they will treat the query string as a date or integer, respectively.
    - If you query an exact-value (not_analyzed) string field, they will treat the whole query string as a single term.
    - But if you query a full-text (analyzed) field, they will first pass the query string through the appropriate analyzer to produce the list of terms to be queried.
  Once the query has assembled a list of terms, it executes the appropriate low-level query for each of these terms, and then combines their results to produce the final relevance score for each document.
  We will discuss this process in more detail in the following chapters.
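To make the term-based behavior concrete, here is a minimal sketch in the same console request format used elsewhere in this guide. It assumes a hypothetical index my_index with a hypothetical type my_type and two hypothetical fields: exact_tags, mapped as a not_analyzed string, and text_tags, analyzed with the built-in whitespace analyzer. The mapping uses the 2.x string syntax; the names are illustrative only.

```
PUT /my_index
{
    "mappings": {
        "my_type": {
            "properties": {
                "exact_tags": { "type": "string", "index": "not_analyzed" },
                "text_tags":  { "type": "string", "analyzer": "whitespace" }
            }
        }
    }
}

PUT /my_index/my_type/1
{
    "exact_tags": [ "Foo", "Bar" ],
    "text_tags":  "Foo Bar"
}

GET /my_index/_search
{
    "query": {
        "term": { "exact_tags": "Foo" }
    }
}

GET /my_index/_search
{
    "query": {
        "term": { "text_tags": "foo" }
    }
}
```

Both fields should end up holding the terms Foo and Bar, so the first search is expected to return document 1 (a term query for Foo on text_tags would match as well). The second search is expected to return no hits: the whitespace analyzer does not lowercase, so the index contains Foo but not foo, and the term query does not analyze its input either. New documents become searchable only after the next refresh (by default once per second), so run the searches after a short pause.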
You seldom need to use the term-based queries directly. Usually you want to query full text, not individual terms, and this is easier to do with the high-level full-text queries (which end up using term-based queries internally).
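As an illustration of a full-text query ending up as term-based queries, consider a match query on a hypothetical title field that is assumed to use the standard analyzer. The standard analyzer turns "Quick Brown" into the lowercased terms quick and brown, so, with the match query's default or operator, the first request below should behave roughly like the second, hand-written bool query of term clauses. This is a sketch of the idea under those assumptions, not an exact description of the query Elasticsearch builds.

```
GET /_search
{
    "query": {
        "match": { "title": "Quick Brown" }
    }
}

GET /_search
{
    "query": {
        "bool": {
            "should": [
                { "term": { "title": "quick" } },
                { "term": { "title": "brown" } }
            ]
        }
    }
}
```

The hand-written version only works because the terms were lowercased by hand to match what the analyzer stored; term queries for Quick and Brown would find nothing in such a field, which is why the match query is usually the easier choice.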
If you do find yourself wanting to use a query on an exact-value not_analyzed field, think about whether you really want a scoring query, or whether a non-scoring query might be better.
Single-term queries usually represent binary yes/no questions and are almost always better expressed as a filter, so that they can benefit from caching:
GET /_search
{
    "query": {
        "constant_score": {
            "filter": {
                "term": { "gender": "female" }
            }
        }
    }
}