WARNING: The 1.x versions of Elasticsearch have passed their EOL dates. If you are running a 1.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
All About Caching
Earlier in this chapter (Internal Filter Operation), we briefly discussed how filters are calculated. At their heart is a bitset representing which documents match the filter. Elasticsearch aggressively caches these bitsets for later use. Once cached, a bitset can be reused wherever the same filter appears, without reevaluating the filter.
These cached bitsets are “smart”: they are updated incrementally. As you index new documents, only those new documents need to be added to the existing bitsets, rather than having to recompute the entire cached filter over and over. Filters are real-time like the rest of the system; you don’t need to worry about cache expiry.
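To make the incremental update concrete, here is a minimal sketch in Python. It is a toy model, not Elasticsearch's internal implementation: the FilterBitset class, its methods, and the sample documents are all hypothetical, and a plain integer stands in for the bitset.

```python
# Toy model of an incrementally updated filter bitset.
# FilterBitset is a hypothetical name, not an Elasticsearch class.

class FilterBitset:
    def __init__(self, predicate):
        self.predicate = predicate  # function(doc) -> bool
        self.bits = 0               # bit i is set when document i matches
        self.doc_count = 0          # number of documents already evaluated

    def update(self, docs):
        """Evaluate only the documents added since the last update."""
        evaluated = 0
        for doc_id in range(self.doc_count, len(docs)):
            if self.predicate(docs[doc_id]):
                self.bits |= 1 << doc_id
            evaluated += 1
        self.doc_count = len(docs)
        return evaluated

    def matches(self):
        """Return the IDs of all matching documents."""
        return [i for i in range(self.doc_count) if (self.bits >> i) & 1]


docs = [{"folder": "inbox"}, {"folder": "sent"}, {"folder": "inbox"}]
inbox = FilterBitset(lambda doc: doc["folder"] == "inbox")
first_pass = inbox.update(docs)    # evaluates all three documents

docs.append({"folder": "inbox"})   # "index" one new document
second_pass = inbox.update(docs)   # evaluates only the new document
```

The second call to update touches a single document, which is the property the paragraph above describes: existing bits are kept and only new documents are evaluated.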
Independent Filter Caching
Each filter is calculated and cached independently, regardless of where it is used. If two different queries use the same filter, the same filter bitset will be reused. Likewise, if a single query uses the same filter in multiple places, only one bitset is calculated and then reused.
Let’s look at this example query, which looks for emails that are either of the following:
- In the inbox and have not been read
- Not in the inbox but have been marked as important

    "bool": {
       "should": [
          { "bool": {
                "must": [
                   { "term": { "folder": "inbox" }},
                   { "term": { "read": false }}
                ]
          }},
          { "bool": {
                "must_not": {
                   "term": { "folder": "inbox" }
                },
                "must": {
                   "term": { "important": true }
                }
          }}
       ]
    }
Even though one of the inbox clauses is a must clause and the other is a must_not clause, the two clauses themselves are identical. This means that the bitset is calculated once for the first clause that is executed, and then the cached bitset is used for the other clause. By the time this query is run a second time, the inbox filter is already cached and so both clauses will use the cached bitset.
This ties in nicely with the composability of the query DSL. It is easy to move filters around, or reuse the same filter in multiple places within the same query. This isn’t just convenient to the developer—it has direct performance benefits.
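The reuse described in this section can be sketched with a cache keyed by the filter definition. This is an illustrative model only: the cache dictionary, the term_bitset and run_query helpers, and the three sample emails are assumptions made for the example, not Elasticsearch APIs.

```python
import json

# Hypothetical cache keyed by the serialized filter definition.
cache = {}
computed = []  # records each filter that had to be evaluated from scratch

emails = [
    {"folder": "inbox", "read": False, "important": False},
    {"folder": "inbox", "read": True,  "important": True},
    {"folder": "sent",  "read": True,  "important": True},
]
ALL_DOCS = (1 << len(emails)) - 1  # bitset with one bit set per document


def term_bitset(term_filter):
    """Return the bitset for a term filter, evaluating it only on a cache miss."""
    key = json.dumps(term_filter, sort_keys=True)
    if key not in cache:
        computed.append(key)
        field, value = next(iter(term_filter["term"].items()))
        cache[key] = sum(
            1 << i for i, doc in enumerate(emails) if doc[field] == value
        )
    return cache[key]


def run_query():
    in_inbox = term_bitset({"term": {"folder": "inbox"}})     # must clause
    unread = term_bitset({"term": {"read": False}})
    in_inbox_2 = term_bitset({"term": {"folder": "inbox"}})   # must_not clause: cache hit
    important = term_bitset({"term": {"important": True}})
    return (in_inbox & unread) | (~in_inbox_2 & ALL_DOCS & important)


result = run_query()
run_query()  # second run: every filter is served from the cache
```

Across both runs only three filters are ever evaluated, even though the query contains four term clauses and is executed twice.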
Controlling Caching
Most leaf filters (those dealing directly with fields, like the term filter) are cached, while compound filters, like the bool filter, are not.
Leaf filters have to consult the inverted index on disk, so it makes sense to cache them. Compound filters, on the other hand, use fast bit logic to combine the bitsets resulting from their inner clauses, so it is efficient to recalculate them every time.
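As a rough illustration of why recombining is cheap, the sketch below merges leaf bitsets using only bitwise operations. It assumes bitsets are modeled as Python integers; combine_bool is a hypothetical helper, and its handling of should (at least one clause must match) is a simplification of the bool filter rather than a copy of Elasticsearch's code.

```python
from functools import reduce


def combine_bool(num_docs, must=(), should=(), must_not=()):
    """Combine leaf bitsets (Python ints) in the manner of a bool filter."""
    all_docs = (1 << num_docs) - 1
    result = all_docs
    for bits in must:                   # every must clause has to match
        result &= bits
    if should:                          # at least one should clause has to match
        result &= reduce(lambda a, b: a | b, should)
    for bits in must_not:               # no must_not clause may match
        result &= ~bits & all_docs
    return result


# Four documents; bit i represents document i.
both = combine_bool(4, must=[0b0011, 0b0101])
either = combine_bool(4, should=[0b0001, 0b1000])
excluded = combine_bool(4, must=[0b0110], must_not=[0b0010])
```

No index lookups happen here: each clause contributes one AND, OR, or NOT over bitsets that already exist, which is why compound filters can be recalculated on every request.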
Certain leaf filters, however, are not cached by default, because it doesn’t make sense to do so:
- Script filters: The results from script filters (https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-script-filter.html) cannot be cached because the meaning of the script is opaque to Elasticsearch.
- Geo-filters: The geolocation filters, which we cover in more detail in Geolocation, are usually used to filter results based on the geolocation of a specific user. Since each user has a unique geolocation, it is unlikely that geo-filters will be reused, so it makes no sense to cache them.
- Date ranges: Date ranges that use the now function (for example, "now-1h") result in values accurate to the millisecond. Every time the filter is run, now returns a new time. Older filters will never be reused, so caching is disabled by default. However, when using now with rounding (for example, now/d rounds to the day), caching is enabled by default.
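The difference between the two date-range cases can be shown with a toy resolver. Everything here is an assumption made for illustration: resolve is a hypothetical function that understands only the two expressions from the text, the timestamps are arbitrary, and rounding is modeled as truncation to the start of the UTC day.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def resolve(expr, now_ms):
    """Resolve the two date-math expressions used in this example (toy resolver)."""
    if expr == "now-1h":
        return now_ms - HOUR_MS
    if expr == "now/d":
        return now_ms - now_ms % DAY_MS  # truncate to the start of the UTC day
    raise ValueError("unsupported expression: " + expr)


t1 = 1_400_000_000_000  # an arbitrary clock reading in milliseconds
t2 = t1 + 1             # one millisecond later

unrounded_changes = resolve("now-1h", t1) != resolve("now-1h", t2)
rounded_is_stable = resolve("now/d", t1) == resolve("now/d", t2)
```

One millisecond later, the unrounded bound is already a different value, so a cached bitset for it would never be looked up again. The rounded bound stays the same for the rest of the day, so its bitset can be reused.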
Sometimes the default caching strategy is not correct. Perhaps you have a complicated bool expression that is reused several times in the same query. Or you have a filter on a date field that will never be reused. The default caching strategy can be overridden on almost any filter by setting the _cache flag.
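As a sketch of where the flag goes, the following builds two filter bodies as Python dictionaries and prints them as JSON. The timestamp field, its value, and the clauses inside the bool filter are assumptions chosen for illustration; the point is only that _cache sits alongside the filter's own parameters.

```python
import json

# A range filter on a fixed timestamp that is unlikely to be reused:
# switch caching off. Field name and value are hypothetical.
range_filter = {
    "range": {
        "timestamp": {"gt": "2014-01-02 16:15:14"},
        "_cache": False,
    }
}

# A bool filter that is reused several times: switch caching on.
bool_filter = {
    "bool": {
        "must": [
            {"term": {"folder": "inbox"}},
            {"term": {"read": False}},
        ],
        "_cache": True,
    }
}

print(json.dumps(range_filter, indent=2))
print(json.dumps(bool_filter, indent=2))
```

Either dictionary could then be placed wherever a filter is accepted in a search request body.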
Later chapters provide examples of when it can make sense to override the default caching strategy.