query Section

edit

The query section contains detailed timing of the query tree executed by Lucene on a particular shard. The overall structure of this query tree will resemble your original Elasticsearch query, but may be slightly (or sometimes very) different. It will also use similar but not always identical naming. Using our previous term query example, let’s analyze the query section:

"query": [
    {
       "query_type": "BooleanQuery",
       "lucene": "message:search message:test",
       "time": "15.52889800ms",
       "breakdown": {...},               
       "children": [
          {
             "query_type": "TermQuery",
             "lucene": "message:search",
             "time": "4.938855000ms",
             "breakdown": {...}
          },
          {
             "query_type": "TermQuery",
             "lucene": "message:test",
             "time": "0.5016660000ms",
             "breakdown": {...}
          }
       ]
    }
]

The breakdown timings are omitted for simplicity

Based on the profile structure, we can see that our match query was rewritten by Lucene into a BooleanQuery with two clauses (both holding a TermQuery). The "query_type" field displays the Lucene class name, and often aligns with the equivalent name in Elasticsearch. The "lucene" field displays the Lucene explanation text for the query, and is made available to help differentiating between parts of your query (e.g. both "message:search" and "message:test" are TermQuery’s and would appear identical otherwise.

The "time" field shows that this query took ~15ms for the entire BooleanQuery to execute. The recorded time is inclusive of all children.

The "breakdown" field will give detailed stats about how the time was spent, we’ll look at that in a moment. Finally, the "children" array lists any sub-queries that may be present. Because we searched for two values ("search test"), our BooleanQuery holds two children TermQueries. They have identical information (query_type, time, breakdown, etc). Children are allowed to have their own children.

Timing Breakdown

edit

The breakdown component lists detailed timing statistics about low-level Lucene execution:

"breakdown": {
  "score": 0,
  "next_doc": 24495,
  "match": 0,
  "create_weight": 8488388,
  "build_scorer": 7016015,
  "advance": 0

}

Timings are listed in wall-clock nanoseconds and are not normalized at all. All caveats about the overall time apply here. The intention of the breakdown is to give you a feel for A) what machinery in Lucene is actually eating time, and B) the magnitude of differences in times between the various components. Like the overall time, the breakdown is inclusive of all children times.

The meaning of the stats are as follows:

All parameters:

edit

create_weight

A Query in Lucene must be capable of reuse across multiple IndexSearchers (think of it as the engine that executes a search against a specific Lucene Index). This puts Lucene in a tricky spot, since many queries need to accumulate temporary state/statistics associated with the index it is being used against, but the Query contract mandates that it must be immutable.

To get around this, Lucene asks each query to generate a Weight object which acts as a temporary context object to hold state associated with this particular (IndexSearcher, Query) tuple. The weight metric shows how long this process takes

build_scorer

This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?). Note, this records the time required to generate the Scorer object, not actually score the documents. Some queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.

This may also showing timing associated with caching, if enabled and/or applicable for the query

next_doc

The Lucene method next_doc returns Doc ID of the next document matching the query. This statistic shows the time it takes to determine which document is the next match, a process that varies considerably depending on the nature of the query. Next_doc is a specialized form of advance() which is more convenient for many queries in Lucene. It is equivalent to advance(docId() + 1)

advance

advance is the "lower level" version of next_doc: it serves the same purpose of finding the next matching doc, but requires the calling query to perform extra tasks such as identifying and moving past skips, etc. However, not all queries can use next_doc, so advance is also timed for those queries.

Conjunctions (e.g. must clauses in a boolean) are typical consumers of advance

matches

Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous (and expensive) process. The second phase verification is what the matches statistic measures.

For example, a phrase query first checks a document approximately by ensuring all terms in the phrase are present in the doc. If all the terms are present, it then executes the second phase verification to ensure the terms are in-order to form the phrase, which is relatively more expensive than just checking for presence of the terms.

Because this two-phase process is only used by a handful of queries, the metric statistic will often be zero

score

This records the time taken to score a particular document via it’s Scorer