Understanding MultiTermQuery output

edit

Understanding MultiTermQuery output

edit

A special note needs to be made about the MultiTermQuery class of queries. This includes wildcards, regex and fuzzy queries. These queries emit very verbose responses, and are not overly structured.

Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query b*, it technically can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations, so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens [bar, baz], so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the token [bakery], so query rewrites to a single TermQuery for "bakery".

Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean "lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you collapse the details for that query’s children if it is too confusing. Luckily, all the timing statistics are correct, just not the physical layout in the response, so it is sufficient to just analyze the top-level MultiTermQuery and ignore it’s children if you find the details too tricky to interpret.

Hopefully this will be fixed in future iterations, but it is a tricky problem to solve and still in-progress :)