Intervals query
Returns documents based on the order and proximity of matching terms.
The intervals query uses matching rules, constructed from a small set of definitions. These rules are then applied to terms from a specified field.
The definitions produce sequences of minimal intervals that span terms in a body of text. These intervals can be further combined and filtered by parent sources.
Example request
The following intervals search returns documents containing "my favorite food" with no gaps, followed by "hot water" or "cold porridge" in the my_text field.
This search would match a my_text value of "my favorite food is cold porridge" but not "when it's cold my favorite food is porridge".
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "ordered": true,
          "intervals": [
            { "match": { "query": "my favorite food", "max_gaps": 0, "ordered": true } },
            { "any_of": { "intervals": [
              { "match": { "query": "hot water" } },
              { "match": { "query": "cold porridge" } }
            ] } }
          ]
        }
      }
    }
  }
}
Top-level parameters for intervals
match rule parameters
The match rule matches analyzed text.
query
- (Required, string) Text you wish to find in the provided <field>.
max_gaps
- (Optional, integer) Maximum number of positions between the matching terms. Terms further apart than this are not considered matches. Defaults to -1. If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other.
ordered
- (Optional, boolean) If true, matching terms must appear in their specified order. Defaults to false.
analyzer
- (Optional, string) Analyzer used to analyze terms in the query. Defaults to the top-level <field>'s analyzer.
filter
- (Optional, interval filter rule object) An optional interval filter.
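The effect of max_gaps and ordered can be sketched with a small brute-force model in Python. This is an illustration only, not how Elasticsearch evaluates the query: the function name match_intervals is hypothetical, tokens are assumed to be already analyzed (here, split on whitespace), and intervals are (start, end) position pairs.

```python
from itertools import product


def match_intervals(tokens, terms, max_gaps=-1, ordered=False):
    """Brute-force model of a `match` rule over a list of analyzed tokens."""
    # Positions at which each query term occurs in the field.
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    candidates = set()
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # every query term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # terms do not appear in the order given in the query
        candidates.add((min(combo), max(combo)))
    # Keep only minimal intervals: drop any interval that contains another one.
    minimal = [
        iv for iv in candidates
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in candidates)
    ]
    # Gaps are the positions inside the interval not taken by a query term.
    n = len(terms)
    return sorted(
        (s, e) for s, e in minimal
        if max_gaps < 0 or (e - s + 1) - n <= max_gaps
    )


doc = "when it's cold my favorite food is porridge".split()
print(match_intervals(doc, ["cold", "porridge"]))              # no width restriction
print(match_intervals(doc, ["cold", "porridge"], max_gaps=0))  # terms must be adjacent
```

With the default max_gaps of -1 the two terms match even though five positions separate them; with max_gaps set to 0 the same document produces no interval.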
all_of rule parameters
The all_of rule returns matches that span a combination of other rules.
intervals
- (Required, array of rule objects) An array of rules to combine. All rules must produce a match in a document for the overall source to match.
max_gaps
- (Optional, integer) Maximum number of positions between the matching terms. Intervals produced by the rules further apart than this are not considered matches. Defaults to -1. If unspecified or set to -1, there is no width restriction on the match. If set to 0, the terms must appear next to each other.
ordered
- (Optional, boolean) If true, intervals produced by the rules should appear in the order in which they are specified. Defaults to false.
filter
- (Optional, interval filter rule object) Rule used to filter returned intervals.
any_of rule parameters
The any_of rule returns intervals produced by any of its sub-rules.
intervals
- (Required, array of rule objects) An array of rules to match.
filter
- (Optional, interval filter rule object) Rule used to filter returned intervals.
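Because match, all_of, and any_of rules nest, it can help to compose the request body programmatically. The following is a minimal sketch using only the Python standard library; the helper names match, all_of, and any_of are hypothetical conveniences, not part of any Elasticsearch client. It rebuilds the body of the example request shown earlier.

```python
import json


def match(query, **params):
    """Build a `match` rule; extra keyword arguments become rule parameters."""
    return {"match": {"query": query, **params}}


def all_of(*rules, **params):
    """Build an `all_of` rule from the given sub-rules."""
    return {"all_of": {"intervals": list(rules), **params}}


def any_of(*rules, **params):
    """Build an `any_of` rule from the given sub-rules."""
    return {"any_of": {"intervals": list(rules), **params}}


rule = all_of(
    match("my favorite food", max_gaps=0, ordered=True),
    any_of(match("hot water"), match("cold porridge")),
    ordered=True,
)
body = {"query": {"intervals": {"my_text": rule}}}
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a POST _search request with whatever HTTP or Elasticsearch client you already use.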
filter rule parameters
The filter rule returns intervals based on a query. See Filter example for an example.
after
- (Optional, query object) Query used to return intervals that follow an interval from the filter rule.
before
- (Optional, query object) Query used to return intervals that occur before an interval from the filter rule.
contained_by
- (Optional, query object) Query used to return intervals contained by an interval from the filter rule.
containing
- (Optional, query object) Query used to return intervals that contain an interval from the filter rule.
not_contained_by
- (Optional, query object) Query used to return intervals that are not contained by an interval from the filter rule.
not_containing
- (Optional, query object) Query used to return intervals that do not contain an interval from the filter rule.
not_overlapping
- (Optional, query object) Query used to return intervals that do not overlap with an interval from the filter rule.
overlapping
- (Optional, query object) Query used to return intervals that overlap with an interval from the filter rule.
script
- (Optional, script object) Script used to return matching documents. This script must return a boolean value, true or false. See Script filters for an example.
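The positional filters above can be read as simple relations between two (start, end) position pairs. The sketch below models them in Python under that assumption; the names contains, overlaps, RELATIONS, and apply_filter are hypothetical, and the model ignores how Elasticsearch iterates intervals internally.

```python
def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


RELATIONS = {
    "containing": lambda iv, f: contains(iv, f),
    "contained_by": lambda iv, f: contains(f, iv),
    "overlapping": overlaps,
    "before": lambda iv, f: iv[1] < f[0],
    "after": lambda iv, f: iv[0] > f[1],
}


def apply_filter(intervals, kind, filter_intervals):
    """Keep the intervals that satisfy filter `kind` against `filter_intervals`."""
    negate = kind.startswith("not_")
    base = kind[4:] if negate else kind
    if negate and base in ("before", "after"):
        raise ValueError(f"unknown filter: {kind}")
    relation = RELATIONS[base]
    kept = []
    for iv in intervals:
        hit = any(relation(iv, f) for f in filter_intervals)
        if hit != negate:
            kept.append(iv)
    return kept


# "hot salty porridge": "hot porridge" spans (0, 2) and "salty" sits at (1, 1).
print(apply_filter([(0, 2)], "not_containing", [(1, 1)]))
```

In this model, a not_containing filter drops the (0, 2) interval because the filter interval (1, 1) falls inside it.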
Notes
Filter example
The following search includes a filter rule. It returns documents that have the words "hot" and "porridge" within 10 positions of each other, without the word "salty" in between:
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "match": {
          "query": "hot porridge",
          "max_gaps": 10,
          "filter": {
            "not_containing": {
              "match": { "query": "salty" }
            }
          }
        }
      }
    }
  }
}
Script filters
You can use a script to filter intervals based on their start position, end position, and internal gap count. The following filter script uses the interval variable with the start, end, and gaps methods:
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "match": {
          "query": "hot porridge",
          "filter": {
            "script": {
              "source": "interval.start > 10 && interval.end < 20 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
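To make the script condition concrete, here is the same predicate written in Python. This only mirrors the logic of the script source above; the Interval tuple and the script_filter function are hypothetical stand-ins for the interval variable that the script receives.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # position of the first term in the interval
    end: int    # position of the last term in the interval
    gaps: int   # positions inside the interval not covered by a matching term


def script_filter(interval: Interval) -> bool:
    """Python equivalent of the example script's boolean expression."""
    return interval.start > 10 and interval.end < 20 and interval.gaps == 0


print(script_filter(Interval(start=12, end=13, gaps=0)))  # adjacent terms inside the window
print(script_filter(Interval(start=12, end=15, gaps=2)))  # rejected: gaps between the terms
```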
Minimization
The intervals query always minimizes intervals, to ensure that queries can run in linear time. This can sometimes cause surprising results, particularly when using max_gaps restrictions or filters. For example, take the following query, searching for "salty" contained within the phrase "hot porridge":
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "match": {
          "query": "salty",
          "filter": {
            "contained_by": {
              "match": { "query": "hot porridge" }
            }
          }
        }
      }
    }
  }
}
This query does not match a document containing the phrase "hot porridge is salty porridge", because the intervals returned by the match query for "hot porridge" only cover the initial two terms in this document, and these do not overlap the intervals covering "salty".
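The following Python sketch walks through this case by hand. It is a simplified model, assuming whitespace tokenization and (start, end) position pairs; the helper names candidates and minimize are hypothetical and do not correspond to Elasticsearch internals.

```python
from itertools import product

tokens = "hot porridge is salty porridge".split()


def candidates(terms):
    """Every span that covers one occurrence of each term (before minimization)."""
    pos = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    return sorted({(min(c), max(c)) for c in product(*pos)})


def minimize(intervals):
    """Drop any interval that contains another interval from the same source."""
    return [
        iv for iv in intervals
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in intervals)
    ]


all_spans = candidates(["hot", "porridge"])  # includes the wide span ending at the second "porridge"
minimal = minimize(all_spans)                # only the span over the first two terms survives
salty = candidates(["salty"])

# contained_by: keep "salty" intervals that sit inside a minimal "hot porridge" interval.
matches = [s for s in salty if any(m[0] <= s[0] and s[1] <= m[1] for m in minimal)]
print(all_spans, minimal, salty, matches)
```

The wider span from "hot" to the second "porridge" would contain "salty", but it is discarded during minimization, so the filter finds nothing to keep.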
Another restriction to be aware of is the case of any_of rules that contain sub-rules which overlap. In particular, if one of the rules is a strict prefix of the other, then the longer rule can never match, which can cause surprises when used in combination with max_gaps. Consider the following query, searching for "the" immediately followed by "big" or "big bad", immediately followed by "wolf":
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "intervals": [
            { "match": { "query": "the" } },
            { "any_of": { "intervals": [
              { "match": { "query": "big" } },
              { "match": { "query": "big bad" } }
            ] } },
            { "match": { "query": "wolf" } }
          ],
          "max_gaps": 0,
          "ordered": true
        }
      }
    }
  }
}
Counter-intuitively, this query does not match the document "the big bad wolf", because the any_of rule in the middle only produces intervals for "big". Intervals for "big bad" are longer than those for "big" while starting at the same position, so they are minimized away. In these cases, it's better to rewrite the query so that all of the options are explicitly laid out at the top level:
POST _search
{
  "query": {
    "intervals": {
      "my_text": {
        "any_of": {
          "intervals": [
            { "match": { "query": "the big bad wolf", "ordered": true, "max_gaps": 0 } },
            { "match": { "query": "the big wolf", "ordered": true, "max_gaps": 0 } }
          ]
        }
      }
    }
  }
}
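The difference between the nested and the flattened query can be traced with a short Python model. It is a sketch under simplifying assumptions: each match rule is modeled as an exact phrase (which yields the same intervals for this particular document), and the helper names phrase, minimize, and ordered_no_gaps are hypothetical.

```python
tokens = "the big bad wolf".split()


def phrase(words):
    """Intervals where `words` occur consecutively and in order."""
    n = len(words)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def minimize(intervals):
    """Drop any interval that contains another interval from the same source."""
    return [
        iv for iv in intervals
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in intervals)
    ]


def ordered_no_gaps(*sources):
    """Model of all_of with ordered=true and max_gaps=0: each part must directly follow the last."""
    chains = sources[0]
    for nxt in sources[1:]:
        chains = [(s, e2) for (s, e) in chains for (s2, e2) in nxt if s2 == e + 1]
    return chains


# Nested form: the any_of in the middle keeps only "big" after minimization.
middle = minimize(phrase(["big"]) + phrase(["big", "bad"]))
nested = ordered_no_gaps(phrase(["the"]), middle, phrase(["wolf"]))

# Flattened form: each full alternative is its own top-level rule.
flattened = phrase("the big bad wolf".split()) + phrase("the big wolf".split())
print(middle, nested, flattened)
```

In the nested form, "wolf" does not directly follow the surviving "big" interval, so nothing matches; in the flattened form, the full phrase "the big bad wolf" is found as a single interval.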