Writing queries
editWriting queries
editOnce you have data indexed within Elasticsearch, you’re going to want to be able to search it. Elasticsearch offers a powerful query DSL to define queries to execute against Elasticsearch. This DSL is based on JSON and is exposed in NEST in the form of both a Fluent API and an Object Initializer syntax
Match All query
editThe simplest of queries is the match_all query;
this will return all documents, giving them all a _score
of 1.0
Not all of the matching documents are returned in the one response; by default, only the first ten documents
are returned. You can use from
and size
to paginate results.
var searchResponse = _client.Search<Project>(s => s .Query(q => q .MatchAll() ) );
which serializes to the following JSON
{ "query": { "match_all": {} } }
Since match_all
queries are common, the previous example also has a shorthand which
serializes to the same query DSL JSON
searchResponse = _client.Search<Project>(s => s .MatchAll() );
The two previous examples both used the Fluent API to express the query. NEST also exposes an Object Initializer syntax to compose queries
var searchRequest = new SearchRequest<Project> { Query = new MatchAllQuery() }; searchResponse = _client.Search<Project>(searchRequest);
Search request parameters
editThere are several parameters available on a search request; take a look at the reference section on search for more details.
Common queries
editBy default, documents will be returned in _score
descending order, where the _score
for each hit
is the relevancy score calculated for how well the document matched the query criteria.
There are a number of search queries at your disposal, all of which are documented in the Query DSL reference section. Here, we want to highlight the three types of query operations that users typically want to perform
Structured search
editStructured search is about querying data that has inherent structure. Dates, times and numbers are all structured and it is common to want to query against fields of these types to look for exact matches, values that fall within a range, etc. Text can also be structured, for example, the keyword tags applied to a blog post.
With structured search, the answer to a query is always yes or no; a document is either a match for the query or it isn’t.
The term level queries are typically used for structured search. Here’s an example that looks for documents whose started on date falls within a specified range
var searchResponse = _client.Search<Project>(s => s .Query(q => q .DateRange(r => r .Field(f => f.StartedOn) .GreaterThanOrEquals(new DateTime(2017, 01, 01)) .LessThan(new DateTime(2018, 01, 01)) ) ) );
which yields the following query JSON
{ "query": { "range": { "startedOn": { "lt": "2018-01-01T00:00:00", "gte": "2017-01-01T00:00:00" } } } }
Since the answer to this query is always yes or no, we don’t want to score the query. To do this,
we can get the query to be executed in a filter context by wrapping it in a bool
query filter
clause
searchResponse = _client.Search<Project>(s => s .Query(q => q .Bool(b => b .Filter(bf => bf .DateRange(r => r .Field(f => f.StartedOn) .GreaterThanOrEquals(new DateTime(2017, 01, 01)) .LessThan(new DateTime(2018, 01, 01)) ) ) ) ) );
{ "query": { "bool": { "filter": [ { "range": { "startedOn": { "lt": "2018-01-01T00:00:00", "gte": "2017-01-01T00:00:00" } } } ] } } }
The benefit of executing a query in a filter context is that Elasticsearch is able to forgo calculating a relevancy score, as well as cache filters for faster subsequent performance.
term level queries have no analysis phase, that is, the query input is not analyzed, and an exact match to the input is looked for in the inverted index. This can trip many new users up when using a term level query against a field that is analyzed at index time.
When a field is only to be used for exact matching, you should consider indexing it as a keyword datatype. If a field is used for both exact matches and full text search, you should consider indexing it with multi fields.
Unstructured search
editAnother common use case is to search within full text fields in order to find the most relevant documents.
Full text queries are used for unstructured search; here we use the match
query
to find all documents that contain "Russ"
in the lead developer first name field
var searchResponse = _client.Search<Project>(s => s .Query(q => q .Match(m => m .Field(f => f.LeadDeveloper.FirstName) .Query("Russ") ) ) );
which yields the following query JSON
{ "query": { "match": { "leadDeveloper.firstName": { "query": "Russ" } } } }
full text queries have an analysis phase, that is, the query input is analyzed, and the resulting terms from query analysis are compared to the terms in the inverted index.
You have full control over the analysis that is applied at both search time and index time, by applying analyzers to text datatype fields through mapping.
Combining queries
editAn extremely common scenario is to combine separate queries together to form a
compound query, the most common of which is the bool
query
var searchResponse = _client.Search<Project>(s => s .Query(q => q .Bool(b => b .Must(mu => mu .Match(m => m .Field(f => f.LeadDeveloper.FirstName) .Query("Russ") ), mu => mu .Match(m => m .Field(f => f.LeadDeveloper.LastName) .Query("Cam") ) ) .Filter(fi => fi .DateRange(r => r .Field(f => f.StartedOn) .GreaterThanOrEquals(new DateTime(2017, 01, 01)) .LessThan(new DateTime(2018, 01, 01)) ) ) ) ) );
match documents where lead developer first name contains Russ |
|
…and where the lead developer last name contains Cam |
|
…and where the project started in 2017 |
which yields the following query JSON
{ "query": { "bool": { "must": [ { "match": { "leadDeveloper.firstName": { "query": "Russ" } } }, { "match": { "leadDeveloper.lastName": { "query": "Cam" } } } ], "filter": [ { "range": { "startedOn": { "lt": "2018-01-01T00:00:00", "gte": "2017-01-01T00:00:00" } } } ] } } }
A document must satisfy all three queries in this example to be a match
-
the
match
queries on both first name and last name will contribute to the relevancy score calculated, since both queries are running in a query context -
the
range
query against the started on date is running in a filter context, so no score is calculated for matching documents (all documents have the same score of 1.0 for this query).
Because bool
queries are so common, NEST overloads operators on queries to make forming
bool
queries much more succinct. The previous bool
query can be more concisely
expressed as
searchResponse = _client.Search<Project>(s => s .Query(q => q .Match(m => m .Field(f => f.LeadDeveloper.FirstName) .Query("Russ") ) && q .Match(m => m .Field(f => f.LeadDeveloper.LastName) .Query("Cam") ) && +q .DateRange(r => r .Field(f => f.StartedOn) .GreaterThanOrEquals(new DateTime(2017, 01, 01)) .LessThan(new DateTime(2018, 01, 01)) ) ) );
combine queries using the binary |
|
wrap a query in a |
Take a look at the dedicated section on writing bool
queries for more detail
and further examples.
Search response
editThe response returned from a search query is an SearchResponse<T>
, where T
is the
generic parameter type defined in the search method call. There are a fair few properties
on the response, but the most common you’re likely to work with is .Documents
,
which we’ll demonstrate below.
Matching documents
editTo get the documents in the response that match the search query is easy enough
var searchResponse = client.Search<Project>(s => s .Query(q => q .MatchAll() ) ); var projects = searchResponse.Documents;
.Documents
is a convenient shorthand for retrieving the _source
for each hit
var sources = searchResponse.HitsMetadata.Hits.Select(h => h.Source);
and it’s possible to retrieve other metadata about each hit from the hits collection. Here’s an example that retrieves the highlights for a hit, when using highlighting