Writing queries

Once you have data indexed within Elasticsearch, you’re going to want to be able to search it. Elasticsearch offers a powerful query DSL for defining the queries to execute against it. This DSL is based on JSON and is exposed in NEST in the form of both a Fluent API and an Object Initializer syntax.

Match All query

The simplest of queries is the match_all query; this will return all documents, giving them all a _score of 1.0.

Not all of the matching documents are returned in a single response; by default, only the first ten documents are returned. You can use from and size to paginate results.

var searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .MatchAll()
    )
);

which serializes to the following JSON

{
  "query": {
    "match_all": {}
  }
}

Since match_all queries are common, the previous example also has a shorthand which serializes to the same query DSL JSON

searchResponse = _client.Search<Project>(s => s
    .MatchAll()
);

The two previous examples both used the Fluent API to express the query. NEST also exposes an Object Initializer syntax to compose queries

var searchRequest = new SearchRequest<Project>
{
    Query = new MatchAllQuery()
};

searchResponse = _client.Search<Project>(searchRequest);
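As noted above, only the first ten matching documents are returned by default. The following is a minimal sketch of paginating with from and size using the Fluent API. It assumes the same _client and Project type used in the examples above; the page size and offset are arbitrary values chosen for illustration:

```csharp
using Nest;

// Assumption: _client is the IElasticClient used in the examples above,
// and Project is the same document type.
// Skip the first 20 matching documents and return the next 10,
// that is, the third page when using a page size of 10.
var pagedResponse = _client.Search<Project>(s => s
    .From(20)
    .Size(10)
    .Query(q => q
        .MatchAll()
    )
);
```

With these values, the request body would be expected to carry "from": 20 and "size": 10 alongside the match_all query.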

Search request parameters

There are several parameters available on a search request; take a look at the reference section on search for more details.

Common queries

By default, documents will be returned in _score descending order, where the _score for each hit is the relevancy score calculated for how well the document matched the query criteria.

There are a number of search queries at your disposal, all of which are documented in the Query DSL reference section. Here, we want to highlight the three types of query operations that users typically want to perform

Structured search

Structured search is about querying data that has inherent structure. Dates, times and numbers are all structured and it is common to want to query against fields of these types to look for exact matches, values that fall within a range, etc. Text can also be structured, for example, the keyword tags applied to a blog post.

With structured search, the answer to a query is always yes or no; a document is either a match for the query or it isn’t.

The term level queries are typically used for structured search. Here’s an example that looks for documents whose started on date falls within a specified range

var searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .DateRange(r => r
            .Field(f => f.StartedOn)
            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
            .LessThan(new DateTime(2018, 01, 01))
        )
    )
);

This finds all the projects that were started in 2017, and yields the following query JSON

{
  "query": {
    "range": {
      "startedOn": {
        "lt": "2018-01-01T00:00:00",
        "gte": "2017-01-01T00:00:00"
      }
    }
  }
}

Since the answer to this query is always yes or no, we don’t want to score the query. To do this, we can get the query to be executed in a filter context by wrapping it in a bool query filter clause

searchResponse = _client.Search<Project>(s => s
   .Query(q => q
       .Bool(b => b
           .Filter(bf => bf
               .DateRange(r => r
                   .Field(f => f.StartedOn)
                   .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                   .LessThan(new DateTime(2018, 01, 01))
               )
           )
       )
   )
);

which yields the following query JSON

{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "startedOn": {
              "lt": "2018-01-01T00:00:00",
              "gte": "2017-01-01T00:00:00"
            }
          }
        }
      ]
    }
  }
}

The benefit of executing a query in a filter context is that Elasticsearch is able to forgo calculating a relevancy score, as well as cache filters for faster subsequent performance.

Term level queries have no analysis phase, that is, the query input is not analyzed, and an exact match to the input is looked for in the inverted index. This can trip up many new users when using a term level query against a field that is analyzed at index time.

When a field is only to be used for exact matching, you should consider indexing it as a keyword datatype. If a field is used for both exact matches and full text search, you should consider indexing it with multi fields.
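To illustrate exact matching against a multi field, here is a hedged sketch of a term query running in a filter context. It assumes, purely for illustration, that Project has a Name string property mapped as a text field with a keyword sub-field called "keyword"; neither the property, the mapping, nor the value searched for is shown in this document:

```csharp
using Nest;

// Assumptions (hypothetical, not confirmed by this page):
//   - Project has a string property called Name
//   - Name is mapped as text with a keyword multi field named "keyword"
// _client is the IElasticClient used in the examples above.
var exactMatchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .Bool(b => b
            .Filter(bf => bf
                .Term(t => t
                    // Suffix appends ".keyword" to the inferred field name,
                    // targeting the unanalyzed keyword sub-field
                    .Field(f => f.Name.Suffix("keyword"))
                    // The value must match the indexed term exactly,
                    // including casing, because it is not analyzed
                    .Value("NEST")
                )
            )
        )
    )
);
```

Targeting the keyword sub-field avoids the mismatch described above, where an unanalyzed query input is compared against terms that were analyzed at index time.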

Unstructured search

Another common use case is to search within full text fields in order to find the most relevant documents.

Full text queries are used for unstructured search; here we use the match query to find all documents that contain "Russ" in the lead developer first name field

var searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.LeadDeveloper.FirstName)
            .Query("Russ")
        )
    )
);

which yields the following query JSON

{
  "query": {
    "match": {
      "leadDeveloper.firstName": {
        "query": "Russ"
      }
    }
  }
}

Full text queries have an analysis phase, that is, the query input is analyzed, and the resulting terms from query analysis are compared to the terms in the inverted index.

You have full control over the analysis that is applied at both search time and index time, by applying analyzers to text datatype fields through mapping.

Combining queries

An extremely common scenario is to combine separate queries together to form a compound query, the most common of which is the bool query

var searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .Bool(b => b
            .Must(mu => mu
                .Match(m => m 
                    .Field(f => f.LeadDeveloper.FirstName)
                    .Query("Russ")
                ), mu => mu
                .Match(m => m 
                    .Field(f => f.LeadDeveloper.LastName)
                    .Query("Cam")
                )
            )
            .Filter(fi => fi
                 .DateRange(r => r
                    .Field(f => f.StartedOn)
                    .GreaterThanOrEquals(new DateTime(2017, 01, 01))
                    .LessThan(new DateTime(2018, 01, 01)) 
                )
            )
        )
    )
);

This matches documents where the lead developer first name contains Russ, and where the lead developer last name contains Cam, and where the project started in 2017. It yields the following query JSON

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "leadDeveloper.firstName": {
              "query": "Russ"
            }
          }
        },
        {
          "match": {
            "leadDeveloper.lastName": {
              "query": "Cam"
            }
          }
        }
      ],
      "filter": [
        {
          "range": {
            "startedOn": {
              "lt": "2018-01-01T00:00:00",
              "gte": "2017-01-01T00:00:00"
            }
          }
        }
      ]
    }
  }
}

A document must satisfy all three queries in this example to be a match

  1. The match queries on both first name and last name contribute to the calculated relevancy score, since both queries run in a query context.
  2. The range query against the started on date runs in a filter context, so no score is calculated for it and it does not contribute to the relevancy score of matching documents.
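For comparison, the same bool query can be sketched with the Object Initializer syntax. This is an illustrative equivalent rather than an example taken from the NEST documentation; it assumes the same _client and Project type as above and uses Infer.Field<T> to build the field names from expressions:

```csharp
using System;
using Nest;

// Assumption: _client and Project are the same as in the examples above.
var boolRequest = new SearchRequest<Project>
{
    Query = new BoolQuery
    {
        // Both match queries run in a query context and contribute to the score
        Must = new QueryContainer[]
        {
            new MatchQuery
            {
                Field = Infer.Field<Project>(f => f.LeadDeveloper.FirstName),
                Query = "Russ"
            },
            new MatchQuery
            {
                Field = Infer.Field<Project>(f => f.LeadDeveloper.LastName),
                Query = "Cam"
            }
        },
        // The date range runs in a filter context and is not scored
        Filter = new QueryContainer[]
        {
            new DateRangeQuery
            {
                Field = Infer.Field<Project>(f => f.StartedOn),
                GreaterThanOrEqualTo = new DateTime(2017, 01, 01),
                LessThan = new DateTime(2018, 01, 01)
            }
        }
    }
};

var boolResponse = _client.Search<Project>(boolRequest);
```

The Fluent API version is usually terser, while the Object Initializer version can be easier to build up conditionally in application code.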

Because bool queries are so common, NEST overloads operators on queries to make forming bool queries much more succinct. The previous bool query can be more concisely expressed as

searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.LeadDeveloper.FirstName)
            .Query("Russ")
        ) && q 
        .Match(m => m
            .Field(f => f.LeadDeveloper.LastName)
            .Query("Cam")
        ) && +q 
        .DateRange(r => r
            .Field(f => f.StartedOn)
            .GreaterThanOrEquals(new DateTime(2017, 01, 01))
            .LessThan(new DateTime(2018, 01, 01))
        )
    )
);

Here, the two match queries are combined using the binary && operator. The date range query is wrapped in a bool query filter clause using the unary + operator, and then combined with the others using the binary && operator.

Take a look at the dedicated section on writing bool queries for more detail and further examples.

Search response

The response returned from a search query is a SearchResponse<T>, where T is the generic parameter type defined in the search method call. There are a fair few properties on the response, but the one you’re most likely to work with is .Documents, which we’ll demonstrate below.

Matching documents

Getting the documents in the response that match the search query is straightforward

var searchResponse = _client.Search<Project>(s => s
    .Query(q => q
        .MatchAll()
    )
);

var projects = searchResponse.Documents;

.Documents is a convenient shorthand for retrieving the _source for each hit

var sources = searchResponse.HitsMetadata.Hits.Select(h => h.Source);

It’s also possible to retrieve other metadata about each hit from the hits collection. Here’s an example that retrieves the highlights for each hit, when using highlighting

var highlights = searchResponse.HitsMetadata.Hits.Select(h => h
    .Highlight
);
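Putting the pieces together, the following sketch requests highlighting on the lead developer first name field and then reads the id, score and highlights from each hit. It assumes the same _client and Project type as above. The null check on the highlights is a defensive assumption rather than documented behavior:

```csharp
using System;
using Nest;

// Assumption: _client and Project are the same as in the examples above.
var highlightResponse = _client.Search<Project>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.LeadDeveloper.FirstName)
            .Query("Russ")
        )
    )
    // Ask Elasticsearch to return highlighted fragments for the matched field
    .Highlight(h => h
        .Fields(f => f
            .Field(p => p.LeadDeveloper.FirstName)
        )
    )
);

foreach (var hit in highlightResponse.Hits)
{
    // Id and Score are metadata about the hit; Source is the document itself
    Console.WriteLine($"{hit.Id} (score {hit.Score})");

    // Defensive: skip hits that carry no highlights
    if (hit.Highlight == null) continue;

    // Each entry maps a field name to its highlighted fragments
    foreach (var highlight in hit.Highlight)
    {
        Console.WriteLine($"  {highlight.Key}: {string.Join(" ... ", highlight.Value)}");
    }
}
```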