Selecting fields to return

Sometimes you don’t need a search query to return every field of a document. For example, when showing the most recent posts on a blog, the query that finds those posts may only need to return the title of each post.

There are two approaches that you can take to return only some of the fields from a document, that is, a partial document (we use this term loosely here): stored fields and source filtering. The two work quite differently.

Stored fields

When indexing a document, by default, Elasticsearch stores the originally sent JSON document in a special field called _source. Documents returned from a search query are materialized from the _source field returned from Elasticsearch for each hit.

It is also possible to store a field from the JSON document separately within Elasticsearch by using store on the mapping. Why would you ever want to do this? Well, you may disable _source so that the source is not stored, and choose to store only specific fields. Another possibility is that the _source contains a field with large values, for example, the body of a blog post, but typically only another field is needed, for example, the title of the blog post. In this case, we don’t want to pay the cost of Elasticsearch deserializing the entire _source just to get a small field.

Opting to disable source for a type mapping means that the original JSON document sent to Elasticsearch is not stored and hence can never be retrieved. Whilst you may save disk space in doing so, certain features, such as the Reindex API or on-the-fly highlighting, will not work when source is disabled.

Seriously consider whether disabling source is what you really want to do for your use case.
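As a minimal sketch of what storing fields on the mapping might look like, the following creates an index in which three fields are stored separately from _source. It makes several assumptions that are not part of this page: NEST 7.x (where index creation is exposed as client.Indices.Create; 6.x uses client.CreateIndex and nests the mapping under .Mappings), a hypothetical index name "projects", a trimmed-down Project POCO standing in for the type used in the examples here, and the choice of text, date and keyword field types.

```csharp
using System;
using System.Collections.Generic;
using Nest;

// Hypothetical, trimmed-down POCO used only for this sketch.
public class Project
{
    public string Name { get; set; }
    public DateTime StartedOn { get; set; }
    public IEnumerable<string> Branches { get; set; }
}

public static class StoredFieldsMappingExample
{
    public static CreateIndexResponse CreateProjectsIndex(IElasticClient client)
    {
        // "projects" is an assumed index name, chosen for illustration.
        return client.Indices.Create("projects", c => c
            .Map<Project>(m => m
                .Properties(p => p
                    // Store() sets "store": true on the field mapping, so the
                    // value can be returned without loading the whole _source.
                    .Text(t => t.Name(n => n.Name).Store())
                    .Date(d => d.Name(n => n.StartedOn).Store())
                    .Keyword(k => k.Name(n => n.Branches).Store())
                )
            )
        );
    }
}
```

The sketch leaves _source enabled; only the three fields above are additionally stored.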

When storing fields in this manner, the individual field values to return can be specified using .StoredFields on the search request:

var searchResponse = _client.Search<Project>(s => s
    .StoredFields(sf => sf
        .Fields(
            f => f.Name,
            f => f.StartedOn,
            f => f.Branches
        )
    )
    .Query(q => q
        .MatchAll()
    )
);

The stored field values can then be retrieved using .Fields on the response:

foreach (var fieldValues in searchResponse.Fields)
{
    // Construct a partial document as an anonymous type from the stored fields requested
    var document = new
    {
        Name = fieldValues.ValueOf<Project, string>(p => p.Name),
        StartedOn = fieldValues.Value<DateTime>(Infer.Field<Project>(p => p.StartedOn)),
        Branches = fieldValues.Values<Project, string>(p => p.Branches.First())
    };
}

This works when storing fields separately. A much more common scenario, however, is to return only a selection of fields from the _source; this is where source filtering comes in.

Source filtering

Source filtering lets a search query return only some of the fields of a document:

var searchResponse = _client.Search<Project>(s => s
    .Source(sf => sf
        // Include the following fields
        .Includes(i => i
            .Fields(
                f => f.Name,
                f => f.StartedOn,
                f => f.Branches
            )
        )
        // Exclude the following fields
        .Excludes(e => e
            // Fields can be included or excluded through wildcard patterns
            .Fields("num*")
        )
    )
    .Query(q => q
        .MatchAll()
    )
);

With source filtering specified on the request, .Documents will now contain partial documents, materialized from the source fields specified to include:

var partialProjects = searchResponse.Documents;

It’s also possible to stop _source from being returned from a query altogether by passing false to .Source:

searchResponse = _client.Search<Project>(s => s
    .Source(false)
    .Query(q => q
        .MatchAll()
    )
);