Writing aggregations

edit

NEST allows you to write your aggregations using

  • a strict fluent DSL
  • a verbatim object initializer syntax that maps verbatim to the Elasticsearch API
  • a more terse object initializer aggregation DSL

Three different ways, yikes that’s a lot to take in! Let’s go over them one at a time and explain when you might want to use each.

This is the json output for each example

{
  "aggs": {
    "name_of_child_agg": {
      "children": {
        "type": "commits"
      },
      "aggs": {
        "average_per_child": {
          "avg": {
            "field": "confidenceFactor"
          }
        },
        "max_per_child": {
          "max": {
            "field": "confidenceFactor"
          }
        },
        "min_per_child": {
          "min": {
            "field": "confidenceFactor"
          }
        }
      }
    }
  }
}

Fluent DSL

edit

The fluent lambda syntax is the most terse way to write aggregations. It benefits from types that are carried over to sub aggregations

Fluent DSL example

edit
s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => childAggs
            .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        )
    )
)

Object Initializer syntax

edit

The object initializer syntax (OIS) is a one-to-one mapping with how aggregations have to be represented in the Elasticsearch API. While it has the benefit of being a one-to-one mapping, being dictionary based in C# means it can grow verbose rather quickly.

Here are the same aggregations as expressed in the Fluent API above, with the dictionary-based object initializer syntax

Object Initializer syntax example

edit
new SearchRequest<Project>
{
    Aggregations = new AggregationDictionary
    {
        {
            "name_of_child_agg", new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
            {
                Aggregations = new AggregationDictionary
                {
                    {"average_per_child", new AverageAggregation("average_per_child", "confidenceFactor")},
                    {"max_per_child", new MaxAggregation("max_per_child", "confidenceFactor")},
                    {"min_per_child", new MinAggregation("min_per_child", "confidenceFactor")},
                }
            }
        }
    }
}

As you can see, the key in the dictionary is repeated as the name passed to the aggregation constructor. This starts to get hard to read and a little error prone, wouldn’t you agree?

There is a better way however…​

Terse Object Initializer syntax

edit

The Object Initializer syntax can be shortened dramatically by using *Aggregation types directly, allowing you to forego the need to introduce intermediary dictionaries to represent the aggregation DSL. In using these, it is also possible to combine multiple aggregations using the bitwise && operator.

Compare the following example with the previous vanilla Object Initializer syntax

Object Initializer syntax example

edit
new SearchRequest<Project>
{
    Aggregations = new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
    {
        Aggregations =
            new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
    }
}

Now that’s much cleaner! Assigning an *Aggregation type directly to the Aggregation property on a search request works because there are implicit conversions within NEST to handle this for you.

Mixed usage of object initializer and fluent

edit

Sometimes its useful to mix and match fluent and object initializer, the fluent Aggregations method therefore also accepts AggregationDictionary directly.

Fluent DSL example

edit
s => s
.Aggregations(new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
{
    Aggregations =
        new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
        && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
        && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
})

Binary operators off the same descriptor

edit

For dynamic aggregation building using the fluent syntax, it can be useful to abstract the construction to methods as much as possible. You can use the binary operator && on the same aggregation descriptor to compose the graph. Each side of the binary operation can return null dynamically.

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(Combine)
    )
)

Returning a different AggregationContainer in fluent syntax

edit

All the fluent selector expects is an IAggregationContainer to be returned. You could abstract this to a method returning AggregationContainer which is free to use the object initializer syntax to compose that AggregationContainer.

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => Combine())
    )
)

Aggregating over a collection of aggregations

edit

An advanced scenario may involve an existing collection of aggregation functions that should be set as aggregations on the request. Using LINQ’s .Aggregate() method, each function can be applied to the aggregation descriptor (childAggs below) in turn, returning the descriptor after each function application.

var aggregations =
        new List<Func<AggregationContainerDescriptor<CommitActivity>, IAggregationContainer>> 
        {
            a => a.Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor)),
            a => a.Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor)),
            a => a.Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        };

return s => s
        .Aggregations(aggs => aggs
            .Children<CommitActivity>("name_of_child_agg", child => child
                .Aggregations(childAggs =>
                        aggregations.Aggregate(childAggs, (acc, agg) =>
                        {
                            agg(acc);
                            return acc;
                        }) 
                )
            )
        );

a list of aggregation functions to apply

Using LINQ’s Aggregate() function to accumulate/apply all of the aggregation functions

Handling responses

edit

The SearchResponse exposes an AggregateDictionary which is specialized dictionary over <string, IAggregate> that also exposes handy helper methods that automatically cast IAggregate to the expected aggregate response.

Let’s see this in action:

a => a
.Children<CommitActivity>("name_of_child_agg", child => child
    .Aggregations(childAggs => childAggs
        .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        .Max("max_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        .Min("min_per_child", avg => avg.Field(p => p.ConfidenceFactor))
    )
)

Now, using .Aggregations, we can easily get the Children aggregation response out and from that, the Average and Max sub aggregations.

Handling Responses

edit
response.ShouldBeValid();

var childAggregation = response.Aggregations.Children("name_of_child_agg");

var averagePerChild = childAggregation.Average("average_per_child");

averagePerChild.Should().NotBeNull(); 

var maxPerChild = childAggregation.Max("max_per_child");

maxPerChild.Should().NotBeNull(); 

Do something with the average per child. Here we just assert it’s not null

Do something with the max per child. Here we just assert it’s not null