Writing aggregations

NEST allows you to write your aggregations using

  • a strict fluent DSL
  • a verbatim object initializer syntax that maps one-to-one to the Elasticsearch API
  • a more terse object initializer aggregation DSL

Three different ways; yikes, that’s a lot to take in! Let’s go over them one at a time and explain when you might want to use each.

Each of the examples below produces the following JSON:

{
  "aggs": {
    "name_of_child_agg": {
      "children": {
        "type": "commits"
      },
      "aggs": {
        "average_per_child": {
          "avg": {
            "field": "confidenceFactor"
          }
        },
        "max_per_child": {
          "max": {
            "field": "confidenceFactor"
          }
        },
        "min_per_child": {
          "min": {
            "field": "confidenceFactor"
          }
        }
      }
    }
  }
}

Fluent DSL

The fluent lambda syntax is the most terse way to write aggregations. It also benefits from types that are carried over to sub aggregations, so field expressions such as p => p.ConfidenceFactor stay strongly typed at every level.

Fluent DSL example

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => childAggs
            .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Max("max_per_child", max => max.Field(p => p.ConfidenceFactor))
            .Min("min_per_child", min => min.Field(p => p.ConfidenceFactor))
        )
    )
)

Object Initializer syntax

The object initializer syntax (OIS) is a one-to-one mapping with how aggregations have to be represented in the Elasticsearch API. While that mapping makes it easy to translate between the two, being dictionary-based in C# means it can grow verbose rather quickly.

Here are the same aggregations as expressed in the Fluent API above, written with the dictionary-based object initializer syntax:

Object Initializer syntax example

new SearchRequest<Project>
{
    Aggregations = new AggregationDictionary
    {
        {
            "name_of_child_agg", new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
            {
                Aggregations = new AggregationDictionary
                {
                    {"average_per_child", new AverageAggregation("average_per_child", "confidenceFactor")},
                    {"max_per_child", new MaxAggregation("max_per_child", "confidenceFactor")},
                    {"min_per_child", new MinAggregation("min_per_child", "confidenceFactor")},
                }
            }
        }
    }
}

This starts to get hard to read, wouldn’t you agree? There is a better way, however.

Terse Object Initializer syntax

The Object Initializer syntax can be shortened dramatically by using *Aggregation types directly, allowing you to forgo the intermediary dictionaries that would otherwise represent the aggregation DSL. With these types, it is also possible to combine multiple aggregations using the binary && operator.

Compare the following example with the previous vanilla Object Initializer syntax

Terse Object Initializer syntax example

new SearchRequest<Project>
{
    Aggregations = new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
    {
        Aggregations =
            new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
    }
}

Now that’s much cleaner! Assigning an *Aggregation type directly to the Aggregations property on a search request works because there are implicit conversions within NEST to handle this for you.

Mixed usage of object initializer and fluent

Sometimes it’s useful to mix and match the fluent and object initializer syntaxes; the fluent Aggregations method therefore also accepts an AggregationDictionary directly.

Fluent DSL example

s => s
.Aggregations(new ChildrenAggregation("name_of_child_agg", typeof(CommitActivity))
{
    Aggregations =
        new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
        && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
        && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
})

Binary operators off the same descriptor

For dynamic aggregation building using the fluent syntax it can be useful to abstract to methods as much as possible. You can use the binary operator && on the same descriptor to compose the graph. Each side of the binary operation can return null dynamically.

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(Combine)
    )
)
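
The example passes a Combine method group without showing its body. Here is a minimal sketch of what such a method could look like. It assumes, as the paragraph above states, that && is defined on the descriptor. The name Combine comes from the example, but the body, the ChildAggregations wrapper class, and the CommitActivity stand-in (reduced to the one property used) are illustrative assumptions, not the library's own sample.

```csharp
using Nest;

// Stand-in for the CommitActivity document type used on this page (assumed shape).
public class CommitActivity
{
    public double ConfidenceFactor { get; set; }
}

public static class ChildAggregations
{
    // Hypothetical body for the Combine method group passed to .Aggregations(...).
    // Every operand is built from the same descriptor; && composes them into one graph.
    public static IAggregationContainer Combine(AggregationContainerDescriptor<CommitActivity> aggs) =>
        aggs.Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
        && aggs.Max("max_per_child", max => max.Field(p => p.ConfidenceFactor))
        && aggs.Min("min_per_child", min => min.Field(p => p.ConfidenceFactor));
}
```

With this wrapper, the call site would read .Aggregations(ChildAggregations.Combine). Because either side of && may be null, an operand could be swapped for a conditional expression that yields null when an aggregation should be left out.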

Returning a different AggregationContainer in fluent syntax

All the fluent selector expects is an IAggregationContainer to be returned. You could abstract this to a method returning AggregationContainer which is free to use the object initializer syntax to compose that AggregationContainer.

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => Combine())
    )
)
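
The parameterless Combine() called above is not shown either. The sketch below is one possible shape for it, built with the terse object initializer syntax from earlier on this page. It assumes that AggregationContainer exposes a settable Aggregations property which accepts combined aggregations the same way ChildrenAggregation.Aggregations does. The wrapper class and the CommitActivity stand-in are hypothetical.

```csharp
using Nest;
using static Nest.Infer;

// Stand-in for the CommitActivity document type used on this page (assumed shape).
public class CommitActivity
{
    public double ConfidenceFactor { get; set; }
}

public static class ChildAggregations
{
    // Hypothetical helper: ignores the fluent descriptor and composes the
    // sub aggregations with the object initializer syntax instead.
    public static AggregationContainer Combine() => new AggregationContainer
    {
        // Assumption: this property takes the &&-combined aggregations,
        // like the Aggregations property on ChildrenAggregation.
        Aggregations =
            new AverageAggregation("average_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MaxAggregation("max_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
            && new MinAggregation("min_per_child", Field<CommitActivity>(p => p.ConfidenceFactor))
    };
}
```

The call site then becomes .Aggregations(childAggs => ChildAggregations.Combine()), which keeps the outer request fluent while the inner aggregations are assembled elsewhere.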

Aggregating over a collection of aggregations

An advanced scenario may involve an existing collection of aggregation functions that should be set as aggregations on the request. Using LINQ’s .Aggregate() method, each function can be applied to the aggregation descriptor (childAggs below) in turn, returning the descriptor after each function application.

// A list of aggregation functions to apply
var aggregations =
        new List<Func<AggregationContainerDescriptor<CommitActivity>, IAggregationContainer>>
        {
            a => a.Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor)),
            a => a.Max("max_per_child", max => max.Field(p => p.ConfidenceFactor)),
            a => a.Min("min_per_child", min => min.Field(p => p.ConfidenceFactor))
        };

return s => s
        .Aggregations(aggs => aggs
            .Children<CommitActivity>("name_of_child_agg", child => child
                .Aggregations(childAggs =>
                        // Use LINQ's Aggregate() to accumulate/apply all of the aggregation functions
                        aggregations.Aggregate(childAggs, (acc, agg) =>
                        {
                            agg(acc);
                            return acc;
                        })
                )
            )
        );

Aggs vs. Aggregations

The response exposes both .Aggregations and .Aggs properties for handling aggregations. Why two properties, you ask? The former is a dictionary of aggregation names to IAggregate types, a common interface for aggregation responses (termed Aggregates in NEST); the latter is a convenience helper that gets the right type of aggregation response out of the dictionary based on a key name.

This is better illustrated with an example. Let’s imagine we make the following request:

s => s
.Aggregations(aggs => aggs
    .Children<CommitActivity>("name_of_child_agg", child => child
        .Aggregations(childAggs => childAggs
            .Average("average_per_child", avg => avg.Field(p => p.ConfidenceFactor))
            .Max("max_per_child", max => max.Field(p => p.ConfidenceFactor))
            .Min("min_per_child", min => min.Field(p => p.ConfidenceFactor))
        )
    )
)

Now, using .Aggs, we can easily get the Children aggregation response out and from that, the Average and Max sub aggregations.

Handling Responses

response.ShouldBeValid();

var childAggregation = response.Aggs.Children("name_of_child_agg");

var averagePerChild = childAggregation.Average("average_per_child");

// Do something with the average per child. Here we just assert it's not null
averagePerChild.Should().NotBeNull();

var maxPerChild = childAggregation.Max("max_per_child");

// Do something with the max per child. Here we just assert it's not null
maxPerChild.Should().NotBeNull();