Significant Text Aggregation Usage

An aggregation that returns interesting or unusual occurrences of free-text terms in a set. It is like the significant terms aggregation but differs in that:

  • It is specifically designed for use on fields of type text
  • It does not require field data or doc-values
  • It re-analyzes text content on the fly, which means it can also filter out duplicate sections of noisy text that would otherwise tend to skew statistics.

Re-analyzing large result sets requires a lot of time and memory. It is recommended that the significant_text aggregation be used as a child of either the sampler or diversified sampler aggregation, to limit the analysis to a small selection of top-matching documents (for example, 200). This will typically improve speed, memory use and quality of results.
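
As a rough illustration of that recommendation, the request body below nests significant_text under a sampler aggregation so that only the top-matching documents on each shard are re-analyzed. This is a sketch rather than output generated by the client: the outer aggregation name "sample" is a hypothetical name chosen for this example, the shard_size of 200 is taken from the guidance above, and the field and inner aggregation name mirror the examples on this page.

```json
{
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "significant_descriptions": {
          "significant_text": {
            "field": "description",
            "filter_duplicate_text": true
          }
        }
      }
    }
  }
}
```

With this shape, the significant_text buckets appear in the response nested under the "sample" aggregation rather than at the top level, so any response-handling code would need to read the sampler aggregation first.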

See the Elasticsearch documentation on significant text aggregation for more detail.

Fluent DSL example

a => a
.SignificantText("significant_descriptions", st => st
    .Field(p => p.Description)
    .FilterDuplicateText()
)

Object Initializer syntax example

new SignificantTextAggregation("significant_descriptions")
{
    Field = Infer.Field<Project>(p => p.Description),
    FilterDuplicateText = true
}

Example JSON output

{
  "significant_descriptions": {
    "significant_text": {
      "field": "description",
      "filter_duplicate_text": true
    }
  }
}

Handling Responses

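response.ShouldBeValid();

// Look up the aggregation by the name used in the request
var sigDescriptions = response.Aggregations.SignificantText("significant_descriptions");
sigDescriptions.Should().NotBeNull();
sigDescriptions.DocCount.Should().BeGreaterThan(0);

// Each bucket is one significant term, with its foreground count,
// background count and significance score
foreach (var bucket in sigDescriptions.Buckets)
{
    bucket.Key.Should().NotBeNullOrEmpty();
    bucket.BgCount.Should().BeGreaterThan(0);
    bucket.DocCount.Should().BeGreaterThan(0);
    bucket.Score.Should().BeGreaterThan(0);
}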