WARNING: Version 6.2 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Adjacency Matrix Aggregation
editAdjacency Matrix Aggregation
editA bucket aggregation returning a form of adjacency matrix.
The request provides a collection of named filter expressions, similar to the filters
aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
The adjacency_matrix
aggregation is a new feature and we may evolve its design as we get feedback on its use. As a result, the API for this feature may change in non-backwards compatible ways
Given filters named A
, B
and C
the response would return buckets with the following names:
A | B | C | |
---|---|---|---|
A |
A |
A&B |
A&C |
B |
B |
B&C |
|
C |
C |
The intersecting buckets e.g A&C
are labelled using a combination of the two filter names separated by
the ampersand character. Note that the response does not also include a "C&A" bucket as this would be the
same set of documents as "A&C". The matrix is said to be symmetric so we only return half of it. To do this we sort
the filter name strings and always use the lowest of a pair as the value to the left of the "&" separator.
An alternative separator
parameter can be passed in the request if clients wish to use a separator string
other than the default of the ampersand.
Example:
PUT /emails/_doc/_bulk?refresh { "index" : { "_id" : 1 } } { "accounts" : ["hillary", "sidney"]} { "index" : { "_id" : 2 } } { "accounts" : ["hillary", "donald"]} { "index" : { "_id" : 3 } } { "accounts" : ["vladimir", "donald"]} GET emails/_search { "size": 0, "aggs" : { "interactions" : { "adjacency_matrix" : { "filters" : { "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }}, "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }}, "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }} } } } } }
In the above example, we analyse email messages to see which groups of individuals have exchanged messages. We will get counts for each group individually and also a count of messages for pairs of groups that have recorded interactions.
Response:
{ "took": 9, "timed_out": false, "_shards": ..., "hits": ..., "aggregations": { "interactions": { "buckets": [ { "key":"grpA", "doc_count": 2 }, { "key":"grpA&grpB", "doc_count": 1 }, { "key":"grpB", "doc_count": 2 }, { "key":"grpB&grpC", "doc_count": 1 }, { "key":"grpC", "doc_count": 1 } ] } } }
Usage
editOn its own this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a date_histogram
the results can provide the
additional levels of data required to perform dynamic network analysis
where examining interactions over time becomes important.
Limitations
editFor N filters the matrix of buckets produced can be N²/2 and so there is a default maximum
imposed of 100 filters . This setting can be changed using the index.max_adjacency_matrix_filters
index-level setting.