Storage and sizing guide


APM processing and storage costs are largely dominated by transactions, spans, and stack frames.

  • Transactions describe an event captured by an Elastic APM agent instrumenting a service. They are the highest level of work being measured within a service.
  • Spans belong to transactions. They measure from the start to end of an activity, and contain information about a specific code path that has been executed.
  • Stack frames belong to spans. Stack frames represent a function call on the call stack, and include attributes like function name, file name and path, line number, etc. Stack frames can heavily influence the size of a span.

Typical transactions


Due to the high variability of APM data, it’s difficult to classify a transaction as typical. Regardless, this guide classifies transactions as Small, Medium, or Large, and makes recommendations based on those classifications.

The size of a transaction depends on the language, agent settings, and what services the agent instruments. For instance, an agent auto-instrumenting a service with a popular tech stack (web framework, database, caching library, etc.) is more likely to generate bigger transactions.

In addition, all agents support manual instrumentation. How much or how little you use the manual instrumentation APIs will also impact what a typical transaction looks like.

If your sampling rate is very low, transactions will be the dominant storage cost.

Here’s a speculative reference:

Transaction size   Number of spans   Number of stack frames
Small              5-10              5-10
Medium             15-20             15-20
Large              30-40             30-40

There will always be transaction outliers with hundreds of spans or stack frames, but those are very rare. Small transactions are the most common.

Typical storage


Consider the following typical storage reference. These numbers do not account for Elasticsearch compression.

  • 1 unsampled transaction is ~1 KB
  • 1 span with 10 stack frames is ~4 KB
  • 1 span with 50 stack frames is ~20 KB
  • 1 transaction with 10 spans, each with 10 stack frames, is ~50 KB
  • 1 transaction with 25 spans, each with 25 stack frames, is 250-300 KB
  • 100 transactions with 10 spans, each with 10 stack frames, with 90% of transactions unsampled (a 10% sample rate), is ~600 KB (10 × ~50 KB + 90 × ~1 KB)
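
The reference figures above can be approximated with a simple linear model. The following is a minimal sketch, not an official sizing formula: it assumes roughly 1 KB per transaction document and roughly 0.4 KB per stack frame (derived from the 4 KB and 20 KB span figures), and it treats the final example as 10 sampled and 90 unsampled transactions, an interpretation that matches its size. All function and constant names are hypothetical.

```python
# Rough per-event sizes in KB, read off the reference list above.
# The linear per-frame model is an assumption made for illustration.
TRANSACTION_KB = 1.0   # one unsampled transaction (no spans)
STACK_FRAME_KB = 0.4   # 4 KB / 10 frames, consistent with 20 KB / 50 frames


def span_kb(stack_frames: int) -> float:
    """Approximate uncompressed size of one span."""
    return stack_frames * STACK_FRAME_KB


def transaction_kb(spans: int, frames_per_span: int) -> float:
    """Approximate uncompressed size of one sampled transaction and its spans."""
    return TRANSACTION_KB + spans * span_kb(frames_per_span)


def batch_kb(transactions: int, spans: int, frames_per_span: int,
             sampled_fraction: float) -> float:
    """Approximate uncompressed size of a batch in which only
    `sampled_fraction` of the transactions keep their spans."""
    sampled = transactions * sampled_fraction
    unsampled = transactions - sampled
    return (sampled * transaction_kb(spans, frames_per_span)
            + unsampled * TRANSACTION_KB)


if __name__ == "__main__":
    print(f"span, 10 frames:        {span_kb(10):.0f} KB")
    print(f"transaction, 10 x 10:   {transaction_kb(10, 10):.0f} KB")
    print(f"transaction, 25 x 25:   {transaction_kb(25, 25):.0f} KB")
    print(f"100 tx, 10% sampled:    {batch_kb(100, 10, 10, 0.1):.0f} KB")
```

With these constants the model gives about 41 KB for a 10 × 10 transaction and about 500 KB for the 100-transaction batch, slightly below the rounded reference figures of ~50 KB and ~600 KB. The gap suggests that real transactions carry some overhead the linear model does not capture, so treat its output as a lower bound.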

APM data compresses quite well, so the storage cost in Elasticsearch will be considerably less:

  • Indexing 100 unsampled transactions per second for 1 hour results in 360,000 documents. These documents use around 50 MB of disk space.
  • Indexing 10 transactions per second for 1 hour, each transaction with 10 spans, each span with 10 stack frames, results in 396,000 documents. These documents use around 200 MB of disk space.
  • Indexing 25 transactions per second for 1 hour, each transaction with 25 spans, each span with 25 stack frames, results in 2,340,000 documents. These documents use around 1.2 GB of disk space.
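
The document counts above follow from simple arithmetic. The sketch below assumes each transaction and each span is indexed as one document, with stack frames stored inside their span rather than as separate documents. That assumption is inferred from the counts, and the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def documents_indexed(transactions_per_second: int,
                      spans_per_transaction: int,
                      seconds: int = SECONDS_PER_HOUR) -> int:
    """Count indexed documents: one per transaction plus one per span.

    Assumes stack frames are part of the span document, so they do not
    add to the document count (only to document size).
    """
    transactions = transactions_per_second * seconds
    spans = transactions * spans_per_transaction
    return transactions + spans


if __name__ == "__main__":
    print(documents_indexed(100, 0))   # unsampled: transactions only
    print(documents_indexed(10, 10))
    print(documents_indexed(25, 25))
```

For the three scenarios this yields 360,000, 396,000, and 2,340,000 documents, matching the figures in the list.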

These examples indexed the same data repeatedly with minimal variation. Because of that, the observed compression ratios of 80-90% are somewhat optimistic.
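
To see where a compression ratio comes from, and how a more cautious one changes a disk estimate, consider the sketch below. It uses the first scenario (360,000 unsampled transactions at roughly 1 KB each, about 50 MB on disk) and assumes decimal units (1 MB = 1,000 KB). The 0.7 planning ratio is a hypothetical value chosen for illustration, not a recommendation from this guide.

```python
def compression_ratio(raw_mb: float, disk_mb: float) -> float:
    """Fraction of the raw size saved on disk (0.9 means 90% smaller)."""
    return 1 - disk_mb / raw_mb


def projected_disk_mb(raw_mb: float, ratio: float) -> float:
    """Disk usage expected for `raw_mb` of raw data at a given compression ratio."""
    return raw_mb * (1 - ratio)


if __name__ == "__main__":
    # 360,000 unsampled transactions at ~1 KB each, in decimal megabytes.
    raw_mb = 360_000 * 1 / 1000
    observed = compression_ratio(raw_mb, disk_mb=50)
    print(f"observed ratio: {observed:.0%}")

    # Hypothetical, more conservative ratio for data with more variation.
    planning_ratio = 0.7
    print(f"planned disk:   {projected_disk_mb(raw_mb, planning_ratio):.0f} MB")
```

Because real data varies more than the test data did, planning with a ratio below the observed one leaves headroom. In this sketch a 0.7 ratio roughly doubles the disk estimate for the same hour of traffic.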