Tail-based sampling

Tail-based sampling configuration options.
Example config file:

```yaml
apm-server:
  host: "localhost:8200"
  rum:
    enabled: true

output:
  elasticsearch:
    hosts: ElasticsearchAddress:9200

max_procs: 4
```
Configure and customize Fleet-managed APM settings directly in Kibana:
- In Kibana, find Fleet in the main menu or use the global search field.
- Under the Agent policies tab, select the policy you would like to configure.
- Find the Elastic APM integration and select Actions > Edit integration.
- Look for these options under Tail-based sampling.
Top-level tail-based sampling settings

See Tail-based sampling to learn more.

Enable tail-based sampling

Set to `true` to enable tail-based sampling. Disabled by default. (bool)

APM Server binary: `sampling.tail.enabled`
Fleet-managed: `Enable tail-based sampling`
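A minimal `apm-server.yml` sketch that turns the sampler on. A catch-all policy is included because tail-based sampling assigns sample rates through policies; the `0.1` rate is an illustrative value:

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Catch-all policy: sample 10% of all traces (illustrative rate)
        - sample_rate: 0.1
```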
Interval

Synchronization interval for multiple APM Servers. Should be on the order of tens of seconds or low minutes. Default: `1m` (1 minute). (duration)

APM Server binary: `sampling.tail.interval`
Fleet-managed: `Interval`
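For example, a sketch in `apm-server.yml` that shortens the synchronization interval (`30s` is an illustrative value):

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      interval: 30s
      policies:
        - sample_rate: 0.1
```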
Policies

Criteria used to match a root transaction to a sample rate. Policies map trace events to a sample rate, and each policy must specify a sample rate. Trace events are matched to policies in the order specified, and all of a policy's conditions must be true for a trace event to match it. Each policy list should conclude with a policy that only specifies a sample rate; this final policy catches any remaining trace events that don't match a stricter policy. (`[]policy`)

APM Server binary: `sampling.tail.policies`
Fleet-managed: `Policies`
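A sketch of a policy list in `apm-server.yml`; the service name and rates are hypothetical. Policies are evaluated top to bottom, with the final rate-only policy catching everything else:

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Keep every trace whose root transaction failed
        - trace.outcome: failure
          sample_rate: 1.0
        # Sample half of the traffic from a hypothetical checkout service
        - service.name: checkout-service
          sample_rate: 0.5
        # Catch-all: sample 10% of all remaining traces
        - sample_rate: 0.1
```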
Storage limit

The amount of storage space allocated for trace events matching tail sampling policies. Caution: setting this limit higher than the disk space actually available may cause APM Server to become unhealthy.

When the configured storage limit is reached, APM Server logs "configured storage limit reached", and incoming events bypass sampling and are always indexed. Default: `3GB`. (text)

APM Server binary: `sampling.tail.storage_limit`
Fleet-managed: `Storage limit`
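A sketch raising the limit in `apm-server.yml` (`5GB` is an arbitrary illustration; size it against the disk you actually have available):

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      storage_limit: 5GB
      policies:
        - sample_rate: 0.1
```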
Policy-level tail-based sampling settings

See Tail-based sampling to learn more.

sample_rate

The sample rate to apply to trace events matching this policy. Required in each policy. The sample rate must be greater than or equal to `0` and less than or equal to `1`. For example, a `sample_rate` of `0.01` means that 1% of trace events matching the policy will be sampled, and a `sample_rate` of `1` means that 100% of trace events matching the policy will be sampled. (float)
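A sketch of the catch-all form, which keeps 1% of all matching trace events (the policy fragments in this section are shown without the enclosing `apm-server.sampling.tail` keys for brevity):

```yaml
policies:
  # 0.01 = sample 1% of matching trace events
  - sample_rate: 0.01
```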
trace.name

The trace name for events to match a policy. A match occurs when the configured `trace.name` matches the `transaction.name` of the root transaction of a trace. A root transaction is any transaction without a `parent.id`. (string)
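For example, a fragment that samples one hypothetical root transaction heavily while keeping a catch-all for the rest:

```yaml
policies:
  # Matches traces whose root transaction.name is "GET /api/orders"
  - trace.name: "GET /api/orders"
    sample_rate: 0.9
  # Catch-all for everything else
  - sample_rate: 0.1
```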
trace.outcome

The trace outcome for events to match a policy. A match occurs when the configured `trace.outcome` matches a trace's `event.outcome` field. Trace outcome can be `success`, `failure`, or `unknown`. (string)
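For example, a fragment that keeps every failed trace while sampling 5% of the rest (the rates are illustrative):

```yaml
policies:
  # The trace's event.outcome is "failure"
  - trace.outcome: failure
    sample_rate: 1.0
  # Catch-all for all other outcomes
  - sample_rate: 0.05
```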
service.name

The service name for events to match a policy. (string)

service.environment

The service environment for events to match a policy. (string)
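A fragment combining both service fields; all conditions in a policy must be true for it to match (the names are hypothetical):

```yaml
policies:
  # Matches only the checkout service running in production
  - service.name: checkout-service
    service.environment: production
    sample_rate: 0.5
  - sample_rate: 0.1
```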
Monitoring tail-based sampling

APM Server produces metrics to monitor the performance of tail-based sampling and to estimate the workload it is processing. To use these metrics, you need to [enable monitoring for the APM Server](/solutions/observability/apps/monitor-apm-server.md). The following metrics are produced by the tail-based sampler. Note that the metrics might have a different prefix, for example `beat.stats` for ECH deployments, depending on how the APM Server is running:
apm-server.sampling.tail.dynamic_service_groups

This metric tracks the number of dynamic services that the tail-based sampler is tracking per policy. Dynamic services are created for tail-based sampling policies that are defined without a `service.name`.

This is a counter metric, so it should be visualized with `counter_rate`.
apm-server.sampling.tail.events.processed

This metric tracks the total number of events (both transactions and spans) processed by the tail-based sampler.

This is a counter metric, so it should be visualized with `counter_rate`.
apm-server.sampling.tail.events.stored

This metric tracks the total number of events stored by the tail-based sampler in the database. Events are stored when the full trace is not yet available to make the sampling decision. This value is directly proportional to the storage the tail-based sampler requires to function.

This is a counter metric, so it should be visualized with `counter_rate`.
apm-server.sampling.tail.events.dropped

This metric tracks the total number of events dropped by the tail-based sampler. Only events that are actually dropped by the tail-based sampler are reported as dropped; events that were stored by the processor but never indexed are not counted by this metric.

This is a counter metric, so it should be visualized with `counter_rate`.
apm-server.sampling.tail.storage.lsm_size

This metric tracks the storage size, in bytes, of the log-structured merge trees used by the tail-based sampling database. It is one part of the total disk space used by the tail-based sampler. See Total storage size for details on how to monitor the total disk size used by the tail-based sampler.

apm-server.sampling.tail.storage.value_log_size

This metric tracks the storage size, in bytes, of the value log files used by the tail-based sampling database. It is one part of the total disk space used by the tail-based sampler. See Total storage size for details on how to monitor the total disk size used by the tail-based sampler.
Total storage size

Total storage size is the sum of `apm-server.sampling.tail.storage.lsm_size` and `apm-server.sampling.tail.storage.value_log_size`. It is the most important metric for tracking the storage requirements of the tail-based sampler, especially for large deployments with large distributed traces. Deployments that use tail-based sampling extensively should set up monitoring and alerts on this metric.

This metric can also be used to estimate the storage the tail-based sampler will need before increasing load, by extrapolating from current usage. Before making any estimate, let the tail-based sampler run for at least a few TTL cycles, and keep in mind that the estimate is only useful for similar load patterns.
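As a rough, entirely hypothetical illustration of that extrapolation: if the sampler has run for several TTL cycles at about 1,000 events per second and the sum of `lsm_size` and `value_log_size` has stabilized around 2 GB, then tripling the load to about 3,000 events per second with a similar trace profile suggests budgeting on the order of 6 GB, plus headroom, when choosing `sampling.tail.storage_limit`.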