IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Troubleshoot your Universal Profiling agent deployment

You can use the host-agent logs to find errors.

The following is an example of a healthy host-agent output:

```
time="..." level=info msg="Starting Prodfiler Host Agent v2.4.0 (revision develop-5cce978a, build timestamp 12345678910)"
time="..." level=info msg="Interpreter tracers: perl,php,python,hotspot,ruby,v8"
time="..." level=info msg="Automatically determining environment and machine ID ..."
time="..." level=warning msg="Environment tester (gcp) failed: failed to get GCP metadata: Get \"http://169.254.169.254/computeMetadata/v1/instance/id\": dial tcp 169.254.169.254:80: i/o timeout"
time="..." level=warning msg="Environment tester (azure) failed: failed to get azure metadata: Get \"http://169.254.169.254/metadata/instance/compute?api-version=2020-09-01&format=json\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="..." level=warning msg="Environment tester (aws) failed: failed to get aws metadata: EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: RequestError: send request failed\ncaused by: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="..." level=info msg="Environment: hardware, machine ID: 0xdeadbeefdeadbeef"
time="..." level=info msg="Assigned ProjectID: 5"
time="..." level=info msg="Start CPU metrics"
time="..." level=info msg="Start I/O metrics"
time="..." level=info msg="Found tpbase offset: 9320 (via x86_fsbase_write_task)"
time="..." level=info msg="Environment variable KUBERNETES_SERVICE_HOST not set"
time="..." level=info msg="Supports eBPF map batch operations"
time="..." level=info msg="eBPF tracer loaded"
time="..." level=info msg="Attached tracer program"
time="..." level=info msg="Attached sched monitor"
```

A host-agent deployment is working if the output of the following command is empty:

```
head -n 15 host-agent.log | grep "level=error"
```

If running this command outputs error-level logs, the following are possible causes:

- The host-agent is running on an unsupported version of the Linux kernel, or the kernel is missing required features.

If the host-agent is running on an unsupported kernel version, the following is logged:
```
Universal Profiling Agent requires kernel version 4.19 or newer but got 3.10.0
```
If eBPF features are not available in the kernel, the host-agent fails to start, and one of the following is logged:
```
Failed to probe eBPF syscall
```
or
```
Failed to probe tracepoint
```
- The host-agent is not able to connect to Elastic Cloud. In this case, a message similar to the following is logged:
```
Failed to setup gRPC connection (retrying...): context deadline exceeded
```
Verify that the collection-agent configuration value is set and matches the value shown in Kibana when you click Add Data.
- The secret token is not valid, or it has been changed. In this case, the host-agent shuts down and logs a message similar to the following:
```
rpc error: code = Unauthenticated desc = authentication failed
```
- The host-agent is unable to send data to your deployment. In this case, a message similar to the following is logged:
```
Failed to report hostinfo (retrying...): rpc error: code = Unimplemented desc = unknown service collectionagent.CollectionAgent
```
This typically means that your Elastic Cloud cluster has not been configured for Universal Profiling. To configure your Elastic Cloud cluster, follow the steps in configure data ingestion.
- The collector (the part of the backend in Elastic Cloud that receives data from the host-agent) ran out of memory. In this case, a message similar to the following is logged:
```
Error: failed to invoke XXX(): Unavailable rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "application/json; charset=UTF-8"
```
Verify that the collector is running by navigating to Deployments → <Deployment Name> → Integrations Server in Elastic Cloud. If the Copy endpoint link next to Profiling is grayed out, restart the collector by clicking Force Restart under Integrations Server Management.

For non-demo workloads, verify that the Integrations Server has at least the recommended 4 GB of RAM. You can check this on the Integrations Server page under Instances.
- The host-agent is incompatible with the Elastic Stack version. In this case, the following message is logged:
```
rpc error: code = FailedPrecondition desc= HostAgent version is unsupported, please upgrade to the latest version
```
Follow the host-agent deployment instructions shown in Kibana, which are always correct for the Elastic Stack version that you are using.
- You are using a host-agent from a newer Elastic Stack version, configured to connect to a cluster running an older Elastic Stack version. In this case, the following message is logged:
```
rpc error: code = FailedPrecondition desc= Backend is incompatible with HostAgent, please check your configuration
```
Follow the host-agent deployment instructions shown in Kibana, which are always correct for the Elastic Stack version that you are using.
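
The checks above can be sketched as a small script that scans the start of a host-agent log and maps each error-level line to one of the causes listed. This is an illustrative sketch, not an Elastic tool: the function name `diagnose`, the `KNOWN_ERRORS` table, and the short cause descriptions are assumptions, and the matched substrings are taken from the messages quoted on this page. It also assumes that error lines carry `level=error`, as the `grep` command above implies.

```python
from itertools import islice

# Substrings taken from the log messages quoted above, paired with the
# cause each one points to. The table is illustrative, not exhaustive.
KNOWN_ERRORS = [
    ("requires kernel version", "unsupported Linux kernel version"),
    ("Failed to probe eBPF syscall", "eBPF features missing in the kernel"),
    ("Failed to probe tracepoint", "eBPF features missing in the kernel"),
    ("Failed to setup gRPC connection", "cannot connect to Elastic Cloud"),
    ("code = Unauthenticated", "secret token invalid or changed"),
    ("code = Unimplemented", "deployment not configured for Universal Profiling"),
    ("502 (Bad Gateway)", "collector unavailable, possibly out of memory"),
    ("HostAgent version is unsupported", "host-agent incompatible with the Elastic Stack version"),
    ("Backend is incompatible with HostAgent", "host-agent newer than the Elastic Stack version"),
]


def diagnose(log_lines, head=15):
    """Return (line, cause) pairs for error-level lines among the first `head` lines.

    `log_lines` can be any iterable of strings, such as an open file.
    """
    findings = []
    for line in islice(log_lines, head):
        if "level=error" not in line:
            continue
        cause = "unknown cause"
        for needle, description in KNOWN_ERRORS:
            if needle in line:
                cause = description
                break
        findings.append((line.rstrip("\n"), cause))
    return findings
```

For example, `diagnose(open("host-agent.log"))` returns an empty list for a healthy start-up, which mirrors the empty `grep` output described earlier.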

If you’re unable to find a solution to the host-agent failure, you can raise a support request indicating Universal Profiling and host-agent as the source of the problem.

Enable verbose logging in host-agent

During the support process, you may be asked to provide debug logs from one of the host-agent installations from your deployment.

To enable debug logs, add the -verbose command-line flag or set verbose true in the configuration file.

We recommend enabling debug logs on a single host-agent instance only, rather than on the entire deployment, to limit the volume of logs produced.

Improve load times

The amount of data loaded for the flamegraph, topN functions, and traces views can cause noticeable latency over a slow connection (for example, DSL or mobile).

Setting the Kibana configuration option server.compression.brotli.enabled: true reduces the amount of data transferred and should reduce load times.
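
To get a feel for why response compression helps, the following sketch compresses a repetitive JSON payload and compares sizes. It rests on several assumptions: the payload and its field names are invented and only loosely resemble profiling data, and zlib (DEFLATE) stands in for Brotli because Brotli is not part of the Python standard library. Actual savings in Kibana will differ.

```python
import json
import zlib

# Hypothetical payload: repetitive stack-frame records. The field names
# are made up for illustration and are not Kibana's response format.
frames = [
    {"function": f"handler_{i % 20}", "file": "service/app.py", "samples": i % 7}
    for i in range(5000)
]
raw = json.dumps(frames).encode("utf-8")

# zlib stands in for Brotli here; both exploit repetition in the payload.
compressed = zlib.compress(raw, level=6)

ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, ratio: {ratio:.2%}")
```

Payloads with many repeated function and file names, as in this sketch, tend to compress well, which is why enabling compression matters most on slow links.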

Troubleshoot host-agent Kubernetes deployments

When the Helm chart installation finishes, the output includes instructions on how to check the host-agent pod status and read logs. The following sections describe scenarios in which the host-agent installation may be unhealthy.

Taints

Kubernetes clusters often include taints and tolerations in their setup. In these cases, a host-agent installation may show no pods or very few pods running, even for a large cluster.

This is because a taint prevents pods from running on a node unless the pod tolerates that taint. The tolerations key in the Helm chart's values.yaml sets the tolerations using the official Kubernetes scheduling API format.

The following examples provide a tolerations config that you can add to the Helm chart values.yaml:

To deploy the host-agent on all nodes with taint workload=python:NoExecute, add the following to the values.yaml:
```
tolerations:
  - key: "workload"
    operator: "Equal"   # the default operator, shown explicitly for clarity
    value: "python"
    effect: "NoExecute"
```
To deploy the host-agent on all nodes tainted with key production and effect NoSchedule (no value provided), add the following to the values.yaml:
```
tolerations:
  - key: "production"
    effect: "NoSchedule"
    operator: Exists
```

To deploy the host-agent on all nodes, tolerating all taints, add the following to the values.yaml:

```
tolerations:
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    operator: Exists
```
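
To check a tolerations config against the taints on your nodes before deploying, the matching rules can be sketched as follows. This is a simplified model of the Kubernetes matching semantics, not Kubernetes code: the `Taint` and `Toleration` classes and the `tolerates` and `can_run_on` functions are hypothetical names introduced for illustration. It assumes that an omitted operator means `Equal`, that an empty effect in a toleration matches any effect, and that only `NoSchedule` and `NoExecute` taints keep a pod off a node.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an omitted operator as Equal
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # An empty key is only meaningful together with the Exists operator.
    if not tol.key:
        return False
    return tol.value == taint.value


def can_run_on(tolerations, taints) -> bool:
    """Return True if every blocking taint is matched by some toleration."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)
```

With the last config above, `can_run_on` returns `True` for any combination of `NoSchedule` and `NoExecute` taints, while the first config only covers nodes tainted with `workload=python:NoExecute`.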

Security policy enforcement

Some Kubernetes clusters are configured with hardened security add-ons to limit the blast radius of exploited application vulnerabilities. Different hardening methodologies can impair host-agent operations and may, for example, result in pods that restart continuously and show a CrashLoopBackOff status.

Kubernetes PodSecurityPolicy (deprecated)

This Kubernetes API was deprecated in Kubernetes v1.21 and removed in v1.25, but some clusters running older versions still use it. A PodSecurityPolicy (PSP) may explicitly prevent the execution of privileged containers across the entire cluster.

Because the host-agent needs privileged access on most kernels and container runtimes (CRI), you need to create a PSP that allows the host-agent DaemonSet to run.

Kubernetes policy engines

Read more about Kubernetes policy engines in the SIG-Security documentation.

The following tools may prevent the execution of host-agent pods, because the Helm chart creates a cluster role and binds it to the host-agent service account (the role is used to retrieve container metadata):

- Open Policy Agent Gatekeeper
- Kyverno
- Fairwinds Polaris

If you have a policy engine in place, configure it to allow the host-agent execution and RBAC configs.

Network configurations

In some instances, your host-agent pods may be running fine but fail to connect to the remote data collector's gRPC interface. They then remain in the startup phase and retry the connection periodically.

The following are potential causes:

- Kubernetes NetworkPolicies define connectivity rules that prevent all outgoing traffic unless explicitly allow-listed.
- Cloud or datacenter provider network rules (ACLs) restrict egress traffic to allowed destinations only.
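
To narrow down whether egress is blocked, you can run a basic reachability check from a pod on an affected node. The following is a minimal sketch using only the Python standard library. The function name `can_connect` is hypothetical, and the host and port you pass in are whatever your collection-agent setting contains. It tests TCP reachability only, so a successful result does not prove that TLS or gRPC negotiation will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `False` result from inside the cluster combined with `True` from a machine outside it points toward a NetworkPolicy or provider-level egress rule rather than a problem with the collector itself.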

OS-level security

These settings are not part of Kubernetes and may have been included in the node setup. They can prevent the host-agent from working properly, as they intercept syscalls from the host-agent to the kernel and modify or block them.

If you have implemented security hardening (some common mechanisms are listed below), make sure you know which privileges the host-agent needs so that your policies can allow them.

- gVisor on GKE
- seccomp filters
- AppArmor LSM

Submit a support request

You can submit a support request from the support request page in the Elastic Cloud console.

In the support request, specify whether your issue relates to the host-agent or the Kibana app.

Send feedback

If troubleshooting and support do not resolve your issues, or you have any other feedback about the product, email the Universal Profiling team at profiling-feedback@elastic.co.
