Fleet and Elastic Agent 8.11.0

Review important information about Fleet Server and Elastic Agent for the 8.11.0 release.

Because of a memory leak, we recommend that Windows users running Elastic Agent avoid upgrading to this release and wait for the upcoming 8.11.2 release, in which the issue is resolved. If you’ve already upgraded to 8.11.0 or 8.11.1, we recommend upgrading to 8.11.2 as soon as it becomes available. See the known issue for more detail.

Security updates

Elastic Agent
  • Updated Go version to 1.20.10. #3601

Breaking changes

Breaking changes can prevent your application from operating and performing optimally. Before you upgrade, review the breaking changes, then mitigate the impact to your application.

Compression is enabled by default for Elastic Agent Elasticsearch outputs

Details
The default compression level for Elasticsearch outputs is changing from 0 to 1.

Impact
On typical workloads this is expected to decrease network data volume by 70-80%, while increasing CPU use by 20-25% and ingestion time by 10%. The previous behavior can be restored by adding the setting compression_level: 0 to the agent output configuration.
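
To get a rough feel for this trade-off on your own data, the following Python sketch compares an uncompressed payload with the same payload gzipped at level 1. It is an illustration only: the sample events are invented, the helper name compression_savings is hypothetical, and Python's gzip module stands in for the agent's own compression on the assumption that the output setting is gzip-based. The savings you measure will depend on your data and may differ from the figures above.

```python
import gzip
import json


def compression_savings(payload: bytes, level: int = 1) -> float:
    """Return the fraction of bytes saved by gzip at the given level.

    Level 0 is treated here as "send uncompressed", mirroring the
    behavior described for compression_level: 0.
    """
    if level == 0 or not payload:
        return 0.0
    compressed = gzip.compress(payload, compresslevel=level)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical batch of log-like events; real workloads will differ.
events = [
    {
        "@timestamp": f"2023-11-07T10:00:{i % 60:02d}Z",
        "host": {"name": f"web-{i % 5}"},
        "message": f"GET /api/items/{i} 200",
    }
    for i in range(1000)
]
payload = "\n".join(json.dumps(event) for event in events).encode("utf-8")

print(f"level 0: {compression_savings(payload, 0):.0%} of bytes saved")
print(f"level 1: {compression_savings(payload, 1):.0%} of bytes saved")
```

Running the same function over a captured sample of your own events gives a more realistic estimate than the synthetic batch used here.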

elastic-agent-autodiscover library has been updated to version 0.6.4, disabling metadata for kubernetes.deployment and kubernetes.cronjob fields

Details
The elastic-agent-autodiscover Kubernetes library now sets add_resource_metadata.deployment=false and add_resource_metadata.cronjob=false by default.

Impact
Pods created from deployments or cronjobs no longer have the extra metadata field for kubernetes.deployment or kubernetes.cronjob, respectively. This change was made to avoid the memory impact of keeping the feature enabled in large Kubernetes clusters. For more information, refer to #3593.

Known issues

Memory leak running Elastic Agent in Windows environments with the System Integration

Details

A memory leak has been identified in Beats on Windows. All Beats running Elastic Stack version 8.11.0 or 8.11.1 are affected. The leak also affects the Elastic Agent System integration which is implemented with Beats. The leak will eventually exhaust all memory on the host system, typically after several days.

Impact

This issue has been fixed in version 8.11.2. For a Windows environment, we strongly recommend upgrading directly to 8.11.2 or any higher release.

If you’re already running Elastic Agent version 8.11.0 or 8.11.1 on Windows and do not want to upgrade, we recommend that you:

  1. Disable the process and process_summary metrics in your System integration.
  2. Disable logs and metrics collection.
  3. Restart Elastic Agent.

Note that disabling these datasets will prevent the collection of process-related metrics.

Another workaround is to downgrade Elastic Agent to a version below 8.11.0. Note that this could result in missing or reindexed logs or metrics as the "state" will not be persisted after Elastic Agent is uninstalled and reinstalled.

For Beats, we currently do not have a workaround apart from upgrading to 8.11.2 or a later release.

Current stack version is not in the list of Elastic Agent versions in Kibana Fleet UI

Details

On the Fleet UI in Kibana:

  • When adding a new Elastic Agent, the user interface shows a previous version instead of the current version.
  • When attempting to upgrade, the modal window to pick the version shows an earlier version as the latest version.

Impact

You can use the following steps as a workaround:

When upgrading Elastic Agent currently on versions 8.10.4 or lower (simpler)

  1. Open the Fleet UI. Under the Agents tab select Upgrade agent from the actions menu. The version field in the Upgrade agent UI allows you to enter any version.
  2. Enter 8.11.0 or whichever version you want to upgrade the agents to. Do not choose a version above the version of Kibana or Fleet Server that you’re running.
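
Because the version field accepts free text, it can help to check a target version before you enter it. The following Python sketch shows one way to do that. The helper names parse_version and is_valid_upgrade_target are hypothetical, and the sketch assumes plain major.minor.patch version strings with no pre-release suffix.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_valid_upgrade_target(target: str, kibana: str, fleet_server: str) -> bool:
    """Return True if the target is not above Kibana or Fleet Server."""
    # The target must not exceed either component, so compare against
    # the lower of the two versions.
    ceiling = min(parse_version(kibana), parse_version(fleet_server))
    return parse_version(target) <= ceiling


# Example values only; substitute the versions you are actually running.
print(is_valid_upgrade_target("8.11.0", kibana="8.11.0", fleet_server="8.11.0"))
print(is_valid_upgrade_target("8.11.1", kibana="8.11.0", fleet_server="8.11.1"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "8.9.0" sorting after "8.11.0".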

When upgrading Elastic Agent currently on any version (more complex, requires API)

  1. Open Kibana and navigate to Management → Dev Tools.
  2. Choose one of the API requests below and submit it through the console. Each of the requests uses version 8.11.0 as an example, but this can be changed to any available version.

    • To upgrade a single Elastic Agent to any version, run:

      POST kbn:/api/fleet/agents/<Elastic Agent ID>/upgrade
      {"version":"8.11.0"}
    • To upgrade a set of Elastic Agents based on a known set of agent IDs, run:

      POST kbn:/api/fleet/agents/bulk_upgrade
      {
        "version":"8.11.0",
        "agents":["<Elastic Agent ID>","<Another Elastic Agent ID>"],
        "start_time":"2023-11-10T09:41:39.850Z"
      }
    • To upgrade a set of Elastic Agents running a specific policy, and below a specific version (for example, 8.11.0), run:

      POST kbn:/api/fleet/agents/bulk_upgrade
      {
        "agents": "fleet-agents.policy_id:<Elastic Fleet Policy ID> and fleet-agents.agent.version<<VERSION>",
        "version": "8.11.0"
      }

      For example, with a policy ID and version filled in:

      POST kbn:/api/fleet/agents/bulk_upgrade
      {
        "agents": "fleet-agents.policy_id:uuid1-uuid2-uuid3-uuid4 and fleet-agents.agent.version<8.11.0",
        "version": "8.11.0"
      }

To find the ID for any Elastic Agent, open the Agents tab in Fleet and select View agent from the Actions menu. The agent ID and other details are shown.

To learn more about these requests, refer to the Fleet API documentation.
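
If you would rather script the policy-based request than submit it through Dev Tools, the sketch below builds the equivalent HTTP request with Python's standard library. Treat it as a starting point rather than a supported client. The Kibana URL, API key, policy ID, and the helper name build_bulk_upgrade_request are hypothetical placeholders. The path and body mirror the console request above, and the sketch assumes API key authentication together with the kbn-xsrf header that Kibana expects on write requests.

```python
import json
import urllib.request


def build_bulk_upgrade_request(
    kibana_url: str,
    api_key: str,
    policy_id: str,
    below_version: str,
    target_version: str,
) -> urllib.request.Request:
    """Build (but do not send) a bulk upgrade request for one policy."""
    # Same query shape as the console example: agents on a given policy
    # that are running a version below `below_version`.
    query = (
        f"fleet-agents.policy_id:{policy_id} "
        f"and fleet-agents.agent.version<{below_version}"
    )
    body = json.dumps({"agents": query, "version": target_version})
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/fleet/agents/bulk_upgrade",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "kbn-xsrf": "true",
            "Authorization": f"ApiKey {api_key}",  # assumed auth scheme
        },
    )


# Placeholder values for illustration; nothing is sent over the network.
request = build_bulk_upgrade_request(
    kibana_url="https://kibana.example.com",
    api_key="<API key>",
    policy_id="<Elastic Fleet Policy ID>",
    below_version="8.11.0",
    target_version="8.11.0",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To submit the request, pass it to urllib.request.urlopen(request).
```

Building the request separately from sending it lets you inspect the URL and body before anything reaches your Fleet deployment.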

Integrations Server / APM unable to boot in specific ECE environments

Details
A permissions change in the Elastic Agent Docker container can prevent the Elastic Agent or Integrations Server component from booting up within an ECE deployment. The change affects ECE installations that are deployed with a Linux UID other than 1000.

Impact
We recommend that ECE users with deployments that include APM or Integrations Server wait for the next patch release, which is planned to include a fix for this problem.

New features

The 8.11.0 release added the following new and notable features.

Fleet
  • Set env variable ELASTIC_NETINFO:false in Kibana (#166156).
  • Added restart upgrade action (#166154).
  • Adds ability to set a proxy for agent binary source (#164168).
  • Adds ability to set a proxy for agent download source (#164078).
Elastic Agent
  • Add support for processors in hints-based Kubernetes autodiscover. #3107 #2959
  • Print out Elastic Agent installation steps to show progress. #3338
  • Add colors to Elastic Agent messages printed by the elastic-agent logs command based on their level. #3345

Enhancements

Fleet
  • Adds sidebar navigation showing headings extracted from the readme (#167216).
Fleet Server
  • Expand APM traces to track coordinator and monitor transactions. Add additional spans across all API endpoints to better track what the server does. Add spans to bulker interactions that link with the queue flush transaction that the bulk action is executed through. #2929
  • Add endpoint to serve PGP keys that clients can use when validating upgrades in cases where the embedded PGP key in a client is compromised and the client can’t reach the internet. #2977 #2887
  • Add ActionLimit and a Gzip writer pool to handle checkin responses, to help prevent OOM errors when updates are issued to many clients. #2994
  • Send errors in API calls and bulker flushes to APM. #3053
Elastic Agent
  • Improve Elastic Agent uninstall on Windows by adding delay between retries when file removal is blocked by busy files. #3431 #3221
  • Support the NETINFO variable in Elastic Kubernetes manifests. Setting the new environment variable ELASTIC_NETINFO=false globally disables the netinfo.enabled parameter of the add_host_metadata processor. This disables the indexing of host.ip and host.mac fields. #3354
  • The Elastic Agent uninstall process now finds and kills any running upgrade Watcher process. Uninstalls initiated within 10 minutes of a previous upgrade now work as expected. #3384 #3371
  • Fix the Kubernetes deploy/kubernetes/creator_k8.sh script to correctly exclude configmaps. #3396
  • Allow fetching the GPG key used for upgrade package signature verification from Fleet Server. This enables upgrades using rotated GPG keys in air gapped environments where Fleet Server is the only reachable URI. #3543 #3264
  • Enable tamper protection feature flag by default for Elastic Agent version 8.11.0. #3478
  • Increase Elastic Agent monitoring metrics interval from 10s to 60s to reduce the default ingestion load and long term storage requirements. #3578

Bug fixes

Fleet
  • Vastly improve performance of Fleet final pipeline’s date formatting logic for event.ingested (#167318).
Fleet Server
  • Fix errors produced by the Fleet Server bulker to be ECS compliant. #3034 #3033
Elastic Agent
  • Enable resilient handling of air gapped PGP checks. Elastic Agent should not fail when remote PGP is specified (or official Elastic fallback PGP is used) and the remote is not available. #3427 #3426 #3368
  • Prevent a standalone Elastic Agent from being upgraded if an upgrade is already in progress. #3473 #2706
  • Fix a bug that affected reporting progress of the Elastic Agent artifact download during an upgrade. #3548
  • Upgrade elastic-agent-libs to v0.6.0 to fix the Elastic Agent Windows service becoming unresponsive. Fixes Windows service timeouts during WMI queries and during service shutdown. #3632 #3061
  • Increase wait period between service restarts on failure to 15s on Windows. #3657
  • Prevent multiple attempts by Elastic Agent to stop an already stopped service. #3482