Reducing CVEs in Elastic container images

In this blog post, we will discuss our journey to significantly reduce Common Vulnerabilities and Exposures (CVEs) in Elastic container images by switching to a minimal base image in our Elastic products and optimizing our workflows for a scalable vulnerability management program.

Elastic Stack based on Chainguard images

Chainguard images are a collection of container images that meet the requirements of the secure software supply chain, including verifiable signatures, provenance, software bills of materials (SBOM), few CVEs, and small image sizes. The images are built on top of the Wolfi project, which aims to provide a secure and minimal base image for containerized applications.

Starting with version 8.16, Elastic provides a variant of the Elastic Stack containers based on Chainguard images. The Chainguard variant of Elasticsearch 8.16 was released a few days ago with fewer CVEs than previous versions, and the in-progress 8.17 development version is already down to a single low-severity CVE.

$ snyk container test docker.elastic.co/elasticsearch/elasticsearch-wolfi:8.17.1-SNAPSHOT

Package manager: apk
 Tested 58 dependencies for known issues, no vulnerable paths found.
...
Tested 108 projects, 1 contained vulnerable paths.

Use the following commands to pull the Elastic Stack images based on Wolfi as mentioned on each product documentation page:

docker pull docker.elastic.co/elasticsearch/elasticsearch-wolfi:<VERSION>
docker pull docker.elastic.co/kibana/kibana-wolfi:<VERSION>
docker pull docker.elastic.co/logstash/logstash-wolfi:<VERSION>
docker pull docker.elastic.co/apm/apm-server-wolfi:<VERSION>
docker pull docker.elastic.co/elastic-agent/elastic-agent-wolfi:<VERSION>
docker pull docker.elastic.co/beats/filebeat-wolfi:<VERSION>
docker pull docker.elastic.co/beats/metricbeat-wolfi:<VERSION>

The Wolfi-based images are not the default ones for several reasons: 

  • To avoid breaking customer workloads that rely on Ubuntu packages 

  • To ensure non-Elastic users can keep building default images from the source code

  • To maintain the same user experience when pulling the default Elasticsearch images from Docker Official, Docker Hub, AWS ECR, and the Elastic container registry

Note on the compatibility with Docker versions 20.10.10 or higher 

For users relying on Docker as their container engine, deploying Elastic Stack images based on Wolfi requires Docker version 20.10.10 (which is end of life as of December 10, 2023) or higher. The incompatibility arises because recent images ship glibc 2.34 or newer, which defaults to the new clone3 syscall. For backward compatibility, glibc falls back to clone when clone3 returns ENOSYS. However, the default seccomp filter in Docker 20.10.9 and lower returns EPERM instead, which glibc treats as a fatal error, so the fallback never occurs. The fix was backported to Docker 20.10.10 and is included in later versions. ECE customers running Elastic Stack 8.16+ therefore also need Docker version 20.10.10 or higher.
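A deployment script can check this requirement before pulling a Wolfi-based image. The following is a minimal sketch, not part of Elastic's tooling: version_ge and check_docker_version are hypothetical helper names. The comparison is numeric per component, because a plain string comparison would wrongly rank 20.10.9 above 20.10.10.

```shell
# Succeed when dotted version $1 is greater than or equal to $2.
# Components are compared numerically, so 20.10.9 < 20.10.10.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      # "+ 0" forces numeric comparison and ignores suffixes such as "-ce".
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Report whether the given Docker version meets the 20.10.10 minimum.
check_docker_version() {
  if version_ge "$1" "20.10.10"; then
    echo "ok: Docker $1 meets the 20.10.10 minimum"
  else
    echo "too old: Docker $1 is below the 20.10.10 minimum"
    return 1
  fi
}
```

With a running Docker daemon, the server version can be passed in with: check_docker_version "$(docker version --format '{{.Server.Version}}')"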

Approach to addressing vulnerabilities

Engineering and information security teams worked on addressing vulnerability management challenges to achieve multiple goals: to provide hardened containers to our customers; to help with compliance regulations; to improve our supply chain security posture; and to reduce the burden of addressing and triaging CVEs on our customers, engineering, security, and support teams. The impact spans across Elastic products, including Elastic Self-Managed offerings (Elastic Stack), Elastic Cloud on Kubernetes (ECK), and Elastic Cloud (Serverless and Hosted).

At a high level, the first step was to define how teams within the organization would comply with the vulnerability management program and the associated service level objective (SLO) used to measure compliance. Next, we focused on deploying tools and processes to ensure that engineering teams are proactively notified, enabling them to efficiently manage their projects in order to meet these objectives and respond appropriately when these SLOs are breached. This initiative was founded on the following principles:

  • (1) Establish a secure foundation: By building on top of the Chainguard images, we set up a foundation for success to build securely by default across the organization — providing automatic and fast vulnerability remediation without adding burden to our engineers.

  • (2) Optimize for container workload: Every component included in the container image must be required and optimized for the targeted runtime environment. 
  • (3) Continuous code analysis: Software composition analysis (SCA) tooling runs continuously to build a comprehensive inventory of open source third-party components in Elastic products and proactively identify and mitigate issues that may impact our products because of their use.
  • (4) CVE SLO quality gates: Enable enforcement of CVE SLO checks before a container image is released or deployed to production.
  • (5) Continuous monitoring: Teams are automatically notified when their products running in production are no longer compliant. New vulnerabilities are discovered frequently and can affect container images that were free of known vulnerabilities when they were deployed to production.
  • (*) Frequent updates: Critical to the success of this initiative, the efforts in (1) to (5) are useless without deploying frequently. Processes are in place to ensure the events triggered by (1), (3), or (5) lead to notifications for a new deployment.
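To make principle (4) concrete, a quality gate can be sketched as a small filter over scanner findings. This is an illustration only: the input format (one "ID SEVERITY AGE_DAYS" line per finding), the function name cve_slo_gate, and the day thresholds are all hypothetical and are not Elastic's actual SLOs or tooling.

```shell
# Read "ID SEVERITY AGE_DAYS" lines on stdin, print every finding that is
# older than the limit for its severity, and fail if any were printed.
# The thresholds are illustrative placeholders, not Elastic's SLOs.
cve_slo_gate() {
  awk '
    BEGIN { slo["critical"] = 7; slo["high"] = 30; slo["medium"] = 90; slo["low"] = 180 }
    {
      sev = tolower($2)
      if ((sev in slo) && ($3 + 0 > slo[sev])) {
        printf "SLO breached: %s (%s, %d days old, limit %d)\n", $1, sev, $3, slo[sev]
        breached = 1
      }
    }
    END { exit (breached ? 1 : 0) }
  '
}
```

A release pipeline could run such a gate after the image scan and stop the release when it exits non-zero.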

Establish a secure foundation with automated updates

The workflow that gives Elastic engineers a smooth experience in using secure base images for their container products, and in keeping them up to date, is built upon the Chainguard images product, the Renovate project, and best practices in supply chain security.

Elastic uses a mix of Chainguard developer and production images that are regularly synchronized to the Elastic container registry with their signatures and SBOMs. Prior to being synchronized, each image signature is verified using cosign. Storing these images in the Elastic registry provides the optimal developer experience for Elastic engineers, mitigates the risk of incidents arising from third-party systems, and ensures control over the source from which our containers are pulled in production.
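The verify-then-synchronize step could look roughly like the sketch below. It is an assumption about the shape of such a job, not Elastic's actual pipeline: sync_verified_image is a hypothetical name, and SIGNER_IDENTITY and SIGNER_ISSUER are placeholders for the signing identity and OIDC issuer that the caller expects. It relies on the cosign verify and cosign copy subcommands.

```shell
# Verify the keyless signature of source image $1, then copy it to
# destination $2. The copy is skipped when verification fails.
# SIGNER_IDENTITY and SIGNER_ISSUER are placeholders set by the caller.
sync_verified_image() {
  if ! cosign verify \
      --certificate-identity "$SIGNER_IDENTITY" \
      --certificate-oidc-issuer "$SIGNER_ISSUER" \
      "$1" >/dev/null; then
    echo "signature verification failed for $1" >&2
    return 1
  fi
  # cosign copy transfers the image together with its attached signatures.
  cosign copy "$1" "$2"
}
```

Verifying before copying means an image with a missing or unexpected signature never reaches the internal registry.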

We provide documentation to engineers that outlines several key practices. First, it emphasizes the importance of referencing a tag and a digest for each base image used — pinning a container image to a digest ensures maximal build reproducibility, and while image tags are mutable, digests are not. Additionally, engineers are encouraged to use Docker multistage builds by combining a fully featured image at build time with a distroless image at runtime. Distroless images significantly reduce the attack surface of a container by containing only the application and its runtime dependencies, thereby minimizing the risk of vulnerabilities associated with the base image.
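The digest-pinning guidance can be checked mechanically. The function below is a minimal sketch under stated assumptions: report_unpinned_from_lines is a hypothetical name, and the check is deliberately simple, so it would also flag legitimate lines such as FROM scratch or a FROM that refers to an earlier build stage by name.

```shell
# Print every FROM line in Dockerfile $1 that lacks an @sha256:<digest>
# suffix. Succeeds only when all FROM lines are pinned to a digest.
report_unpinned_from_lines() {
  if grep -iE '^[[:space:]]*FROM[[:space:]]' "$1" | grep -vE '@sha256:[0-9a-f]{64}'; then
    return 1
  fi
  return 0
}
```

Run in continuous integration, for example as report_unpinned_from_lines Dockerfile, it fails the build and lists the offending lines when a base image is referenced by tag alone.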

Renovate is an open source tool that automates the maintenance of software dependencies. We use it to improve the developer experience of updating the Chainguard images referenced in Elastic GitHub repositories: it automatically raises pull requests that update the base image digests as soon as new ones are available. In the Elasticsearch repository, for example, Renovate is configured so that the base image digests are automatically updated on the releasable git branches whenever Chainguard provides a new version.

ECK 2.16 released with 0 CVEs

Built on the Kubernetes Operator pattern, ECK extends the basic Kubernetes orchestration capabilities to support the setup and management of the Elastic Stack. On December 18, 2024, ECK 2.16.0 was released with 0 CVEs!

$ snyk container test docker.elastic.co/eck/eck-operator:2.16.0

 Tested 3 dependencies for known issues, no vulnerable paths found.
...
 Tested 707 dependencies for known issues, no vulnerable paths found.

Tested 2 projects, no vulnerable paths were found.

The ECK repository codebase, and especially its Dockerfile, illustrates the best practices mentioned above:

  • A build stage that compiles the binary using the Chainguard Go image. The image is pulled from the Elastic container registry and referenced by both tag and digest to ensure build reproducibility and automated updates:

# Build the operator binary
FROM docker.elastic.co/wolfi/go:1.23.4@sha256:0c563962687ca1d5677b810d2fcb6c1dcb7bd650c822999c715ad715590f14bb AS builder
...
# Build
RUN --mount=type=cache,mode=0755,target=/go/pkg/mod \
      CGO_ENABLED=0 GOOS=linux LICENSE_PUBKEY=/$LICENSE_PUBKEY make go-build

  • A runtime stage that uses a distroless image to reduce the attack surface. This image is also always referenced by tag and digest:

FROM docker.elastic.co/wolfi/static:latest@sha256:5ff428f8a48241b93a4174dbbc135a4ffb2381a9e10bdbbc5b9db145645886d5
...
COPY --from=builder /go/src/github.com/elastic/cloud-on-k8s/elastic-operator /elastic-operator
...
ENTRYPOINT ["/elastic-operator"]
CMD ["manager"]

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.