Pods, Tokens, and a Little Glue: Integrating Kubernetes and Vault in Elastic Infrastructure

On Elastic's Infrastructure team, we are always looking for opportunities to introduce automation and reduce operational burdens. Function-as-a-service solutions such as AWS Lambda have allowed us to simplify operations for some services, but applications that require persistent runtimes present different challenges. There are many potential solutions to this problem: fleets of autoscaled AWS ECS instances, platforms such as Google App Engine or AWS Elastic Beanstalk, and more. Our work on this problem has led us to investigate various container schedulers - and with such a large community and broad feature set, Kubernetes has proven a useful platform for our use case.

Adopting a solution as powerful and flexible as Kubernetes entails some non-trivial work to bring it into an existing operations workflow, not the least of which is providing secrets and other sensitive information to running applications within a Pod or Deployment. This post will explain our approach to integrating HashiCorp Vault (our chosen secret management solution) with Kubernetes.

Introduction

Integrating Kubernetes and Vault should meet a few requirements:

  • Secrets should be securely communicated to running services. No sensitive tokens should be committed into source code repositories.
  • Sensitive data should be centralized in Vault and not duplicated in other datastores to both simplify auditing and reduce potential attack surfaces.

Moreover, applications should be able to consume secrets without dramatic changes when moving to Kubernetes - this is critical to aid in migration from traditional platforms like AWS EC2.

Background

Before diving into the implementation, there are a few details specific to our solution that are worth highlighting:

  • We make heavy use of an internally hosted, private Docker registry. Aside from the obvious ability to reference images that should be internal-only, building custom images for many of these services (such as kubernetes-vault) ensures that they are deployed with audited code and that updates are more closely controlled.
  • Familiarity with Vault concepts is expected. In particular, the AppRole backend is heavily used in the following design pattern, which couples a specific role identifier with a securely distributed secret that kubernetes-vault pushes to a Kubernetes initContainer (a rough sketch of defining such a role follows this list).
  • Many best practices such as audit logging, policy design, and other considerations are not covered in this high-level overview. In a production deployment, these and other decisions should be reviewed, especially when deploying a service that manages sensitive information such as Vault.
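
To make the AppRole usage concrete, defining a role for an application might look roughly like the following (the role name, TTLs, and policy here are illustrative only; actual roles and policies are environment-specific):

# Enable the AppRole auth backend (once per Vault cluster)
vault auth enable approle

# Create a role for the application with an example policy and token TTLs
vault write auth/approle/role/logstash \
    policies="logstash-secrets" \
    token_ttl=1h \
    token_max_ttl=4h

# The role_id is what VAULT_ROLE_ID refers to later in this post
vault read auth/approle/role/logstash/role-id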

Implementation

There are a few steps that come together to achieve the goals outlined in the introduction: first, connecting Kubernetes to Vault; second, providing a mechanism to expose secrets to running applications; and third, creating reusable tools to aid in generic solutions for additional applications.

Connecting Kubernetes and Vault

Fortunately, the kubernetes-vault project provides a well-designed solution to this problem. Response wrapping ensures that tokens are passed to applications securely in-transit, meeting the first requirement for integrating with Vault.

Our deployment of kubernetes-vault looks similar to the one outlined in the project's quick start guide, with the exception that Vault runs outside of Kubernetes as an independent service. This works just as well as running Vault within Kubernetes itself, which is important because a variety of services outside of our Kubernetes cluster also consume Vault for different use cases.

One important consideration when deploying kubernetes-vault is to ensure that the token passed to the service is periodic. When interacting with Vault as a limited-privilege user, any generated tokens are subject to expiry when their parent token is revoked or expires; a periodic token, by contrast, can be renewed indefinitely as long as it is renewed within its period, which suits a long-running service like kubernetes-vault.
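
For example, such a token might be created along the following lines (the policy name is only a placeholder, and creating periodic tokens requires an appropriately privileged Vault token):

# Issue a periodic token that can be renewed indefinitely within its period
vault token create -period=24h -policy=kubernetes-vault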

Exposing Secrets to Applications

Leveraging kubernetes-vault brings us to the point where a JSON-formatted file containing several values, including the Vault token, is available inside the pod. While consuming this file takes only a little code for new applications, retrofitting existing applications can be challenging.

Fortunately, many applications can easily consume environment variables, which provides a convenient way to pass secret values without writing them out to a persistent file or into a Pod/Deployment spec. Another tool, vaultenv, can automate this process.

To illustrate how this works, consider a Deployment for Logstash (which can reference environment variables in its configuration file). How can we pass an HTTP basic authentication username and password to the elasticsearch output plugin to securely index events to an external, secured cluster?

First, we define a ConfigMap that specifies which secrets to reference from Vault:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-secrets
data:
  logstash.secrets: |
    ELASTICSEARCH_USERNAME=elasticsearch/production#username
    ELASTICSEARCH_PASSWORD=elasticsearch/production#password

This instructs vaultenv to expose a secret stored in the generic secret backend under the path secret/elasticsearch/production with the key password as the environment variable ELASTICSEARCH_PASSWORD. This ConfigMap is mounted into the pod at /etc/secrets.d as a de facto directory for vaultenv secrets.
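
For reference, mounting the ConfigMap at that path might look like the following fragment of the pod spec (the container name here is just a placeholder):

containers:
  - name: logstash
    volumeMounts:
      - name: logstash-secrets
        mountPath: /etc/secrets.d
volumes:
  - name: logstash-secrets
    configMap:
      name: logstash-secrets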

In a similar fashion, we can define our Logstash config as a ConfigMap, interpolating the named variables into the configuration file directly:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  example.conf: |
    input {
      exec {
        command => "date"
        interval => 5
      }
    }

    output {
      elasticsearch {
        index    => "logstash-%{+YYYY}"
        hosts    => ["https://elasticsearch.cluster.url:9200"]
        user     => "${ELASTICSEARCH_USERNAME}"
        password => "${ELASTICSEARCH_PASSWORD}"
        ssl      => true
      }
    }

Ensuring that the vaultenv executable is present in the container can be done in a Dockerfile with some simple RUN commands or in a multi-stage Dockerfile to compile vaultenv locally and COPY it into the final container image. Additionally, installing jq makes consuming the vault-token file easier.
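
As a rough sketch (the base image tag, binary sources, and wrapper script name are all assumptions; we bake this into images hosted on our internal registry), such a Dockerfile might look like:

# Assumes the vaultenv and jq binaries, plus a small wrapper script containing
# the vaultenv invocation shown below, are present in the build context
FROM docker.elastic.co/logstash/logstash:6.2.4

COPY vaultenv jq /usr/local/bin/
COPY logstash-vaultenv-entrypoint.sh /usr/local/bin/

ENTRYPOINT ["/usr/local/bin/logstash-vaultenv-entrypoint.sh"]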

Finally, the container's command can be defined. In the case of the official Elastic Logstash Docker image, the default ENTRYPOINT is /usr/local/bin/docker-entrypoint, so we simply wrap that script within a vaultenv invocation:

token=$(jq -r '.clientToken' /var/run/secrets/boostport.com/vault-token)
exec vaultenv \
      --host $VAULT_ADDR \
      --token $token \
      --secrets-file /etc/secrets.d/logstash.secrets \
      /usr/local/bin/docker-entrypoint

Coupled with an initContainer as explained in the kubernetes-vault documentation, vaultenv will retrieve the configured secrets and invoke /usr/local/bin/docker-entrypoint with the ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD environment variables.

This pattern permits the token to be securely passed to vaultenv, and lets us modify the original application container only minimally: it still invokes the same command, now wrapped with the environment variables that vaultenv provides.

Managing Periodic Tokens

As with the token used for kubernetes-vault, defining a TTL for tokens issued to an AppRole is a best practice to ensure that access for deleted pods and stale tokens is revoked regularly. However, vaultenv only executes once before passing control to its command argument, so if the pod's container should die for any reason, the container will be re-run and the token may have expired in the intervening period. Fortunately, because the token JSON file lives in an emptyDir volume shared across the pod, we can consume the token from a sidecar container and renew it out-of-band from the pod's main container.
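
For illustration, the shared volume is simply an emptyDir entry in the pod spec (the memory-backed medium shown here is an assumption, though it is a reasonable choice for token material):

volumes:
  - name: vault-token
    emptyDir:
      medium: Memory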

In our environment, we host a small token renewal image on our Docker registry that has the simple task of reading the token JSON file and renewing it regularly. This permits us to re-use the image in Deployment definitions that require it, without needing to change the application container in any way.

As an example, the Python daemon's renewal logic could be as simple as the following (using the hvac library; the client setup sketched here assumes the token file path and the VAULT_ADDR environment variable used elsewhere in this Deployment):

import json, os
from time import sleep
import hvac

# Build a Vault client from the token file deposited by kubernetes-vault
with open('/var/run/secrets/boostport.com/vault-token') as f:
    vault = hvac.Client(url=os.environ['VAULT_ADDR'],
                        token=json.load(f)['clientToken'])

while True:
    lookup = vault.lookup_token()['data']
    interval = lookup['creation_ttl'] / 2

    if lookup['ttl'] <= interval:
        print('renewing vault token')
        vault.renew_token()

    sleep(10)

This loop simply renews the Vault token when less than half of its original TTL remains, checking every ten seconds (the actual logic we use is slightly different, but this version is simpler to illustrate). Handling connection errors to Vault is also important to consider in a real implementation.

Adding this automatic renewal to an existing Deployment becomes as simple as adding one more container to the Kubernetes yaml configuration:

- name: token-renewer
  image: private.registry.url/token-renewer
  volumeMounts:
    - name: vault-token
      mountPath: /var/run/secrets/boostport.com

If the main container is restarted for any reason, a still-valid token will be available for vaultenv to request secrets again and invoke the container command with freshly fetched values.

An Example Kubernetes Definition

Yaml configuration files for kubernetes-vault and related specifications are likely deployment-specific, but this example Deployment illustrates what an application definition might look like. Some specific items of note in this configuration file:

  • VAULT_ROLE_ID and VAULT_ADDR are environment-specific. Note that a Vault AppRole will need to be created for each role that kubernetes-vault is expected to generate secret IDs and tokens for (a sketch of the corresponding init container follows this list).
  • Our infrastructure team manages an access-controlled Docker registry, which serves as a way to host private images for many of these utilities. We build these images after reviewing the code so that each component runs from an image with known contents - an important step in environments where security is paramount.
  • Though this example explicitly lists the vaultenv container command, additional Dockerfile directives can encapsulate this logic during the container building phase.
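
For reference, the init container portion of such a Deployment might look roughly like the following (the image name, role id value, and volume name are placeholders consistent with the earlier snippets; see the kubernetes-vault documentation for the full set of options):

initContainers:
  - name: vault-init
    image: private.registry.url/kubernetes-vault-init
    env:
      - name: VAULT_ROLE_ID
        value: "<role id for this application's AppRole>"
    volumeMounts:
      - name: vault-token
        mountPath: /var/run/secrets/boostport.com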

With these yaml additions in place, many Deployment specifications can re-use this pattern to securely reference Vault secrets, with narrowly scoped policies ensuring that the principle of least privilege is followed.

Final Thoughts

This example illustrates how to create applications that are easy to scale and manage thanks to Kubernetes, and that can safely consume secrets from a highly secure store such as Vault.

In practice, this has proven to be a useful system design for our use case. In particular, using Vault as a cloud-agnostic secret store has permitted us to run Kubernetes in any arbitrary environment and consume Vault APIs regardless of where kubernetes-vault is running, whether in AWS, GCP, or otherwise. Relying on kubernetes-vault's response wrapping to securely deposit a token in a container alleviates some of the tension regarding how to distribute initial access credentials, and vaultenv has allowed us to provide an application-agnostic way of exposing those fetched secrets to applications that need them.

Note: As of this writing, HashiCorp has announced native Kubernetes integration for Vault. While the approach laid out in this post predates that feature, some concepts in this post (such as leveraging vaultenv for services that do not natively consume Vault tokens) are still useful to consider.