Run Elastic Agent Standalone on Kubernetes

Running Elastic Agent in standalone mode is an advanced use case. The documentation is incomplete and not yet mature. When possible, we recommend using Fleet-managed agents instead of standalone mode.

What you need

Step 1: Download the Elastic Agent manifest

You can find Elastic Agent Docker images here.

Download the manifest file:

curl -L -O https://raw.githubusercontent.com/elastic/elastic-agent/8.9/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml

You might need to adjust resource limits of the Elastic Agent container in the manifest. Container resource usage depends on the number of data streams and the environment size.
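
As a rough sketch, resource settings are declared on the Elastic Agent container inside the DaemonSet pod template. The container name and every value below are illustrative assumptions rather than recommendations; compare them with the entries already present in the downloaded manifest.

```yaml
# Fragment of the DaemonSet pod template (spec.template.spec).
# The container name and all values are illustrative assumptions.
containers:
  - name: elastic-agent      # assumed; use the container name from your manifest
    resources:
      limits:
        memory: 1Gi          # upper bound; consider raising it if the agent is OOM-killed
      requests:
        cpu: 200m            # amount reserved on the node for scheduling
        memory: 512Mi
```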

This manifest includes the Kubernetes integration to collect Kubernetes metrics and the System integration to collect system-level metrics and logs from nodes.

The Elastic Agent is deployed as a DaemonSet to ensure that there is a running instance on each node of the cluster. These instances are used to retrieve most metrics from the host, such as system metrics, Docker stats, and metrics from all the services running on top of Kubernetes. These metrics are accessed through the deployed kube-state-metrics. Notice that everything is deployed under the kube-system namespace by default. To change the namespace, modify the manifest file.
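
As an illustration of what changing the namespace involves, the fragments below show the two kinds of fields that usually carry it: the metadata of each namespaced object, and the service account subject of a role binding. The namespace monitoring and the object names are hypothetical; the actual names come from the downloaded manifest.

```yaml
# Fragment 1: each namespaced object (DaemonSet, ConfigMap, ServiceAccount, ...)
# declares its namespace in metadata.
metadata:
  name: elastic-agent        # assumed name
  namespace: monitoring      # hypothetical target namespace
---
# Fragment 2: binding subjects must point at the service account's new namespace.
subjects:
  - kind: ServiceAccount
    name: elastic-agent      # assumed name
    namespace: monitoring    # must match the namespace used in fragment 1
```

The target namespace has to exist before the manifest is applied.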

In addition, one of the Pods in the DaemonSet always holds a leader lock, which makes it responsible for cluster-wide monitoring. For leader election configuration options, see leader election provider. The leader Pod retrieves metrics that are unique to the whole cluster, such as Kubernetes events or kube-state-metrics. The manifest ensures that these metrics are retrieved only by the leader Pod by applying the following condition before declaring the data streams with these metricsets:

...
inputs:
  - id: kubernetes-cluster-metrics
    condition: ${kubernetes_leaderelection.leader} == true
    type: kubernetes/metrics
    # metricsets with the state_ prefix and the metricset event
...

For Kubernetes Security Posture Management (KSPM) purposes, Elastic Agent requires read access to various types of Kubernetes resources, node processes, and files. To achieve this, the manifest grants Elastic Agent read permissions for the necessary resources and mounts volumes from the hosting node’s file system so that the Elastic Agent Pods can access them.

A Kubernetes cluster can have many large nodes, in which case the Pod that collects cluster-level metrics might need more runtime resources than you want to dedicate to every Pod in the DaemonSet. If it is under-resourced, the leader that collects the cluster-wide metrics can run into performance issues. In that case, consider replacing the single DaemonSet and its leader election strategy with two workloads: a dedicated standalone Elastic Agent instance, run as a Deployment, that collects cluster-wide metrics, and the DaemonSet, which collects metrics for each node. The Deployment and the DaemonSet can then be resourced independently and appropriately. For more information, see the Scaling Elastic Agent on Kubernetes page.
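
A minimal sketch of the dedicated Deployment might look like the following. It only shows how a single-replica workload can be given its own resource limits; the object name, labels, service account, image tag, and memory value are assumptions, and the agent configuration, environment variables, and volumes are omitted and would have to be carried over from the DaemonSet.

```yaml
# Hypothetical Deployment for cluster-wide metrics; not taken from the manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-agent-cluster-metrics     # hypothetical name
  namespace: kube-system
spec:
  replicas: 1                             # one instance collects cluster-wide metrics
  selector:
    matchLabels:
      app: elastic-agent-cluster-metrics
  template:
    metadata:
      labels:
        app: elastic-agent-cluster-metrics
    spec:
      serviceAccountName: elastic-agent   # assumed; reuse the manifest's service account
      containers:
        - name: elastic-agent
          image: docker.elastic.co/beats/elastic-agent:8.9.0   # assumed tag
          resources:
            limits:
              memory: 2Gi                 # sized independently of the DaemonSet
```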

Step 2: Connect to the Elastic Stack

Set the Elasticsearch settings before deploying the manifest:

- name: ES_USERNAME
  value: "elastic" 
- name: ES_PASSWORD
  value: "passpassMyStr0ngP@ss" 
- name: ES_HOST
  value: "https://somesuperhostiduuid.europe-west1.gcp.cloud.es.io:9243" 

ES_USERNAME: The basic authentication username used to connect to Elasticsearch.

ES_PASSWORD: The basic authentication password used to connect to Elasticsearch.

ES_HOST: The Elasticsearch host to communicate with.

Refer to Environment variables for all available options.
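
Rather than keeping the password in plain text in the manifest, Kubernetes can inject it from a Secret. The sketch below assumes a Secret named elastic-agent-credentials with a key named password already exists in the agent's namespace; both names are hypothetical.

```yaml
# Fragment of the container's env list in the DaemonSet.
- name: ES_USERNAME
  value: "elastic"
- name: ES_PASSWORD
  valueFrom:
    secretKeyRef:
      name: elastic-agent-credentials   # hypothetical Secret name
      key: password                     # hypothetical key inside the Secret
```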

Step 3: Configure tolerations

Kubernetes control plane nodes can use taints to limit the workloads that can run on them. The manifest for standalone Elastic Agent defines tolerations so that the agent can run on these nodes. Agents running on control plane nodes collect metrics from the Kubernetes control plane components (scheduler, controller manager). To prevent Elastic Agent from running on control plane nodes, remove the following part of the DaemonSet spec:

spec:
  # Tolerations are needed to run Elastic Agent on Kubernetes control-plane nodes.
  # Agents running on control-plane nodes collect metrics from the control plane components (scheduler, controller manager) of Kubernetes
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      effect: NoSchedule
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

Both tolerations have the same effect, but node-role.kubernetes.io/master is deprecated as of Kubernetes version v1.25.

Step 4: Deploy the Elastic Agent

To deploy Elastic Agent to Kubernetes, run:

kubectl create -f elastic-agent-standalone-kubernetes.yaml

To check the status, run:

$ kubectl -n kube-system get pods -l app=elastic-agent
NAME                            READY   STATUS    RESTARTS   AGE
elastic-agent-4665d             1/1     Running   0          81m
elastic-agent-9f466c4b5-l8cm8   1/1     Running   0          81m
elastic-agent-fj2z9             1/1     Running   0          81m
elastic-agent-hs4pb             1/1     Running   0          81m

Step 5: View your data in Kibana

  1. Launch Kibana:

    1. Log in to your Elastic Cloud account.
    2. Navigate to the Kibana endpoint in your deployment.
  2. To see data flowing in, go to Analytics → Discover and select the index metrics-*, or the more specific metrics-kubernetes.*. If you can’t see these indices, create a data view for them.
  3. To see predefined dashboards, select Analytics → Dashboard, or install assets through an integration.

Red Hat OpenShift configuration

If you are using Red Hat OpenShift, you need to specify additional settings in the manifest file and enable the container to run as privileged.

  1. In the manifest file, modify the agent-node-datastreams ConfigMap and adjust inputs:

    • kubernetes-cluster-metrics input:

      • If HTTPS is used to access kube-state-metrics, add the following settings to all kubernetes.state_* datasets:

          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          ssl.certificate_authorities:
            - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt

    • kubernetes-node-metrics input:

      • Change the kubernetes.controllermanager data stream condition to:

        condition: ${kubernetes.labels.app} == 'kube-controller-manager'

      • Change the kubernetes.scheduler data stream condition to:

        condition: ${kubernetes.labels.app} == 'openshift-kube-scheduler'

      • The kubernetes.proxy data stream configuration should look like:

        - data_stream:
            dataset: kubernetes.proxy
            type: metrics
          metricsets:
            - proxy
          hosts:
            - 'localhost:29101'
          period: 10s

      • Add the following settings to all data streams that connect to https://${env.NODE_NAME}:10250:

          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          ssl.certificate_authorities:
            - /path/to/ca-bundle.crt

        ca-bundle.crt can be any CA bundle that contains the issuer of the certificate used in the Kubelet API. Depending on the OpenShift installation, it can be found in either Secrets or ConfigMaps. In some installations it is available as part of the service account secret, in /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt. When using the OpenShift installer for GCP, mount the following ConfigMap in the elastic-agent Pod and use ca-bundle.crt in ssl.certificate_authorities:

        Name:         kubelet-serving-ca
        Namespace:    openshift-kube-apiserver
        Labels:       <none>
        Annotations:  <none>
        
        Data
        ====
        ca-bundle.crt:

  2. Grant the elastic-agent service account access to the privileged SCC:

    oc adm policy add-scc-to-user privileged system:serviceaccount:kube-system:elastic-agent

    This command allows containers that run under the elastic-agent service account to run as privileged. It requires OpenShift administrator rights.

  3. If the namespace where elastic-agent is running has the "openshift.io/node-selector" annotation set, elastic-agent might not run on all nodes. In this case consider overriding the node selector for the namespace to allow scheduling on any node:

    oc patch namespace kube-system -p \
    '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'

    This command sets the node selector for the project to an empty string.
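
For the kubelet-serving-ca case mentioned in step 1, mounting the CA bundle could be sketched as follows. A Pod can only mount ConfigMaps from its own namespace, so this assumes a copy of kubelet-serving-ca has been created in the namespace where Elastic Agent runs. The container name, volume name, and mount path are hypothetical.

```yaml
# Fragment of the DaemonSet pod template (spec.template.spec).
containers:
  - name: elastic-agent              # assumed container name
    volumeMounts:
      - name: kubelet-ca
        mountPath: /etc/kubelet-ca   # hypothetical mount path
        readOnly: true
volumes:
  - name: kubelet-ca
    configMap:
      name: kubelet-serving-ca       # assumes a copy exists in the agent's namespace
```

With this mount in place, the ssl.certificate_authorities entry for the Kubelet data streams would point to /etc/kubelet-ca/ca-bundle.crt.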

Autodiscover targeted Pods

Refer to Kubernetes autodiscovery with Elastic Agent for more information.