How to configure Elastic Cloud on Kubernetes with SAML and hot-warm-cold architecture
Elastic Cloud on Kubernetes (ECK) is an easy way to get the Elastic Stack up and running on top of Kubernetes. That’s because ECK automates the deployment, provisioning, management, and setup of Elasticsearch, Kibana, Beats, and more.
As logging and metric data — or time series data — has a predictable lifespan, you can use hot, warm, and cold architecture to easily manage your data over time as it ages and becomes less relevant.
In this guide, we’ll explain how to set up SAML with auth0 as an identity provider (IdP) as well as how to configure your deployment for hot-warm-cold architecture with ECK. We’ll use production features such as:
- Dedicated nodes
- Zone awareness
- Pod and node affinity
- Pod disruption budget
- Dedicated storage class
Before you get started, you should be familiar with SAML. To learn more, check out this ECK example.
Scenario
For this example, we’ll deploy ECK to have a centralized way to authenticate into Kibana using auth0.
We’ll deploy the following resources:
- ECK Operator
- StorageClass
- Elasticsearch
- Elasticsearch role job
- Kibana
- Ingress controller
- ConfigMap SAML metadata
Architecture
To set up this environment, we’ll use GKE (Google Kubernetes Engine) with the following configurations:
Three Kubernetes node pools
- 1 hot node pool with 6 Kubernetes instances spread across 3 availability zones
- 1 warm node pool with Kubernetes instances spread across 3 availability zones
- 1 cold node pool with Kubernetes instances spread across 3 availability zones
Instances configuration
- Hot nodes: c2-standard-4 (4 vCPUs, 16GB memory)
- Warm nodes: e2-standard-2 (2 vCPUs, 8GB memory)
- Cold nodes: e2-standard-2 (2 vCPUs, 8GB memory)
We’ll have nine Elasticsearch data instances (three each in the hot, warm, and cold tiers), three dedicated master instances, and two Kibana instances running across the GKE zones europe-west1-b, europe-west1-c, and europe-west1-d.
For each GKE node pool, we’ll use a specific Kubernetes label to attach the Elasticsearch instances to the right hardware configuration. The label is called type, and its values are hot, warm, or cold. You can also change or add the label later from the command line, as shown below.
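As a rough sketch (the cluster and pool names here are placeholder assumptions, not taken from the example repository), the hot pool and its label could be created with gcloud:

gcloud container node-pools create pool-hot \
  --cluster=eck-cluster \
  --machine-type=c2-standard-4 \
  --num-nodes=2 \
  --node-locations=europe-west1-b,europe-west1-c,europe-west1-d \
  --node-labels=type=hot

To add or change the label on an existing Kubernetes node later:

kubectl label nodes <node-name> type=hot --overwrite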
First, let’s dive into the manifest explanation, then how you can get set up.
Manifest explanation
The Elasticsearch manifest file eck-saml-hot-warm-cold.yml contains the Elasticsearch settings. Let’s examine the relevant parts of the manifest one by one:
nodeSets
We have one node called hot-zone-b deployed in zone europe-west1-b. Then, we define two routing awareness attributes:
- k8s_node_name: This attribute ensures that Elasticsearch allocates primary and replica shards to pods running on different Kubernetes nodes, and never to pods scheduled onto the same Kubernetes node.
- zone: This attribute uses the Kubernetes label failure-domain.beta.kubernetes.io/zone. If Elasticsearch knows which nodes are in the same zone, it can distribute the primary shard and its replica shards to minimise the risk of losing all shard copies in the event of a failure.
nodeSets:
- name: hot-zone-b
  count: 1
  config:
    node.attr.zone: europe-west1-b
    cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    node.roles: [ data_hot, data_content ]
    node.store.allow_mmap: false
We explicitly give this node the data_hot and data_content roles.

Setting node.store.allow_mmap: false prevents errors caused by the default virtual address space limit on Linux, which is too low for Elasticsearch’s default mmap-based storage. You can find more information in our documentation. If you would rather keep mmap enabled, you can raise the kernel limit instead, as sketched below.
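The following is a minimal sketch, based on the ECK documentation, of a privileged init container in the podTemplate that raises vm.max_map_count so node.store.allow_mmap can stay at its default; it is not part of the manifest used in this guide:

podTemplate:
  spec:
    initContainers:
    # Runs before Elasticsearch starts and raises the kernel limit
    # required by the default mmap-based storage.
    - name: sysctl
      securityContext:
        privileged: true
        runAsUser: 0
      command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']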
Xpack.security
This is the SAML configuration. Follow this guide to configure auth0 as the IdP, then grab the metadata file and the idp.* settings.
- attributes.principal: Defines which SAML attribute is mapped to the principal (username) of the authenticated user in Kibana.
- idp.entity_id: The SAML EntityID of your Identity Provider (you can get this from the SAML metadata file).
- idp.metadata.path: The file path or HTTPS URL where your Identity Provider metadata is available. In this example, we’ll use a file to demonstrate how to mount a volume.
- sp.acs: The Assertion Consumer Service URL where Kibana is listening for incoming SAML messages.
- sp.entity_id: The SAML EntityID of our Service Provider.
- sp.logout: The SingleLogout endpoint where the Service Provider is listening for incoming SAML LogoutResponse and LogoutRequest messages.
xpack:
  security:
    authc:
      realms:
        saml:
          saml1:
            attributes.principal: nameid
            idp.entity_id: urn:framsouza.eu.auth0.com
            idp.metadata.path: /usr/share/elasticsearch/config/framsouza_eu_auth0_com-metadata.xml
            order: 2
            sp.acs: https://framsouza.co/api/security/v1/saml
            sp.entity_id: https://framsouza.co
            sp.logout: https://framsouza.co/logout
Keep in mind the sp.* configurations must point to the Kibana endpoint, and you must collect the idp.* settings from your Identity Provider.
Also, the SAML metadata is stored in a ConfigMap and mounted as a volume inside Elasticsearch. The IdP may also serve the metadata over HTTP, and auth0 does, but we use a file here to show how to mount a ConfigMap (or Secret) as a volume into the Elasticsearch pods.
volumeClaimTemplates
The hot node will have 50Gi of disk available, backed by the storage class called sc-zone-b (we will define the storage classes later). This is the space available to store Elasticsearch data.
volumeClaimTemplates:
- metadata:
    name: elasticsearch-data
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: sc-zone-b
Node and pod affinity
The affinity feature restricts which Kubernetes nodes a pod can be scheduled on, based on labels. Node affinity is conceptually similar to nodeSelector, which schedules a pod onto nodes carrying a given label, but nodeAffinity greatly extends the types of constraints you can express.

Here, nodeAffinity uses the requiredDuringSchedulingIgnoredDuringExecution affinity type, which means the rules must be met for a pod to be scheduled onto a node.

In this example, we only run the pod on nodes in the zone europe-west1-b. The nodeSelector is matched against a label on the Kubernetes node, which in this case is type: hot.
podTemplate:
  spec:
    nodeSelector:
      type: hot
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - "europe-west1-b"
Containers definition
In this example, this node will have 8Gi of memory and requests at least 2 CPU cores. It will get 4Gi of heap: Elasticsearch versions starting with 7.11 automatically set the heap size based on the node roles and available memory. You can read more about this approach in our docs.
containers:
- name: elasticsearch
  resources:
    requests:
      memory: 8Gi
      cpu: 2
    limits:
      memory: 8Gi
Creating a ConfigMap
Note: You must create this file before the Elasticsearch manifest.
Get the metadata file from your IdP (in this case, auth0). To put its content inside a ConfigMap, run the following:
kubectl create configmap saml-metadata --from-file=../framsouza_eu_auth0_com-metadata.xml
Remember to adjust the file location.
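To confirm the metadata landed in the ConfigMap as expected, you can inspect it:

kubectl get configmap saml-metadata -o yaml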
Volume SAML metadata
The volumeMounts section is where the volume is mounted inside the pod. In this example, we mount a volume called saml-metadata as the file /usr/share/elasticsearch/config/framsouza_eu_auth0_com-metadata.xml in the Elasticsearch config directory.

The volumes section then declares that this volume is populated from the ConfigMap called saml-metadata.
volumeMounts:
- name: saml-metadata
  mountPath: /usr/share/elasticsearch/config/framsouza_eu_auth0_com-metadata.xml
volumes:
- name: saml-metadata
  configMap:
    name: saml-metadata
readinessProbe
The readiness probe is used to know when a container is ready to start accepting traffic. The timeout is 3 seconds by default, which is acceptable in most cases, but under very heavy load you might need to increase it. In this example, we increase the initial delay to 10 seconds and the timeout and check interval to 12 seconds.
readinessProbe:
  exec:
    command:
    - bash
    - -c
    - /mnt/elastic-internal/scripts/readiness-probe-script.sh
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 12
  successThreshold: 1
  timeoutSeconds: 12
initContainer
To install plugins or perform any other task at the operating system level before Elasticsearch starts, use an initContainer. In this example, we install the GCS repository plugin so we can send snapshots to Google Cloud Storage.
initContainers:
- command:
  - sh
  - -c
  - |
    bin/elasticsearch-plugin install --batch repository-gcs
  name: install-plugins
You can also build your own image that includes the GCS plugin. To do so, you can follow this blog. This approach is more production ready, and if you decide to go for it, you don’t need to specify the install-plugins initContainer.
These are the sections for one node. The rest of the nodeSets are basically the same, except for the name, zone, and storageClass name. At the end of the file, you’ll see the following setting (which applies to the whole cluster):
updateStrategy:
  changeBudget:
    maxSurge: 1
    maxUnavailable: 1
updateStrategy controls the number of simultaneous changes in the Elasticsearch cluster. maxSurge: 1 means only one new pod is created at a time; after the first new pod is Ready, an old pod is killed and the second new pod is created.

If you don’t specify a strategy, the default behaviour is maxSurge: -1, which means that all the required pods are created immediately. This may cause issues if you don’t have enough resources to deal with it. Learn more in our docs.
While maxSurge determines how many extra pods can be created at a time, maxUnavailable determines how many old pods can be taken down at a time. With both set to 1, a rolling change never takes the cluster more than one pod above or below its target size.
How to get set up
Install ECK
First, let’s give your Google account administrator privileges on the cluster by setting up role-based access control (RBAC). Run the following command:
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud auth list --filter=status:ACTIVE --format="value(account)")
Install the ECK operator. Keep in mind we are installing ECK operator version 1.6.0, but you may want to check for updated versions on our site.
kubectl apply -f https://download.elastic.co/downloads/eck/1.6.0/all-in-one.yaml
This creates the ECK Operator pod and cluster permissions in the namespace elastic-system. You can monitor it by running:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
Configuring StorageClass
With a StorageClass you describe the "classes" of storage you want to use. Different classes might map to quality-of-service levels, disk types, backup policies, or any arbitrary policy determined by the administrator.

As we are using a hot-warm-cold architecture, we need to specify a StorageClass for the hot and for the warm/cold nodes (as they must use different disk types) and associate each with the proper PersistentVolume.

This is an example of the StorageClass used for the PersistentVolumes of the hot nodes:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-hot
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Some key points:
- provisioner: kubernetes.io/gce-pd: This means we are using GCE persistent disks as the provider.
- type: pd-ssd: This provides SSD disks for the volumes attached to this StorageClass.
- reclaimPolicy: Delete: This deletes the PersistentVolumeClaim resources if the owning Elasticsearch nodes are scaled down or deleted.
- volumeBindingMode: WaitForFirstConsumer: This prevents the pod from being scheduled, due to affinity settings, on a host where the bound PersistentVolume is not available. It also creates the volume in the zone of the pod that claims it.
The warm and cold ones have the following configuration:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-warm
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
They are basically the same, apart from the name and the disk type, which here is pd-standard (a slower, cheaper disk).
Create SAML ConfigMap
kubectl create configmap saml-metadata --from-file=../framsouza_eu_auth0_com-metadata.xml
Create Elasticsearch and Kibana resource
Once you have the SAML ConfigMap, you can apply the Elasticsearch and Kibana manifests:
kubectl create -f eck-saml-hot-warm-cold.yml && kubectl create -f kibana.yml
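The kibana.yml manifest is not shown in full here. As a minimal sketch, it needs an elasticsearchRef pointing at the Elasticsearch resource and a SAML authentication provider that references the saml1 realm defined above; the version and resource names below are assumptions inferred from the pod names in this example:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-prd
spec:
  version: 7.13.2              # assumption: match your Elasticsearch version
  count: 2
  elasticsearchRef:
    name: elastic-prd          # assumption: the Elasticsearch resource name
  config:
    xpack.security.authc.providers:
      saml.saml1:
        order: 0
        realm: saml1           # must match the realm name in the Elasticsearch manifest
      basic.basic1:
        order: 1               # keep basic login available as a fallback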
It may take a while until all the resources get ready, but after a few minutes, you should see something like this:
kubectl get pods
NAME                             READY   STATUS    RESTARTS   AGE
elastic-prd-es-cold-zone-b-0     1/1     Running   0          35m
elastic-prd-es-cold-zone-c-0     1/1     Running   0          35m
elastic-prd-es-cold-zone-d-0     1/1     Running   0          35m
elastic-prd-es-hot-zone-b-0      1/1     Running   0          35m
elastic-prd-es-hot-zone-c-0      1/1     Running   0          35m
elastic-prd-es-hot-zone-d-0      1/1     Running   0          35m
elastic-prd-es-master-zone-b-0   1/1     Running   0          35m
elastic-prd-es-master-zone-c-0   1/1     Running   0          35m
elastic-prd-es-master-zone-d-0   1/1     Running   0          35m
elastic-prd-es-warm-zone-b-0     1/1     Running   0          16m
elastic-prd-es-warm-zone-c-0     1/1     Running   0          16m
elastic-prd-es-warm-zone-d-0     1/1     Running   0          15m
kibana-prd-kb-7467b79f54-btzhq   1/1     Running   0          4m8s
Once you have all the pods running, there’s a Job you must run to create the SAML role mapping inside Elasticsearch and grant the right permissions to the users who will log in using SAML. You can also create the role and role mapping via Kibana Dev Tools or curl, as sketched below.

If you try to access Kibana via SAML without running this Job, you will get a permission error. In summary, the Job spins up a container, executes the API calls, and terminates the pod.
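As an illustration of what the Job does, a role mapping along these lines grants access to users authenticated through the saml1 realm. The role mapping name and the superuser role are examples only, not the exact content of the Job in the example repository; run it from inside the cluster (as the Job does) or through a port-forward:

curl -k -u elastic:<password> -H "Content-Type: application/json" \
  -X PUT "https://elastic-prd-es-http:9200/_security/role_mapping/saml-kibana" \
  -d '{
    "roles": [ "superuser" ],
    "enabled": true,
    "rules": { "field": { "realm.name": "saml1" } }
  }'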
Test connection
At this point, you can test the connection via curl or by exposing the Kibana service.
Grab the Elastic password:
kubectl get secret elastic-prd-es-elastic-user -o yaml
Then grab the Elasticsearch service name and run the following:

curl -k https://elastic:es-password@es-service-name:9200/_cluster/health?pretty
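To avoid copying the base64-encoded value by hand, you can also decode the password and reach the cluster through a port-forward. This is a sketch assuming the Elasticsearch resource is named elastic-prd, so the service is elastic-prd-es-http:

PASSWORD=$(kubectl get secret elastic-prd-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward svc/elastic-prd-es-http 9200 &
curl -k "https://elastic:$PASSWORD@localhost:9200/_cluster/health?pretty"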
You can also temporarily expose the Kibana service and access it with your browser:
kubectl port-forward svc/kibana-prd-kb-http 5601
Ingress controller
By using an Ingress controller, you can access Kibana through your own domain. There are several Ingress controllers available, but in this example we are using ingress-nginx. You can read more about it on GitHub.
To deploy it, you must run the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.46.0/deploy/static/provider/cloud/deploy.yaml
In this example, we are using a valid SSL certificate. To add it as part of our Ingress controller, we must first create a Secret, which contains the key and certificate.
kubectl create secret tls framsouza-cert --key framsouza_co_key.txt --cert framsouza_co.crt
Once you’ve created the Secret, you can deploy the Ingress resource. But first, let’s have a look at the Ingress manifest:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  name: ingress
spec:
  tls:
  - hosts:
    - framsouza.co
    secretName: framsouza-cert
  rules:
  - host: framsouza.co
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana-prd-kb-http
          servicePort: 5601
At the annotations level, we explicitly say that we are using nginx as the controller and configure the communication between the Ingress controller and the backend service to use HTTPS (nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"). By default, ECK serves Kibana and Elasticsearch over HTTPS, and without this annotation you may receive a 503 error.

At the TLS level, we define the hostname (in this case framsouza.co) and point secretName to framsouza-cert, the Secret that contains the SSL certificate.

At the rules level, we create the routing: every request that arrives for the host framsouza.co is forwarded to the Kibana service (kibana-prd-kb-http) on port 5601.
In some cases, you may also want to expose Elasticsearch so that applications can reach it through your own SSL certificate. To do so, add a new rule like this:
- path: /elasticsearch
  backend:
    serviceName: elastic-prd-es-http
    servicePort: 9200
Here, every request to https://framsouza.co/elasticsearch will be forwarded to the Elasticsearch service.
Now, we are ready to apply the Ingress manifest:
kubectl create -f ingress.yml
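You can check that the Ingress has been created and has picked up an external address with:

kubectl get ingress ingress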
With that, we can access Kibana using our own domain and our own TLS certificate. This covers the most important features for running an ECK environment in production. Be sure to check out the repo on GitHub.
Getting started
Interested in trying it out? Simply follow these easy steps to install Elastic Cloud on Kubernetes. Be sure to connect with other users in the Elastic community or reach out on our Discuss forums.