Elasticsearch

The ARM template can deploy an Elasticsearch cluster topology with up to 50 data nodes and up to 20 coordinating nodes, along with three dedicated master nodes. The following sections describe the parameters that control node deployment options.

Subscription core quota limits

The template can deploy a cluster of up to 73 nodes (3 master, 50 data and 20 coordinating nodes), but the largest cluster that you can actually deploy is governed by the core quota limit defined for the targeted VM SKU and location within the subscription. You can check the limit and current usage under Subscriptions > Usage + quotas in the Azure portal, or from the command line.

With Azure CLI 2.0:

az vm list-usage --location "<location>"

With Azure PowerShell:

Get-AzureRmVMUsage -Location "<location>"

Typically, the default limit is 10 cores per VM SKU family per location. Contact Azure support to increase the limit for a VM SKU in a specific location.
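To make the quota arithmetic concrete, the following Python sketch estimates how many cores a topology would need and compares that with the remaining quota. It is only an illustration: the function names are hypothetical, the cores-per-VM figure is an input you would look up for your chosen VM SKU, and it assumes that all Elasticsearch nodes use SKUs from the same family and that no other VMs count against the same quota.

```python
def required_cores(data_nodes, coordinating_nodes, dedicated_masters, cores_per_vm):
    """Cores needed by the Elasticsearch node VMs only.

    Assumes every node uses a VM SKU with the same core count (hypothetical
    simplification); other VMs in the deployment are not counted.
    """
    master_nodes = 3 if dedicated_masters else 0
    return (data_nodes + coordinating_nodes + master_nodes) * cores_per_vm


def fits_quota(cores_needed, quota_limit, quota_in_use):
    """True if the cores needed fit within the remaining core quota."""
    return cores_needed <= quota_limit - quota_in_use


# Default topology (3 data nodes, 3 dedicated masters) on single-core VMs
needed = required_cores(3, 0, True, cores_per_vm=1)
print(needed, fits_quota(needed, quota_limit=10, quota_in_use=0))
```

The limit and current usage values would come from the portal or from the command output shown above.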

Currently, the ARM template deploys only to Ubuntu 16.04-LTS VMs, using images published to the Azure VM gallery by Canonical, and the Debian package distribution of Elasticsearch. The template uses systemd to run the Elasticsearch process, with Elasticsearch configured to start automatically when the system boots up.

Elasticsearch can be stopped with systemd on an Elasticsearch VM node using

sudo systemctl stop elasticsearch.service

and started with

sudo systemctl start elasticsearch.service

Virtual machine admin username

All VMs deployed by the template are secured with a username and either a password or an SSH key.

adminUsername
Admin username used when provisioning VMs. Must be a valid Linux username; that is, avoid any usernames that are invalid on Ubuntu.
authenticationType
The authentication mechanism to use to access VMs. Can be either password or sshPublicKey.
adminPassword
When authenticationType is password, the password to use for the Admin username to access VMs.
sshPublicKey
When authenticationType is sshPublicKey, the public SSH key to use for the Admin username to access VMs.
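The relationship between these four parameters can be sketched as a small Python helper that builds the name-to-value mapping and rejects inconsistent combinations. The helper name `auth_parameters` is hypothetical and not part of the template; it returns a plain dictionary rather than a complete ARM parameters file.

```python
def auth_parameters(admin_username, authentication_type,
                    admin_password=None, ssh_public_key=None):
    """Build the VM authentication parameters as a name -> value mapping."""
    params = {
        "adminUsername": admin_username,
        "authenticationType": authentication_type,
    }
    if authentication_type == "password":
        if not admin_password:
            raise ValueError("adminPassword is required when authenticationType is password")
        params["adminPassword"] = admin_password
    elif authentication_type == "sshPublicKey":
        if not ssh_public_key:
            raise ValueError("sshPublicKey is required when authenticationType is sshPublicKey")
        params["sshPublicKey"] = ssh_public_key
    else:
        raise ValueError("authenticationType must be password or sshPublicKey")
    return params
```

Only the credential that matches authenticationType is included in the result, which mirrors the way the descriptions above tie each credential to one authentication type.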

Cluster settings

The following are general settings that control cluster configuration:

esVersion
The version of Elasticsearch (and thus, Kibana) to deploy. Each template version is capable of deploying many different versions, with the template version number giving an indication of what the default stack version will be. For example, template version 7.3.1 will deploy Elasticsearch 7.3.1 by default. Consult the esVersion.allowedValues array in the mainTemplate.json file of a specific template version to ascertain which versions it can deploy.
esClusterName
The name of the Elasticsearch cluster. Choose a name that describes the purpose of the cluster. This value is required.
esHeapSize

The amount of memory, in megabytes, to allocate on each Elasticsearch node for the JVM heap. By default, 50% of the available memory is allocated to the Elasticsearch heap, up to a maximum of 31,744MB (approximately 32GB).

This is an expert level feature; setting a heap size that is too low, or larger than the memory available on the chosen Elasticsearch VM SKU, will cause the deployment to fail.

esAdditionalYaml

Additional configuration for the Elasticsearch YAML configuration file, elasticsearch.yml. Each line must be separated by a \n newline character. For example,

"action.auto_create_index: +.*\nindices.queries.cache.size: 5%"

It is recommended that you run your additional YAML through a linter before starting a deployment, as incorrectly formatted YAML will cause the deployment to fail.
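The esHeapSize default and the esAdditionalYaml format described above can be illustrated with a short Python sketch. Both function names are hypothetical. The heap calculation simply applies the stated rule (half of available memory, capped at 31,744MB) and may not match the exact rounding the deployment script uses. The YAML helper only joins lines; it does not lint the result.

```python
import json

MAX_HEAP_MB = 31744  # upper bound stated in the esHeapSize description


def default_heap_mb(available_memory_mb):
    """Half of the available memory, capped at MAX_HEAP_MB."""
    return min(available_memory_mb // 2, MAX_HEAP_MB)


def additional_yaml(settings):
    """Join 'key: value' pairs into one newline-separated string."""
    lines = []
    for key, value in settings.items():
        if "\n" in str(key) or "\n" in str(value):
            raise ValueError("keys and values must not contain newlines")
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


yaml_value = additional_yaml({
    "action.auto_create_index": "+.*",
    "indices.queries.cache.size": "5%",
})
# json.dumps shows the value as it would appear inside a JSON parameters file
print(json.dumps(yaml_value))
print(default_heap_mb(8192))
```

Serializing the string with `json.dumps` produces the quoted, `\n`-escaped form used in the example above, which avoids escaping the newline by hand.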

Data nodes

By default, the template deploys three data nodes. Data nodes hold data and perform data-related operations such as search and aggregations. Data node VMs are attached to the backend pool of load balancers within the template, unless coordinating nodes are also deployed, in which case coordinating nodes will be attached instead.

dataNodesAreMasterEligible
Either Yes or No to make data nodes master-eligible. This can be useful for small Elasticsearch clusters. For larger clusters, however, it is recommended to have dedicated master nodes. The default is No; when Yes is passed, no dedicated master nodes will be provisioned.
vmSizeDataNodes
The Azure VM SKU to use for data nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. Additionally, different VM SKUs have different limits on the number of managed disks that can be attached. The default is Standard_D1.
vmDataNodeCount
The number of data nodes. Must be greater than 0. Defaults to 3.
vmDataNodeAcceleratedNetworking
Whether to enable accelerated networking for data nodes, which enables single root I/O virtualization (SR-IOV) to a VM, greatly improving its networking performance. Valid values are Default, Yes, No. The default is Default, which enables accelerated networking for the VM SKUs known to support it.

Master nodes

When the dataNodesAreMasterEligible parameter is No, three dedicated master nodes will be deployed. Dedicated master nodes are recommended for larger clusters.

vmSizeMasterNodes
The Azure VM SKU to use for dedicated master nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. The default is Standard_D1.
vmMasterNodeAcceleratedNetworking
Whether to enable accelerated networking for dedicated master nodes, which enables single root I/O virtualization (SR-IOV) to a VM, greatly improving its networking performance. Valid values are Default, Yes, No. The default is Default, which enables accelerated networking for the VM SKUs known to support it.

Coordinating nodes

Coordinating nodes can optionally be deployed with the template. They do not hold data and are not master-eligible; instead, they coordinate incoming requests from clients, sending those requests on to data nodes and reducing each data node’s results into a single global result set. Coordinating nodes are a way to scale a cluster deployed with this template beyond 100 data nodes, the maximum number of VMs that can be added to a load balancer backend pool. Although the template limits the number of data nodes to 50, this can be increased to 100 by forking the template and raising the limit.

If specified, coordinating node VMs are attached to the backend pool of load balancers within the template, instead of data node VMs.

vmSizeClientNodes
The Azure VM SKU to use for coordinating nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. The default is Standard_D1.
vmClientNodeCount
The number of coordinating nodes. Defaults to 0.
vmClientNodeAcceleratedNetworking
Whether to enable accelerated networking for coordinating nodes, which enables single root I/O virtualization (SR-IOV) to a VM, greatly improving its networking performance. Valid values are Default, Yes, No. The default is Default, which enables accelerated networking for the VM SKUs known to support it.
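The way the node count parameters combine across the data, master and coordinating node sections can be summarised in a small Python sketch. The function `describe_topology` is a hypothetical helper written for this illustration, not something the template provides; it encodes only the limits and defaults stated in this document.

```python
def describe_topology(data_nodes=3, coordinating_nodes=0,
                      data_nodes_are_master_eligible="No"):
    """Summarise the cluster that a set of node parameters would produce."""
    if not 1 <= data_nodes <= 50:
        raise ValueError("vmDataNodeCount must be between 1 and 50")
    if not 0 <= coordinating_nodes <= 20:
        raise ValueError("vmClientNodeCount must be between 0 and 20")
    if data_nodes_are_master_eligible not in ("Yes", "No"):
        raise ValueError("dataNodesAreMasterEligible must be Yes or No")

    # Dedicated masters are only provisioned when data nodes are not master-eligible
    dedicated_masters = 3 if data_nodes_are_master_eligible == "No" else 0
    return {
        "dedicated_master_nodes": dedicated_masters,
        "total_nodes": data_nodes + coordinating_nodes + dedicated_masters,
        # Coordinating nodes, when present, replace data nodes in the backend pool
        "load_balancer_backend_pool": "coordinating" if coordinating_nodes > 0 else "data",
    }


print(describe_topology())
print(describe_topology(data_nodes=50, coordinating_nodes=20))
```

With the maximum counts, the total comes to the 73 nodes mentioned earlier, and the load balancer backend pool switches from data nodes to coordinating nodes.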

Ingest and Machine learning nodes

All deployed nodes are configured as Ingest nodes, as well as Machine learning nodes, if a license that enables Machine learning features has been applied. Consult the node documentation to understand how node roles can be changed.

Scaling up number of nodes

The template deploys in incremental mode by default. If a previous solution deployment has been performed into the target resource group, resources that exist in the resource group but are not in the template are left unchanged. All resources specified by the solution are deployed; resources that already exist and whose settings are unchanged are not modified, whereas resources whose settings have changed are provisioned with the new settings.

If the Elasticsearch deployment script is run on a VM that already has an Elasticsearch process running, the elasticsearch.yml configuration file is updated using parameters from the new deployment. If the node is using the temporary disk for storage, the script ensures that the data directory and permissions are set appropriately. If a change to the Elasticsearch configuration file is detected, the Elasticsearch process is restarted.

In practice, incremental deployment mode and the deployment script behaviour mean that it is possible to increase the size of a cluster deployed with the template. There are some caveats to be aware of:

  1. A deployment into an existing resource group where the template has already been deployed must use exactly the same parameters as the previous deployment, except for vmDataNodeCount or vmClientNodeCount, which should be higher than (or the same as) in the previous deployment, to increase the number of data or coordinating nodes, respectively.
  2. Template deployment in incremental mode must only be used to scale up a cluster, and not down; the Azure infrastructure has no knowledge of which VMs can be safely deleted without losing data, since it knows nothing about the shards and replicas that each node contains.
  3. Scaling up should only be used when the cluster contains dedicated master nodes.
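The caveats above can be expressed as a pre-flight check that compares the parameters of the previous deployment with those of the new one. The following Python sketch is a hypothetical helper, not part of the template or the Azure tooling; it assumes both parameter sets are available as plain name-to-value dictionaries and uses the defaults given in this document for omitted values.

```python
SCALABLE = {"vmDataNodeCount": 3, "vmClientNodeCount": 0}  # name -> documented default


def check_scale_up(previous, new):
    """Return a list of problems; an empty list means the redeploy looks safe."""
    problems = []

    # Caveat 3: only scale clusters that have dedicated master nodes
    if previous.get("dataNodesAreMasterEligible", "No") != "No":
        problems.append("cluster has no dedicated master nodes")

    # Caveat 1: every parameter other than the node counts must be identical
    for name in sorted(set(previous) | set(new)):
        if name not in SCALABLE and previous.get(name) != new.get(name):
            problems.append(f"{name} must not change")

    # Caveat 2: node counts may stay the same or grow, never shrink
    for name, default in SCALABLE.items():
        if new.get(name, default) < previous.get(name, default):
            problems.append(f"{name} must not decrease")

    return problems


previous = {"esClusterName": "logs", "vmDataNodeCount": 3}
print(check_scale_up(previous, {"esClusterName": "logs", "vmDataNodeCount": 5}))
print(check_scale_up(previous, {"esClusterName": "logs", "vmDataNodeCount": 2}))
```

Running a check like this before redeploying helps catch an accidental parameter change or scale-down before it reaches the resource group.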