Elasticsearch
The ARM template is able to deploy an Elasticsearch cluster topology with up to 50 data nodes and up to 20 coordinating nodes, along with three dedicated master nodes. The following sections highlight the parameters that can control node deployment options.
Subscription core quota limits
The template can deploy a cluster of up to 73 nodes (3 master, 50 data and 20 coordinating nodes), but the largest cluster you will actually be able to deploy is governed by the core quota limit defined for the targeted VM SKU and location within the subscription. You can check the limit and current usage with Subscriptions > Usage + quotas in the Azure portal, with Azure CLI 2.0
az vm list-usage --location "<location>"
or with Azure PowerShell
Get-AzureRmVMUsage -Location "<location>"
Typically, the default limit is 10 cores per VM SKU family per location. Contact Azure support to increase the limit for a VM SKU in a specific location.
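For example, assuming the Standard_DS2_v2 SKU with 2 cores per VM (an illustrative choice, not a template default), a deployment of 3 dedicated master, 6 data and 3 coordinating nodes would consume 24 cores of the DSv2 family quota in the target location, well above a default limit of 10.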
Currently, the ARM template deploys only to Ubuntu 16.04-LTS VMs, using images published to the Azure VM gallery by Canonical, and the Debian package distribution of Elasticsearch. The template uses monit to manage and monitor the Elasticsearch process, with monit configured to run in daemon mode, checking the state of the Elasticsearch process every 30 seconds. If the process is not running, monit will attempt to start it.
Elasticsearch can be stopped with monit on an Elasticsearch VM node using
sudo monit stop elasticsearch
and started with
sudo monit start elasticsearch
Refer to the monit documentation for further details.
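As an illustrative check (output formatting varies with the monit version installed), the state that monit reports for monitored processes on a node can be viewed with
sudo monit summary
and the detailed status of the Elasticsearch process with
sudo monit status elasticsearch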
Virtual machine admin username
All VMs deployed by the template are secured with a username and either a password or SSH key:
- adminUsername - Admin username used when provisioning VMs. Must be a valid Linux username, i.e. avoid any usernames that are invalid for Ubuntu.
- authenticationType - The authentication mechanism to use to access VMs. Can be either password or sshPublicKey.
- adminPassword - When authenticationType is password, the password to use for the Admin username to access VMs.
- sshPublicKey - When authenticationType is sshPublicKey, the public SSH key to use for the Admin username to access VMs.
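As a minimal sketch of how these values might be supplied with Azure CLI 2.0 (the resource group name, template URI and username shown are placeholders, and other required parameters are omitted):
az group deployment create --resource-group "<resource group>" \
  --template-uri "<mainTemplate.json URI>" \
  --parameters adminUsername=esadmin authenticationType=sshPublicKey \
    "sshPublicKey=$(cat ~/.ssh/id_rsa.pub)"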
Cluster settings
The following are general settings that control cluster configuration:
- esVersion - The version of Elasticsearch (and thus, Kibana) to deploy. Each template version is capable of deploying many different versions, with the template version number giving an indication of what the default stack version will be. For example, template version 6.3.1 will deploy Elasticsearch 6.3.1 by default. Consult the esVersion.allowedValues array in the mainTemplate.json file of a specific template version to ascertain which versions it can deploy.
- esClusterName - The name of the Elasticsearch cluster. It's recommended to choose an appropriate name that describes the purpose of the cluster. This value is required.
- esHeapSize - The amount of memory, in megabytes, to allocate on each Elasticsearch node for the JVM heap. By default, 50% of the available memory is allocated to the Elasticsearch heap, up to a maximum of 31,744MB (approximately 32GB). This is an expert level feature; setting a heap size too low, or larger than the available memory on the chosen Elasticsearch VM SKU, will fail the deployment.
- esAdditionalYaml - Additional configuration to add to the Elasticsearch yml configuration file. Each line must be separated by a \n newline character, for example "action.auto_create_index: +.*\nindices.queries.cache.size: 5%". It is recommended that you run your additional yaml through a linter before starting a deployment, as incorrectly formatted yaml will fail the deployment.
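As a hedged example of passing these settings with Azure CLI 2.0 (the version must be one listed in esVersion.allowedValues for the template version in use, the heap size must fit within the memory of the chosen VM SKU, and other required parameters are omitted):
az group deployment create --resource-group "<resource group>" \
  --template-uri "<mainTemplate.json URI>" \
  --parameters esVersion=6.3.1 esClusterName=elasticsearch esHeapSize=2048
esAdditionalYaml can be passed in the same way; take care that the \n separators survive shell quoting, and lint the yaml first as recommended above.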
Data nodes
By default, the template deploys three data nodes. Data nodes hold data and perform data related operations such as search and aggregations. Data node VMs are attached to the backend pool of load balancers within the template, unless coordinating nodes are also deployed, in which case coordinating nodes will be attached instead.
- dataNodesAreMasterEligible - Either Yes or No to make data nodes master-eligible. This can be useful for small Elasticsearch clusters; for larger clusters, however, it is recommended to have dedicated master nodes. The default is No, and when Yes is passed, no dedicated master nodes will be provisioned.
- vmSizeDataNodes - The Azure VM SKU to use for data nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. Additionally, different VM SKUs have different limits on the number of managed disks that can be attached. The default is Standard_D1.
- vmDataNodeCount - The number of data nodes. Must be greater than 0. Defaults to 3.
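For illustration only (the SKU must be one of the template's allowed values for vmSizeDataNodes, and the subscription needs sufficient core quota for it), a five data node tier on a larger SKU might be requested with parameters such as
... --parameters vmSizeDataNodes=Standard_DS2_v2 vmDataNodeCount=5 ...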
Master nodes
When the dataNodesAreMasterEligible parameter is No, three dedicated master nodes will be deployed. Dedicated master nodes are recommended for larger clusters.
- vmSizeMasterNodes - The Azure VM SKU to use for dedicated master nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. The default is Standard_D1.
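For example (again a sketch, with an illustrative SKU subject to the template's allowed values), dedicated master nodes could be sized independently of the data nodes with
... --parameters dataNodesAreMasterEligible=No vmSizeMasterNodes=Standard_DS1_v2 ...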
Coordinating nodes
Coordinating nodes can optionally be deployed with the template. Coordinating nodes do not hold data and are not master-eligible; they act as coordinators of incoming requests from clients, sending those requests on to data nodes and gathering the results to reduce each data node's results into a single global result set. Coordinating nodes are a way to scale a cluster deployed with this template beyond 100 data nodes, the maximum number of VMs that can be added to a load balancer backend pool; the template imposes a limit of 50 data nodes, but this can be increased by forking the template and raising the limit to 100.
If specified, coordinating node VMs are attached to the backend pool of load balancers within the template, instead of data node VMs.
- vmSizeClientNodes - The Azure VM SKU to use for coordinating nodes. Different VM SKUs have different CPU, RAM, temporary storage space and network bandwidth. The default is Standard_D1.
- vmClientNodeCount - The number of coordinating nodes. Defaults to 0.
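For example, a sketch of adding three coordinating nodes on an illustrative SKU (subject to the template's allowed values for vmSizeClientNodes and to core quota):
... --parameters vmClientNodeCount=3 vmSizeClientNodes=Standard_DS2_v2 ...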
Ingest and Machine learning nodes
All deployed nodes are configured as Ingest nodes, as well as Machine learning nodes, if a license that enables Machine learning features has been applied. Consult the node documentation to understand how node roles can be changed.
Scaling up number of nodes
The template deploys in incremental mode by default: if a previous solution deployment has been performed into the target resource group, resources that exist in the resource group but are not in the template are left unchanged. All resources specified by the solution will be deployed; for resources that already exist and whose settings are unchanged, no change is made, while resources whose settings have changed are provisioned with the new settings.
If the Elasticsearch deployment script is run on a VM that already has an Elasticsearch process running, the elasticsearch.yml configuration file is updated using parameters from the new deployment. If the node is using the temporary disk for storage, the script ensures that the data directory and permissions are set appropriately. If a change to the Elasticsearch configuration file is detected, the Elasticsearch process is restarted.
In practice, incremental deployment mode and this deployment script behaviour mean that it is possible to increase the size of a cluster deployed with the template, as shown in the sketch after the following list. There are some caveats to be aware of:
- A deployment into an existing resource group where the template has already been deployed must use exactly the same parameters, except either vmDataNodeCount or vmClientNodeCount, which should be higher than (or the same as) in the previous deployment to the resource group, to increase the number of data or coordinating nodes, respectively.
- Template deployment in incremental mode must only be used to scale up a cluster, and not down; the Azure infrastructure has no knowledge of which VMs can be safely deleted without losing data, since it knows nothing about the shards and replicas that each node contains.
- Scaling up should only be used when the cluster contains dedicated master nodes.
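As a sketch of a scale-up, assuming a cluster originally deployed into a resource group with dedicated master nodes and vmDataNodeCount=5: re-running the deployment with the same template version and otherwise identical parameters, raising only the data node count, would add four data nodes to the cluster.
az group deployment create --resource-group "<same resource group>" \
  --template-uri "<same mainTemplate.json URI>" \
  --parameters <identical parameters to the original deployment> vmDataNodeCount=9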