Azure repository

You can use Azure Blob storage as a repository for Snapshot and restore.
Setup

To enable Azure repositories, first configure an Azure repository client by specifying one or more settings of the form azure.client.CLIENT_NAME.SETTING_NAME. By default, Azure repositories use a client named default, but you may specify a different client name when registering each repository.
The only mandatory Azure repository client setting is account, which is a secure setting defined in the Elasticsearch keystore. To provide this setting, use the elasticsearch-keystore tool on each node:

bin/elasticsearch-keystore add azure.client.default.account

If you adjust this setting after a node has started, call the Nodes reload secure settings API to reload the new value.
You may define more than one client by setting their account values. For instance, to set the default client and another client called secondary, run the following commands on each node:

bin/elasticsearch-keystore add azure.client.default.account
bin/elasticsearch-keystore add azure.client.secondary.account
The key and sas_token settings are also secure settings and can be set using commands like the following:

bin/elasticsearch-keystore add azure.client.default.key
bin/elasticsearch-keystore add azure.client.secondary.sas_token
Other Azure repository client settings must be set in elasticsearch.yml before the node starts. For example:

azure.client.default.timeout: 10s
azure.client.default.max_retries: 7
azure.client.default.endpoint_suffix: core.chinacloudapi.cn
azure.client.secondary.timeout: 30s
In this example, the client-side timeout is 10s per try for repositories which use the default client, with 7 retries before failing and an endpoint suffix of core.chinacloudapi.cn. Repositories which use the secondary client will have a timeout of 30s per try, but will use the default endpoint and will fail after the default number of retries.
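The retry timing implied by these settings can be sketched numerically. This is an illustrative calculation only: it assumes the backoff doubles between attempts, while the exact policy lives inside the Azure SDK (30s initial and 90s maximum backoff, as described in the client settings below). The function name is ours, not part of any API.

```python
# Illustrative sketch of the retry backoff schedule described above.
# Assumes the delay doubles between attempts; the real policy is
# implemented by the Azure SDK, with a 30s initial and 90s maximum
# backoff period.

def backoff_schedule(max_retries: int, initial: float = 30.0, cap: float = 90.0) -> list[float]:
    """Return the wait (in seconds) before each retry attempt."""
    return [min(initial * 2 ** attempt, cap) for attempt in range(max_retries)]

# With the example client settings (max_retries: 7), the waits plateau
# at the 90s cap after the third attempt:
print(backoff_schedule(7))  # [30.0, 60.0, 90.0, 90.0, 90.0, 90.0, 90.0]
print(backoff_schedule(3))  # the default of 3 retries: [30.0, 60.0, 90.0]
```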
Once an Azure repository client is configured correctly, register an Azure repository as follows, providing the client name using the client repository setting:

Python:

resp = client.snapshot.create_repository(
    name="my_backup",
    repository={
        "type": "azure",
        "settings": {
            "client": "secondary"
        }
    },
)
print(resp)

JavaScript:

const response = await client.snapshot.createRepository({
  name: "my_backup",
  repository: {
    type: "azure",
    settings: {
      client: "secondary",
    },
  },
});
console.log(response);

Console:

PUT _snapshot/my_backup
{
  "type": "azure",
  "settings": {
    "client": "secondary"
  }
}
If you are using the default client, you may omit the client repository setting:

Python:

resp = client.snapshot.create_repository(
    name="my_backup",
    repository={
        "type": "azure"
    },
)
print(resp)

JavaScript:

const response = await client.snapshot.createRepository({
  name: "my_backup",
  repository: {
    type: "azure",
  },
});
console.log(response);

Console:

PUT _snapshot/my_backup
{
  "type": "azure"
}
In-progress snapshot or restore jobs will not be preempted by a reload of the storage secure settings. They will complete using the client as it was built when the operation started.
Client settings

The following list describes the available client settings. Those that must be stored in the keystore are marked as (Secure, reloadable); the other settings must be stored in the elasticsearch.yml file. The default CLIENT_NAME is default, but you may configure a client with a different name and specify that client by name when registering a repository.
- azure.client.CLIENT_NAME.account
  (Secure, reloadable) The Azure account name, which is used by the repository’s internal Azure client. This setting is required for all clients.
- azure.client.CLIENT_NAME.endpoint_suffix
  The Azure endpoint suffix to connect to. The default value is core.windows.net.
- azure.client.CLIENT_NAME.key
  (Secure, reloadable) The Azure secret key, which is used by the repository’s internal Azure client. Alternatively, use sas_token.
- azure.client.CLIENT_NAME.max_retries
  The number of retries to use when an Azure request fails. This setting helps control the exponential backoff policy. It specifies the number of retries that must occur before the snapshot fails. The default value is 3. The initial backoff period is defined by the Azure SDK as 30s, so there is 30s of wait time before retrying after a first timeout or failure. The maximum backoff period is defined by the Azure SDK as 90s.
- azure.client.CLIENT_NAME.proxy.host
  The host name of a proxy to connect to Azure through. By default, no proxy is used.
- azure.client.CLIENT_NAME.proxy.port
  The port of a proxy to connect to Azure through. By default, no proxy is used.
- azure.client.CLIENT_NAME.proxy.type
  Register a proxy type for the client. Supported values are direct, http, and socks. For example: azure.client.default.proxy.type: http. When proxy.type is set to http or socks, proxy.host and proxy.port must also be provided. The default value is direct.
- azure.client.CLIENT_NAME.sas_token
  (Secure, reloadable) A shared access signature (SAS) token, which the repository’s internal Azure client uses for authentication. The SAS token must have read (r), write (w), list (l), and delete (d) permissions for the repository base path and all its contents. These permissions must be granted for the blob service (b) and apply to resource types service (s), container (c), and object (o). Alternatively, use key.
- azure.client.CLIENT_NAME.timeout
  The client-side timeout for any single request to Azure, as a time unit. For example, a value of 5s specifies a 5-second timeout. There is no default value, which means that Elasticsearch uses the default value set by the Azure client.
- azure.client.CLIENT_NAME.endpoint
  The Azure endpoint to connect to. It must include the protocol used to connect to Azure.
- azure.client.CLIENT_NAME.secondary_endpoint
  The Azure secondary endpoint to connect to. It must include the protocol used to connect to Azure.
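As a quick sanity check on the split between keystore and elasticsearch.yml settings, the following sketch partitions a set of client settings by name. The set of secure setting names (account, key, sas_token) is taken from the list above; the helper and all example values are ours, purely for illustration.

```python
# Sketch: partition azure.client.* settings into keystore (secure)
# settings and elasticsearch.yml settings, based on the list above.
# The function name and example values are illustrative only.
SECURE_SETTINGS = {"account", "key", "sas_token"}

def partition_settings(settings: dict) -> tuple[dict, dict]:
    """Split azure.client.* settings into (keystore, yml) dicts."""
    keystore, yml = {}, {}
    for key, value in settings.items():
        # The setting name is the last dotted component,
        # e.g. "azure.client.default.timeout" -> "timeout".
        name = key.rsplit(".", 1)[-1]
        (keystore if name in SECURE_SETTINGS else yml)[key] = value
    return keystore, yml

keystore, yml = partition_settings({
    "azure.client.default.account": "my_account",
    "azure.client.default.timeout": "10s",
    "azure.client.secondary.sas_token": "<token>",
    "azure.client.secondary.max_retries": 7,
})
# keystore holds the two secure settings; yml holds the other two.
```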
Obtaining credentials from the environment

If you specify neither the key nor the sas_token settings for a client, then Elasticsearch will attempt to automatically obtain credentials from the environment in which it is running, using mechanisms built into the Azure SDK. This is ideal when running Elasticsearch on the Azure platform.
When running Elasticsearch on an Azure Virtual Machine, you should use Azure Managed Identity to provide credentials to Elasticsearch. To use Azure Managed Identity, assign a suitably authorized identity to the Azure Virtual Machine on which Elasticsearch is running.
When running Elasticsearch in Azure Kubernetes Service, for instance using Elastic Cloud on Kubernetes, you should use Azure Workload Identity to provide credentials to Elasticsearch. To use Azure Workload Identity, mount the azure-identity-token volume as a subdirectory of the Elasticsearch config directory and set the AZURE_FEDERATED_TOKEN_FILE environment variable to point to a file called azure-identity-token within the mounted volume.
The Azure SDK has several other mechanisms to automatically obtain credentials from its environment, but the two methods described above are the only ones that are tested and supported for use in Elasticsearch.
Repository settings

The Azure repository supports the following settings, which may be specified when registering an Azure repository as follows:

Python:

resp = client.snapshot.create_repository(
    name="my_backup",
    repository={
        "type": "azure",
        "settings": {
            "client": "secondary",
            "container": "my_container",
            "base_path": "snapshots_prefix"
        }
    },
)
print(resp)

JavaScript:

const response = await client.snapshot.createRepository({
  name: "my_backup",
  repository: {
    type: "azure",
    settings: {
      client: "secondary",
      container: "my_container",
      base_path: "snapshots_prefix",
    },
  },
});
console.log(response);

Console:

PUT _snapshot/my_backup
{
  "type": "azure",
  "settings": {
    "client": "secondary",
    "container": "my_container",
    "base_path": "snapshots_prefix"
  }
}
- client
  The name of the Azure repository client to use. Defaults to default.
- container
  Container name. You must create the Azure container before creating the repository. Defaults to elasticsearch-snapshots.
- base_path
  Specifies the path within the container to the repository data. Defaults to empty (root directory).
  Don’t set base_path when configuring a snapshot repository for Elastic Cloud Enterprise. Elastic Cloud Enterprise automatically generates the base_path for each deployment so that multiple deployments may share the same bucket.
- chunk_size
  Big files can be broken down into multiple smaller blobs in the blob store during snapshotting. It is not recommended to change this value from its default unless there is an explicit reason for limiting the size of blobs in the repository. Setting a value lower than the default can result in an increased number of API calls to the Azure blob store during snapshot create and restore operations compared to using the default value, and thus make both operations slower as well as more costly. Specify the chunk size as a byte unit, for example: 10MB, 5KB, 500B. Defaults to the maximum size of a blob in the Azure blob store, which is 5TB.
- compress
  When set to true, metadata files are stored in compressed format. This setting doesn’t affect index files that are already compressed by default. Defaults to true.
- max_restore_bytes_per_sec
  (Optional, byte value) Maximum snapshot restore rate per node. Defaults to unlimited. Note that restores are also throttled through recovery settings.
- max_snapshot_bytes_per_sec
  (Optional, byte value) Maximum snapshot creation rate per node. Defaults to 40mb per second. Note that if the recovery settings for managed services are set, then it defaults to unlimited, and the rate is additionally throttled through recovery settings.
- readonly
  (Optional, Boolean) If true, the repository is read-only. The cluster can retrieve and restore snapshots from the repository but not write to the repository or create snapshots in it.
  Only a cluster with write access can create snapshots in the repository. All other clusters connected to the repository should have the readonly parameter set to true.
  If false, the cluster can write to the repository and create snapshots in it. Defaults to false.
  If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. Having multiple clusters write to the repository at the same time risks corrupting the contents of the repository.
- location_mode
  primary_only or secondary_only. Defaults to primary_only. Note that if you set it to secondary_only, it will force readonly to true.
- delete_objects_max_size
  (integer) Sets the maximum batch size, between 1 and 256, used for BlobBatch requests. Defaults to 256, which is the maximum number supported by the Azure blob batch API.
- max_concurrent_batch_deletes
  (integer) Sets the maximum number of concurrent batch delete requests that will be submitted for any individual bulk delete with BlobBatch. Note that the effective number of concurrent deletes is further limited by the Azure client connection and event loop thread limits. Defaults to 10; the minimum is 1 and the maximum is 100.
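The chunk_size and batch-delete settings above reduce to simple arithmetic on blob counts. A hypothetical sketch using the documented defaults (5TB chunks, batches of up to 256 blobs); the helper names are ours:

```python
import math

# Sketch of the blob-count arithmetic implied by the settings above.
# Constants use the documented defaults; the helper names are our own.
TB = 1024 ** 4
MB = 1024 ** 2

def blob_count(file_size: int, chunk_size: int = 5 * TB) -> int:
    """Number of blobs a single file is split into during snapshotting."""
    return math.ceil(file_size / chunk_size)

def batch_requests(num_blobs: int, batch_size: int = 256) -> int:
    """Number of BlobBatch delete requests needed to delete num_blobs blobs."""
    return math.ceil(num_blobs / batch_size)

# A 12TB file with the default chunk_size becomes 3 blobs; lowering
# chunk_size to 100MB produces vastly more blobs, which means more API
# calls (and more batch delete requests) for the same data.
print(blob_count(12 * TB))            # 3
print(blob_count(12 * TB, 100 * MB))  # 125830
print(batch_requests(125830))         # 492
```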
Repository validation rules

According to the container naming guide, a container name must be a valid DNS name, conforming to the following naming rules:
- Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
- Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
- All letters in a container name must be lowercase.
- Container names must be from 3 through 63 characters long.
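The naming rules above are easy to check mechanically. A minimal sketch, with the function name and regular expression being ours rather than part of Elasticsearch or Azure:

```python
import re

# Sketch: validate an Azure container name against the rules listed above:
# lowercase letters, digits, and dashes only; starts and ends with a letter
# or digit; no consecutive dashes; 3 through 63 characters long.
_CONTAINER_NAME = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_container_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and bool(_CONTAINER_NAME.match(name))

print(is_valid_container_name("elasticsearch-snapshots"))  # True
print(is_valid_container_name("My-Container"))             # False: uppercase
print(is_valid_container_name("a--b"))                     # False: consecutive dashes
print(is_valid_container_name("ab"))                       # False: too short
```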
Supported Azure Storage Account types

The Azure repository type works with all Standard storage accounts:

- Standard Locally Redundant Storage - Standard_LRS
- Standard Zone-Redundant Storage - Standard_ZRS
- Standard Geo-Redundant Storage - Standard_GRS
- Standard Read Access Geo-Redundant Storage - Standard_RAGRS

Premium Locally Redundant Storage (Premium_LRS) is not supported as it is only usable as VM disk storage, not as general storage.
Linearizable register implementation

The linearizable register implementation for Azure repositories is based on Azure’s support for strongly consistent leases. Each lease may only be held by a single node at any time. The node presents its lease when performing a read or write operation on a protected blob. Lease-protected operations fail if the lease is invalid or expired. To perform a compare-and-exchange operation on a register, Elasticsearch first obtains a lease on the blob, then reads the blob contents under the lease, and finally uploads the updated blob under the same lease. This process ensures that the read and write operations happen atomically.
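The lease-protected compare-and-exchange described above can be modeled in a few lines. This is a toy in-memory model intended only to illustrate the protocol; it is not the actual Azure lease API or the Elasticsearch implementation, and all names are ours.

```python
import uuid

# Toy in-memory model of the lease-protected compare-and-exchange
# described above. Not the real Azure API; all names are illustrative.
class LeasedBlob:
    def __init__(self, value: bytes = b""):
        self.value = value
        self._lease = None

    def acquire_lease(self) -> str:
        # Azure leases are strongly consistent: only one holder at a time.
        if self._lease is not None:
            raise RuntimeError("lease already held")
        self._lease = uuid.uuid4().hex
        return self._lease

    def release_lease(self, lease: str) -> None:
        if lease != self._lease:
            raise RuntimeError("invalid lease")
        self._lease = None

    def read(self, lease: str) -> bytes:
        if lease != self._lease:
            raise RuntimeError("invalid lease")
        return self.value

    def write(self, lease: str, value: bytes) -> None:
        if lease != self._lease:
            raise RuntimeError("invalid lease")
        self.value = value

def compare_and_exchange(blob: LeasedBlob, expected: bytes, updated: bytes) -> bool:
    # Obtain the lease, read under it, then write under the same lease.
    # No other node can modify the blob in between, so the read-then-write
    # pair behaves atomically.
    lease = blob.acquire_lease()
    try:
        if blob.read(lease) != expected:
            return False
        blob.write(lease, updated)
        return True
    finally:
        blob.release_lease(lease)

register = LeasedBlob(b"1")
print(compare_and_exchange(register, b"1", b"2"))  # True: expected value matched
print(compare_and_exchange(register, b"1", b"3"))  # False: register now holds b"2"
```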