Migrating from AWS Elasticsearch to Elasticsearch Service on Elastic Cloud
Access all of the features of the Elastic Stack
As a Solutions Architect, I often get asked how to move an Elastic deployment from the Amazon Elasticsearch Service (AWS ES) to the Elasticsearch Service. Users primarily ask because they want to take advantage of the features, operational expertise, and support provided by Elastic that are not available from Amazon. This practitioner's guide walks you through the proverbial “lift-and-shift” to Elastic's Elasticsearch Service.
To get started with the Elasticsearch Service, there is a free 14-day trial that allows you to create a deployment. You can pick the cloud provider (AWS, GCP, or Azure) and the region where you’d like Elastic to run your deployment. AWS users can also add the Elasticsearch Service directly from the AWS Marketplace, with charges integrated into their AWS bill.
The Elasticsearch Service on Elastic Cloud includes quite a few capabilities that go beyond what’s available in the open source distribution, like Canvas, APM, unsupervised machine learning, frozen indices, SQL, security (beyond basic IAM policies and perimeter-only protection), and deployment templates, and we’re adding more unique capabilities all the time. To learn more about how we compare with AWS ES, check out our AWS Elasticsearch comparison page from time to time.
Migrating from AWS Elasticsearch to the Elasticsearch Service on Elastic Cloud
This is a fairly technical guide for migrating from AWS ES to the Elasticsearch Service on Elastic Cloud and requires some programming experience. AWS ES clusters are commonly provisioned into a VPC, but they can also be put on a public-facing endpoint. In order to keep this guide universal for both scenarios, we use the Python AWS SDK. You can use any language that has an AWS SDK (e.g., Java, Ruby, Go, etc.) but we only provide examples in Python below.
There are two parts to this guide:
- Part one – Snapshot your AWS ES cluster to S3
- Part two – Restore from that snapshot into an Elastic-managed deployment
Note: If you already have your AWS ES cluster manually snapshotted to S3, you can skip to part two.
Before we begin, it’s important to understand some of the IAM security steps that follow. First, in order to snapshot an AWS ES cluster into S3, your AWS ES cluster needs permission to write to a private S3 bucket. This requires an IAM role and policy that have the necessary permissions. Also, we’ll need to attach an IAM policy to an IAM user (creating one if necessary). The IAM user is used by our script to talk to your AWS ES cluster and by your Elastic-managed deployment to read the snapshot from your S3 bucket.
Part one – Snapshot to S3
The first part of this guide involves setting up an IAM role, policy, and user to snapshot your AWS ES cluster to S3. The AWS documentation for this process can be found here: Working with Amazon Elasticsearch Service Index Snapshots. It may help as a reference if you get stuck.
You’ll need to note several variables we’ll be using along the way. Copy and paste the following table to a notes file, where you can reference it throughout this guide. This will make it easy to fill in the values specific for your migration.
| Description | Variable | Value |
| --- | --- | --- |
| AWS ES Domain ARN | DOMAIN_ARN | |
| AWS ES Endpoint URL | ES_ENDPOINT | |
| AWS ES Region | ES_REGION | |
| AWS S3 Bucket Name | S3_BUCKET_NAME | |
| AWS S3 Region | S3_REGION_NAME | |
| AWS IAM Role ARN | ROLE_ARN | |
| AWS IAM Access Key ID | ACCESS_KEY | |
| AWS IAM Secret Access Key | SECRET_KEY | |
| AWS ES Snapshot Repository | SNAPSHOT_REPO | my-snapshot-repo |
| AWS ES Snapshot Name | SNAPSHOT_NAME | my-snapshot |
You can change the values of SNAPSHOT_REPO and SNAPSHOT_NAME or use the examples provided (i.e., “my-snapshot-repo” and “my-snapshot”).
Step 1 - Get your AWS ES info
We’ll need some basic information about your AWS ES cluster to snapshot it to S3.
- In your AWS Console, go to the Elasticsearch Service
- Select the domain of the cluster you want to snapshot
- Copy the “Domain ARN” value to your notes file (DOMAIN_ARN)
- Copy the “Endpoint” URL value to your notes file (ES_ENDPOINT)
- Note which AWS region (e.g., us-east-1) your AWS ES cluster is in (ES_REGION)
This information is used below in the IAM policy creation and when it’s time to issue commands to the cluster.
Step 2 - Create an AWS S3 bucket
We need to create an S3 bucket to store your snapshot.
Important: Your S3 bucket must be in the same region as your AWS ES cluster. You will be able to restore from there to an Elastic-managed deployment in any region or cloud provider (AWS, GCP, or Azure).
- In your AWS Console, go to the S3 service
- Create a private S3 bucket
Note: If you leave the defaults, your bucket will be private and secure
- Copy the name of the bucket to your notes file (S3_BUCKET_NAME)
- Copy the region of the bucket to your notes file (S3_REGION_NAME)
This information is used when we register a snapshot repository with Elasticsearch.
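If you prefer to script this step instead of clicking through the console, here is a minimal boto3 sketch that creates the bucket. The bucket name and region below are hypothetical placeholders for your S3_BUCKET_NAME and S3_REGION_NAME, and it assumes your local AWS credentials are allowed to create buckets:

import boto3

# Hypothetical placeholder values -- substitute your own S3_BUCKET_NAME and S3_REGION_NAME
s3_bucket_name = 'my-es-snapshot-bucket'
s3_region_name = 'us-west-2'

s3 = boto3.client('s3', region_name=s3_region_name)

# Buckets are private by default. Outside of us-east-1 a LocationConstraint is required;
# for us-east-1, omit the CreateBucketConfiguration argument entirely.
s3.create_bucket(
    Bucket=s3_bucket_name,
    CreateBucketConfiguration={'LocationConstraint': s3_region_name}
)
print("Created bucket: " + s3_bucket_name)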
Step 3 - Create an IAM role
Next, we’ll create an IAM role that delegates permission to the Amazon Elasticsearch Service to take a snapshot into your S3 bucket.
- In your AWS Console, go to the IAM service
- Select “Roles”
- Select “Create role”
- Select “EC2” as the service that will use this new role (we will change it later)
- Select “Next: Permissions”
- Leave the policies on the role empty for now
- Select “Next: Tags”
- Select “Next: Review”
- Name the role: TheSnapshotRole
- Select “Create role”
- From the list of roles, select the one we just created: TheSnapshotRole
- Select “Trust relationships”
- Select “Edit trust relationship”
- Copy and paste the following into the trust relationship (replacing what’s there)
{ "Version": "2012-10-17", "Statement": [{ "Effect": "Allow", "Principal": { "Service": "es.amazonaws.com" }, "Action": "sts:AssumeRole" }] }
- Select “Update Trust Policy”
- Select “Permissions”
- Select “Add inline policy”
- Select the JSON tab
- Copy and paste the following JSON (replacing what’s there)
- Replace S3_BUCKET_NAME with the correct value (in two places)
{ "Version": "2012-10-17", "Statement": [{ "Action": [ "s3:ListBucket" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::S3_BUCKET_NAME" ] }, { "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::S3_BUCKET_NAME/*" ] } ] }
- Select “Review policy”
- Name the policy: TheSnapshotS3Policy
- Select “Create policy”
- Copy the “Role ARN” value to your notes file (ROLE_ARN)
We just created an IAM role with an inline policy that can read & write to your S3 bucket.
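As a rough scripted equivalent of the console steps above, here is a hedged boto3 sketch that creates the role, sets its trust relationship, and attaches the inline S3 policy. The bucket name is a placeholder, and the role and policy names simply reuse the ones from this step:

import json
import boto3

# Placeholder -- use your own S3_BUCKET_NAME
s3_bucket_name = 'my-es-snapshot-bucket'

iam = boto3.client('iam')

# Trust relationship allowing the Amazon Elasticsearch Service to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "es.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# Inline policy granting read/write access to the snapshot bucket
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Action": ["s3:ListBucket"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
        {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}
    ]
}

role = iam.create_role(RoleName='TheSnapshotRole',
                       AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.put_role_policy(RoleName='TheSnapshotRole',
                    PolicyName='TheSnapshotS3Policy',
                    PolicyDocument=json.dumps(s3_policy))

# This is the ROLE_ARN value for your notes file
print("Role ARN: " + role['Role']['Arn'])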
Step 4 - Create an IAM policy
We need to create a new IAM policy that grants permission to pass the role above (iam:PassRole) and to write to your AWS ES domain, which is required to register the snapshot repository.
- In your AWS Console, go to the IAM service
- Select “Policies”
- Select “Create policy”
- Select the JSON tab
- Copy and paste the following JSON (replacing what’s there)
- Replace ROLE_ARN with the correct value
- Replace DOMAIN_ARN with the correct value
- Replace S3_BUCKET_NAME with the correct value (in two places)
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "ROLE_ARN" }, { "Effect": "Allow", "Action": "es:ESHttpPut", "Resource": "DOMAIN_ARN/*" } ] }
- Select “Review policy”
- Name the policy: TheSnapshotPolicy
- Select “Create policy”
We just created an IAM policy that allows the IAM role to talk to your AWS ES domain.
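If you'd rather script this step as well, a minimal sketch along the same lines creates the policy with boto3. The ARNs below are hypothetical placeholders for your ROLE_ARN and DOMAIN_ARN:

import json
import boto3

# Hypothetical placeholders -- use your own ROLE_ARN and DOMAIN_ARN
role_arn = 'arn:aws:iam::123456789012:role/TheSnapshotRole'
domain_arn = 'arn:aws:es:us-east-1:123456789012:domain/my-domain'

iam = boto3.client('iam')

policy_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "iam:PassRole", "Resource": role_arn},
        {"Effect": "Allow", "Action": "es:ESHttpPut", "Resource": domain_arn + "/*"}
    ]
}

# Create the customer-managed policy that we'll attach to the IAM user in step 5
policy = iam.create_policy(PolicyName='TheSnapshotPolicy',
                           PolicyDocument=json.dumps(policy_doc))
print("Policy ARN: " + policy['Policy']['Arn'])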
Step 5 - Create an IAM user
If you don’t already have an IAM user, we’ll need to create one and give it access to your private S3 bucket. If you do have an IAM user, you can simply attach the following IAM policy to it.
- In your AWS Console, go to the IAM service
- Select “Users”
- Select “Add user”
- Name the user: TheSnapshotUser
- Check the box “Programmatic access”
- Select “Next: Permissions”
- Select the box “Attach existing policies directly”
- Filter the policies by typing in “TheSnapshot”
- Select the checkbox next to the policy “TheSnapshotPolicy”
- Select “Next: Tags”
- Select “Next: Review”
- Select “Create user”
- Copy the “Access key ID” value to your notes file (ACCESS_KEY)
- Select “Show” under “Secret access key”
- Copy the “Secret access key” value to your notes file (SECRET_KEY)
- Select “Close”
- From the list of users, select the one we just created: TheSnapshotUser
- Select “Add inline policy”
- Select the JSON tab
- Copy and paste the following JSON (replacing what’s there)
- Replace S3_BUCKET_NAME with the correct value (in two places)
{ "Version": "2012-10-17", "Statement": [{ "Action": [ "s3:ListBucket" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::S3_BUCKET_NAME" ] }, { "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::S3_BUCKET_NAME/*" ] } ] }
- Select “Review policy”
- Name the policy: TheSnapshotUserS3Policy
- Select “Create policy”
We just created an IAM user that can take a manual snapshot and read from that snapshot.
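And here is a comparable boto3 sketch for this step, assuming the policy ARN printed in step 4 and a placeholder bucket name. It creates the user, attaches TheSnapshotPolicy, adds the inline S3 policy, and generates the programmatic credentials:

import json
import boto3

# Hypothetical placeholders -- use your own policy ARN (from step 4) and S3_BUCKET_NAME
snapshot_policy_arn = 'arn:aws:iam::123456789012:policy/TheSnapshotPolicy'
s3_bucket_name = 'my-es-snapshot-bucket'

iam = boto3.client('iam')

# Create the user and attach the managed policy from step 4
iam.create_user(UserName='TheSnapshotUser')
iam.attach_user_policy(UserName='TheSnapshotUser', PolicyArn=snapshot_policy_arn)

# Inline S3 policy so the user (and later your Elastic deployment) can read the snapshot
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Action": ["s3:ListBucket"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
        {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}
    ]
}
iam.put_user_policy(UserName='TheSnapshotUser',
                    PolicyName='TheSnapshotUserS3Policy',
                    PolicyDocument=json.dumps(s3_policy))

# Programmatic credentials -- these are the ACCESS_KEY and SECRET_KEY for your notes file
keys = iam.create_access_key(UserName='TheSnapshotUser')
print("Access key ID: " + keys['AccessKey']['AccessKeyId'])
print("Secret access key: " + keys['AccessKey']['SecretAccessKey'])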
Step 6 - Configure the Python AWS SDK
Before we can run a manual snapshot, we need to register a snapshot repository with your AWS ES cluster. This requires sending a signed request to the cluster, and one of the easiest ways to do that is with the Python AWS SDK. You can use another AWS SDK (e.g., Java, Ruby, Go, etc.), but the examples below use the Python AWS SDK.
We’ll install the Python AWS SDK using Python’s package installer, pip (pip3). This requires Python v3 to be installed. If you don’t have Python v3, you can get it by simply installing pip3: your operating system’s package manager will pull in Python v3 automatically, since it’s a dependency of pip3. If you get stuck, refer to the Python installation docs.
Installing pip3
To install pip3 on Red Hat and derivatives, use yum:
$ sudo yum -y install python3-pip
Alternatively, some Fedora distributions label the pip3 package differently:
$ sudo yum -y install python36-pip
If neither package name above works, you can search for it:
$ yum search pip
On Debian derivatives such as Ubuntu, use apt-get:
$ sudo apt-get -y install python3-pip
Installing the Python AWS SDK
Once pip3 is installed, you can install the Python AWS SDK, called boto3:
$ pip3 install --user boto3 requests_aws4auth
Collecting boto3
...
Successfully installed boto3-1.9.106 requests-aws4auth-0.9 ...
Note: No root access is needed if you specify the --user flag.
Next, we need to create a ~/.aws directory to hold our AWS credentials. Run the following command to create the directory:
$ mkdir ~/.aws
Create a file called credentials with your favorite editor. We’ll use nano for simplicity:
$ nano ~/.aws/credentials
Copy and paste the following contents into the file, replacing the two uppercase variables.
[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
Use ctrl+x to exit nano, and follow the prompts to save the file.
Next, we’ll write a few Python scripts to perform the tasks we need.
Step 7 - Manually snapshot AWS ES
Let’s run a quick test using a Python script to list the indices in our AWS ES cluster. This will ensure our AWS credentials are working and prove we can talk to the cluster.
Create a file called indices.py with your favorite editor. We’ll use nano for simplicity:
$ nano indices.py
Copy and paste the following contents, replacing the two uppercase variables with your values:
import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Listing Indices from AWS ES ...")
req = requests.get(host + '/_cat/indices?v', auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Use ctrl+x to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 indices.py
Your output should look similar to the following:
Listing Indices from AWS ES ...
HTTP Response Code: 200
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testindex yME2BphgR3Gt1ln6n03nHQ   5   1          1            0      4.4kb          4.4kb
Now create a file called register.py with your favorite editor.
$ nano register.py
Copy and paste the following contents, replacing the seven uppercase variables with your values:
import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'
s3_region_name = 'S3_REGION_NAME'
s3_bucket_name = 'S3_BUCKET_NAME'
role_arn = 'ROLE_ARN'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
headers = {"Content-Type": "application/json"}
payload = {
    "type": "s3",
    "settings": {
        "region": s3_region_name,
        "bucket": s3_bucket_name,
        "role_arn": role_arn
    }
}

print("Registering Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name
req = requests.put(url, auth=auth, json=payload, headers=headers)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Use ctrl+x to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 register.py
Your output should look similar to the following:
Registering Snapshot with AWS ES ...
HTTP Response Code: 200
{"acknowledged":true}
Next, create a file called snapshot.py with your favorite editor.
$ nano snapshot.py
Copy and paste the following contents, replacing the four uppercase variables with your values:
import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Starting Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name
req = requests.put(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Use ctrl+x to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 snapshot.py
Your output should look similar to the following:
Starting Snapshot with AWS ES ...
HTTP Response Code: 200
{"accepted":true}
Note: The time required to take a snapshot increases with the size of the AWS ES domain. According to AWS documentation, long-running snapshot operations sometimes show a “504 GATEWAY_TIMEOUT”. Their docs say that you can ignore this error and just wait for the snapshot to complete successfully.
Finally, let’s check the status of our snapshot. Create a file called status.py.
$ nano status.py
Copy and paste the following contents, replacing the four uppercase variables with your values:
import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Getting Status of Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name + '?pretty'
req = requests.get(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Use ctrl+x to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 status.py
Your output should look similar to the following:
Getting Status of Snapshot with AWS ES ...
HTTP Response Code: 200
{
  "snapshots" : [
    {
      "snapshot" : "my-snapshot",
      "uuid" : "ClYKt5g8QFO6r3kTCEzjqw",
      "version_id" : 6040299,
      "version" : "6.4.2",
      "indices" : [
        "testindex"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2019-03-03T14:46:04.094Z",
      "start_time_in_millis" : 1551624364094,
      "end_time" : "2019-03-03T14:46:04.847Z",
      "end_time_in_millis" : 1551624364847,
      "duration_in_millis" : 753,
      "failures" : [ ],
      "shards" : {
        "total" : 5,
        "failed" : 0,
        "successful" : 5
      }
    }
  ]
}
If you see "state":"SUCCESS"
then you have successfully taken a snapshot to S3 and are ready for part two!
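Large domains can take a while to reach SUCCESS. If you don't want to re-run status.py by hand, here is an optional sketch (using the same uppercase placeholders as the scripts above) that polls the status endpoint every 30 seconds until the snapshot leaves the IN_PROGRESS state:

import time
import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name

# Poll every 30 seconds until the snapshot is no longer in progress
while True:
    state = requests.get(url, auth=auth).json()['snapshots'][0]['state']
    print("Snapshot state: " + state)
    if state != 'IN_PROGRESS':
        break
    time.sleep(30)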
Part two – Restore from S3
The second part of this guide involves restoring an Elastic-managed deployment from a manual snapshot in S3.
You can provision an Elastic-managed deployment into AWS, GCP, or Azure for this part of the guide.
Step 1 - Size your deployment
The deployment you created in the Elasticsearch Service on Elastic Cloud should have roughly the same amount of resources as your AWS ES cluster. Use the sliders and adjust the number of data nodes to reflect the size of the cluster you have in AWS ES. Save your changes before proceeding.
Step 2 - Add a custom repository
In your Elastic-managed deployment (not your AWS ES cluster), open Kibana and go to “Dev Tools”.
Copy and paste the following API call into Dev Tools, replacing the five variables:
PUT /_snapshot/SNAPSHOT_REPO
{
  "type": "s3",
  "settings": {
    "bucket": "S3_BUCKET_NAME",
    "region": "S3_REGION_NAME",
    "access_key": "ACCESS_KEY",
    "secret_key": "SECRET_KEY",
    "compress": true
  }
}
Execute the request.
You should get the following response:
{ "acknowledged": "true" }
You’re almost done.
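Optionally, before restoring, you can confirm the deployment can reach your bucket by calling the snapshot verify API from Dev Tools; a successful response lists the nodes that were able to access the repository:

POST /_snapshot/SNAPSHOT_REPO/_verify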
Step 3 - Restore from S3
Finally, it’s time to restore from the snapshot repository we just registered.
Copy and paste the following API call into Dev Tools, replacing the two variables:
POST /_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_restore
You should get the following response:
{ "accepted": "true" }
You can check the progress of your restore with the following:
GET /_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_status
If you see "state":"SUCCESS"
your restore completed successfully:
{ "snapshots": [ { "snapshot": "my-snapshot", "repository": "my-snapshot-repo", "state": "SUCCESS", ... } ] }
Congratulations on making it through a “lift-and-shift” migration from AWS ES to the Elasticsearch Service!
Wrapping up
Now that you’re on the Elasticsearch Service on Elastic Cloud, not only can you take advantage of features that aren’t available on AWS ES, but you can also rest easy knowing that your deployment is maintained by the experts who created the Elastic Stack. And if you run into any issues along the way, the experts on the Elastic support team are there to help. If you’re not on the Elasticsearch Service on Elastic Cloud yet, take it for a spin with a free 14-day trial, and if you have any questions, feel free to contact us.
Editor’s Note (September 3, 2019): This blog has been updated to include Microsoft Azure as a cloud provider for Elasticsearch Service on Elastic Cloud. For more information, please read the announcement blog.