Migrating from AWS Elasticsearch to Elasticsearch Service on Elastic Cloud

Access all of the features of the Elastic Stack

As a Solutions Architect, I often get asked how to move an Elasticsearch deployment from the Amazon Elasticsearch Service (AWS ES) to the Elasticsearch Service on Elastic Cloud. Users primarily ask because they want to take advantage of the features, operational expertise, and support provided by Elastic that are not available from Amazon. This practitioner’s guide walks you through the proverbial “lift-and-shift” from AWS ES to the Elasticsearch Service from Elastic.

To get started with the Elasticsearch Service, there is a free 14-day trial that allows you to create a deployment. Pick the cloud provider (AWS, GCP, or Azure) and the region where you’d like Elastic to run your deployment. AWS users can also add the Elasticsearch Service directly from the AWS Marketplace, with charges integrated into their AWS bill.

The Elasticsearch Service on Elastic Cloud offers quite a few capabilities that go beyond what’s available in the open source distribution: Canvas, APM, unsupervised machine learning, frozen indices, SQL, security (beyond basic IAM policies and perimeter-only protection), and deployment templates, among others. We’re adding more unique capabilities all the time. To learn more about what we offer relative to AWS ES, check out our AWS Elasticsearch comparison page from time to time.

Migrating from AWS Elasticsearch to the Elasticsearch Service on Elastic Cloud

This is a fairly technical guide for migrating from AWS ES to the Elasticsearch Service on Elastic Cloud and requires some programming experience. AWS ES clusters are commonly provisioned into a VPC, but they can also be put on a public-facing endpoint. In order to keep this guide universal for both scenarios, we use the Python AWS SDK. You can use any language that has an AWS SDK (e.g., Java, Ruby, Go, etc.) but we only provide examples in Python below.

There are two parts to this guide:

  1. Part one – Snapshot your AWS ES cluster to S3
  2. Part two – Restore from S3 to a deployment on the Elasticsearch Service

Note: If you already have your AWS ES cluster manually snapshotted to S3, you can skip to part two.

Before we begin, it’s important to understand some of the IAM security steps that follow. First, in order to snapshot an AWS ES cluster to S3, the cluster needs permission to write to a private S3 bucket; this requires an IAM role and policy with the necessary permissions. Second, we’ll attach an IAM policy to an IAM user (creating one if necessary). Our scripts use this user to talk to your AWS ES cluster, and your Elastic-managed deployment uses it to read the snapshot from your S3 bucket.

Part one – Snapshot to S3

The first part of this guide involves setting up an IAM role, policy, and user to snapshot your AWS ES cluster to S3. The AWS documentation for this process can be found here: Working with Amazon Elasticsearch Service Index Snapshots. It may help as a reference if you get stuck.

You’ll need to note several variables we’ll be using along the way. Copy and paste the following table to a notes file, where you can reference it throughout this guide. This will make it easy to fill in the values specific for your migration.

Description                  | Variable        | Value
AWS ES Domain ARN            | DOMAIN_ARN      |
AWS ES Endpoint URL          | ES_ENDPOINT     |
AWS ES Region                | ES_REGION       |
AWS S3 Bucket Name           | S3_BUCKET_NAME  |
AWS S3 Region                | S3_REGION_NAME  |
AWS IAM Role ARN             | ROLE_ARN        |
AWS IAM Access Key ID        | ACCESS_KEY      |
AWS IAM Secret Access Key    | SECRET_KEY      |
AWS ES Snapshot Repository   | SNAPSHOT_REPO   | my-snapshot-repo
AWS ES Snapshot Name         | SNAPSHOT_NAME   | my-snapshot

You can change the values of SNAPSHOT_REPO and SNAPSHOT_NAME or use the examples provided (i.e., “my-snapshot-repo” and “my-snapshot”).

Step 1 - Get your AWS ES info

We’ll need some basic information about your AWS ES cluster to snapshot it to S3.

  1. In your AWS Console, go to the Elasticsearch Service
  2. Select the domain of the cluster you want to snapshot
  3. Copy the “Domain ARN” value to your notes file (DOMAIN_ARN)
  4. Copy the “Endpoint” URL value to your notes file (ES_ENDPOINT)
  5. Note which AWS region (e.g., us-east-1) your AWS ES cluster is in (ES_REGION)

This information is used below in the IAM policy creation and when it’s time to issue commands to the cluster.

Step 2 - Create an AWS S3 bucket

We need to create an S3 bucket to store your snapshot.

Important: Your S3 bucket must be in the same region as your AWS ES cluster. You will be able to restore from there to an Elastic-managed deployment in any region or cloud provider (AWS, GCP, or Azure).

  1. In your AWS Console, go to the S3 service
  2. Create a private S3 bucket
    Note: If you leave the defaults, your bucket will be private and secure
  3. Copy the name of the bucket to your notes file (S3_BUCKET_NAME)
  4. Copy the region of the bucket to your notes file (S3_REGION_NAME)

This information is used when we register a snapshot repository with Elasticsearch.
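If you’d rather script this step, here’s a minimal boto3 sketch that creates the same private bucket. This is an illustration only, not part of the original console flow; it assumes boto3 is installed (covered in Step 6 below) and that your local AWS credentials are allowed to create buckets:

import boto3

s3_bucket_name = 'S3_BUCKET_NAME'   # replace with your bucket name
s3_region_name = 'S3_REGION_NAME'   # must match your AWS ES cluster's region

s3 = boto3.client('s3', region_name=s3_region_name)
if s3_region_name == 'us-east-1':
    # us-east-1 is the default location and must not be passed as a constraint
    s3.create_bucket(Bucket=s3_bucket_name)
else:
    s3.create_bucket(Bucket=s3_bucket_name,
                     CreateBucketConfiguration={'LocationConstraint': s3_region_name})
print("Created private bucket: " + s3_bucket_name)   # new buckets are private by default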

Step 3 - Create an IAM role

Next, we’ll create an IAM role that delegates permission to the Amazon Elasticsearch Service to write snapshots to S3.

  1. In your AWS Console, go to the IAM service
  2. Select “Roles”
  3. Select “Create role”
  4. Select “EC2” as the service that will use this new role (we will change it later)
  5. Select “Next: Permissions”
  6. Leave the policies on the role empty for now
  7. Select “Next: Tags”
  8. Select “Next: Review”
  9. Name the role: TheSnapshotRole
  10. Select “Create role”
  11. From the list of roles, select the one we just created: TheSnapshotRole
  12. Select “Trust relationships”
  13. Select “Edit trust relationship”
  14. Copy and paste the following into the trust relationship (replacing what’s there)

    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {
          "Service": "es.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }]
    }
        
  15. Select “Update Trust Policy”
  16. Select “Permissions”
  17. Select “Add inline policy”
  18. Select the JSON tab
  19. Copy and paste the following JSON (replacing what’s there)
  20. Replace S3_BUCKET_NAME with the correct value (in two places)

    {
      "Version": "2012-10-17",
      "Statement": [{
          "Action": [
            "s3:ListBucket"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::S3_BUCKET_NAME"
          ]
        },
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::S3_BUCKET_NAME/*"
          ]
        }
      ]
    }
        
  21. Select “Review policy”
  22. Name the policy: TheSnapshotS3Policy
  23. Select “Create policy”
  24. Copy the “Role ARN” value to your notes file (ROLE_ARN)

We just created an IAM role with an inline policy that can read & write to your S3 bucket.
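If you prefer to automate the console steps above, here’s a hedged boto3 sketch of the same role setup. It assumes boto3 is installed (see Step 6) and that your local AWS credentials are allowed to manage IAM; it reuses the trust and S3 policy documents shown above:

import boto3, json

s3_bucket_name = 'S3_BUCKET_NAME'   # replace with your bucket name

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "es.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Action": ["s3:ListBucket"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
        {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}
    ]
}

iam = boto3.client('iam')
role = iam.create_role(RoleName='TheSnapshotRole',
                       AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.put_role_policy(RoleName='TheSnapshotRole',
                    PolicyName='TheSnapshotS3Policy',
                    PolicyDocument=json.dumps(s3_policy))
print("ROLE_ARN: " + role['Role']['Arn'])   # copy this to your notes file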

Step 4 - Create an IAM policy

We need to create a new IAM policy that grants permission to pass the role above to the Amazon Elasticsearch Service (iam:PassRole) and to issue the HTTP PUT request that registers the snapshot repository.

  1. In your AWS Console, go to the IAM service
  2. Select “Policies”
  3. Select “Create policy”
  4. Select the JSON tab
  5. Copy and paste the following JSON (replacing what’s there)
  6. Replace ROLE_ARN with the correct value
  7. Replace DOMAIN_ARN with the correct value
  8. Replace S3_BUCKET_NAME with the correct value (in 2 places)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "iam:PassRole",
          "Resource": "ROLE_ARN"
        },
        {
          "Effect": "Allow",
          "Action": "es:ESHttpPut",
          "Resource": "DOMAIN_ARN/*"
        }
      ]
    }
        
  9. Select “Review policy”
  10. Name the policy: TheSnapshotPolicy
  11. Select “Create policy”

We just created an IAM policy that allows the IAM role to talk to your AWS ES domain.

Step 5 - Create an IAM user

If you don’t already have an IAM user, we’ll need to create one and give it access to your private S3 bucket. If you do have an IAM user, you can simply attach the following IAM policy to it.

  1. In your AWS Console, go to the IAM service
  2. Select “Users”
  3. Select “Add user”
  4. Name the user: TheSnapshotUser
  5. Check the box “Programmatic access”
  6. Select “Next: Permissions”
  7. Select the box “Attach existing policies directly”
  8. Filter the policies by typing in “TheSnapshot”
  9. Select the checkbox next to the policy “TheSnapshotPolicy”
  10. Select “Next: Tags”
  11. Select “Next: Review”
  12. Select “Create user”
  13. Copy the “Access key ID” value to your notes file (ACCESS_KEY)
  14. Select “Show” under “Secret access key”
  15. Copy the “Secret access key” value to your notes file (SECRET_KEY)
  16. Select “Close”
  17. From the list of users, select the one we just created: TheSnapshotUser
  18. Select “Add inline policy”
  19. Select the JSON tab
  20. Copy and paste the following JSON (replacing what’s there)
  21. Replace S3_BUCKET_NAME with the correct value (in 2 places)

    {
      "Version": "2012-10-17",
      "Statement": [{
          "Action": [
            "s3:ListBucket"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::S3_BUCKET_NAME"
          ]
        },
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::S3_BUCKET_NAME/*"
          ]
        }
      ]
    }
        
  22. Select “Review policy”
  23. Name the policy: TheSnapshotUserS3Policy
  24. Select “Create policy”

We just created an IAM user that can take a manual snapshot and read from that snapshot.
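Steps 4 and 5 can be scripted in the same hedged fashion. This sketch is my own convenience addition, not official tooling; it assumes your local AWS credentials can manage IAM and reuses the two policy documents from above:

import boto3, json

role_arn = 'ROLE_ARN'               # from Step 3
domain_arn = 'DOMAIN_ARN'           # from Step 1
s3_bucket_name = 'S3_BUCKET_NAME'   # from Step 2

snapshot_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "iam:PassRole", "Resource": role_arn},
        {"Effect": "Allow", "Action": "es:ESHttpPut", "Resource": domain_arn + "/*"}
    ]
}
user_s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Action": ["s3:ListBucket"], "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
        {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Effect": "Allow",
         "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}
    ]
}

iam = boto3.client('iam')
policy_arn = iam.create_policy(PolicyName='TheSnapshotPolicy',
                               PolicyDocument=json.dumps(snapshot_policy))['Policy']['Arn']
iam.create_user(UserName='TheSnapshotUser')
iam.attach_user_policy(UserName='TheSnapshotUser', PolicyArn=policy_arn)
iam.put_user_policy(UserName='TheSnapshotUser',
                    PolicyName='TheSnapshotUserS3Policy',
                    PolicyDocument=json.dumps(user_s3_policy))
key = iam.create_access_key(UserName='TheSnapshotUser')['AccessKey']
print("ACCESS_KEY: " + key['AccessKeyId'])
print("SECRET_KEY: " + key['SecretAccessKey'])   # shown only once; save it now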

Step 6 - Configure the Python AWS SDK

Before we run a manual snapshot, we need to register a snapshot repository with your AWS ES cluster. This requires sending a signed request to the cluster, and one of the easiest ways to do that is with the Python AWS SDK. You can use another AWS SDK (e.g., Java, Ruby, Go, etc.), but the examples below use Python.

We’ll install the Python AWS SDK using Python’s package installer, pip (pip3), which requires Python v3. If you don’t have Python v3 installed, you can get it by installing pip3: your operating system’s package manager will install Python v3 automatically, since it’s a dependency of pip3. If you get stuck, refer to the Python installation docs.

Installing pip3

To install pip3 on Red Hat and derivatives, use yum:

$ sudo yum -y install python3-pip

Alternatively, some Fedora distributions label the pip3 package differently:

$ sudo yum -y install python36-pip

You can search for it if neither package name works above:

$ yum search pip

On Debian derivatives such as Ubuntu, use apt-get:

$ sudo apt-get -y install python3-pip

Installing the Python AWS SDK

Once pip3 is installed, you can install the Python AWS SDK, called boto3, along with the requests_aws4auth library used to sign requests:

$ pip3 install --user boto3 requests_aws4auth
Collecting boto3
...
Successfully installed boto3-1.9.106 requests-aws4auth-0.9 ...

Note: No root access is needed if you specify the --user flag.

We need to create a ~/.aws directory to hold our AWS credentials. Run the following command to create the directory:

$ mkdir ~/.aws

Create a file called credentials with your favorite editor. We’ll use nano for simplicity:

$ nano ~/.aws/credentials

Copy and paste the following contents into the file, replacing the two uppercase variables with the values from your notes file.

[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY

Use ctrl+x to exit nano, and follow the prompts to save the file.
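Optionally, you can verify that boto3 picks up these credentials before moving on. This quick sanity check is my own addition rather than part of the original steps; a successful call to STS confirms the keys work, and the printed ARN should end in TheSnapshotUser:

import boto3

# Load credentials from ~/.aws/credentials and ask AWS STS who we are
creds = boto3.Session().get_credentials()
if creds is None:
    raise SystemExit("No AWS credentials found; check ~/.aws/credentials")
print("Authenticated as: " + boto3.client('sts').get_caller_identity()['Arn'])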

Next, we’ll write a few Python scripts to perform the tasks we need.

Step 7 - Manually snapshot AWS ES

Let’s run a quick test using a Python script to list the indices in our AWS ES cluster. This will ensure our AWS credentials are working and prove we can talk to the cluster.

Create a file called indices.py with your favorite editor. We’ll use nano for simplicity:

$ nano indices.py

Copy and paste the following contents, replacing the two uppercase variables with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth
host = 'ES_ENDPOINT'
region = 'ES_REGION'
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
print("Listing Indices from AWS ES ...")
req = requests.get(host + '/_cat/indices?v', auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Use ctrl+x to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 indices.py

Your output should look similar to the following:

Listing Indices from AWS ES ...
HTTP Response Code: 200
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testindex yME2BphgR3Gt1ln6n03nHQ   5   1          1            0      4.4kb          4.4kb

Now create a file called register.py with your favorite editor.

$ nano register.py

Copy and paste the following contents, replacing the six uppercase variables with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth
host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
s3_region_name = 'S3_REGION_NAME'
s3_bucket_name = 'S3_BUCKET_NAME'
role_arn = 'ROLE_ARN'
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
headers = {"Content-Type": "application/json"}
payload = {
        "type": "s3",
        "settings": {
                "region": s3_region_name,
                "bucket": s3_bucket_name,
                "role_arn": role_arn
        }
}
print("Registering Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name
req = requests.put(url, auth=auth, json=payload, headers=headers)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Use ctrl+x to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 register.py

Your output should look similar to the following:

Registering Snapshot with AWS ES ...
HTTP Response Code: 200
{"acknowledged":true}

Next, create a file called snapshot.py with your favorite editor.

$ nano snapshot.py

Copy and paste the following contents, replacing the four uppercase variables with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth
host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
print("Starting Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name
req = requests.put(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Use ctrl+x to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 snapshot.py

Your output should look similar to the following:

Starting Snapshot with AWS ES ...
HTTP Response Code: 200
{"accepted":true}

Note: The time required to take a snapshot increases with the size of the AWS ES domain. According to the AWS documentation, long-running snapshot operations can return a “504 GATEWAY_TIMEOUT”; you can safely ignore this error and just wait for the snapshot to complete successfully.

Finally, let’s check the status of our snapshot. Create a file called status.py.

$ nano status.py

Copy and paste the following contents, replacing the four uppercase variables with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth
host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
print("Getting Status of Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name + '?pretty'
req = requests.get(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Use ctrl+x to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 status.py

Your output should look similar to the following:

Getting Status of Snapshot with AWS ES ...
HTTP Response Code: 200
{
  "snapshots" : [ {
    "snapshot" : "my-snapshot",
    "uuid" : "ClYKt5g8QFO6r3kTCEzjqw",
    "version_id" : 6040299,
    "version" : "6.4.2",
    "indices" : [ "testindex" ],
    "include_global_state" : true,
    "state" : "SUCCESS",
    "start_time" : "2019-03-03T14:46:04.094Z",
    "start_time_in_millis" : 1551624364094,
    "end_time" : "2019-03-03T14:46:04.847Z",
    "end_time_in_millis" : 1551624364847,
    "duration_in_millis" : 753,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  } ]
}

If you see "state":"SUCCESS" then you have successfully taken a snapshot to S3 and are ready for part two!
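One more convenience before part two: for a large domain, rerunning status.py by hand gets tedious. Here’s a small variant of status.py (my own addition, using the same four uppercase variables) that polls every 30 seconds until the snapshot leaves the IN_PROGRESS state:

import boto3, requests, time
from requests_aws4auth import AWS4Auth
host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name
while True:
    state = requests.get(url, auth=auth).json()['snapshots'][0]['state']
    print("Snapshot state: " + state)
    if state != 'IN_PROGRESS':
        break
    time.sleep(30)  # wait 30 seconds between polls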

Part two – Restore from S3

The second part of this guide involves restoring an Elastic-managed deployment from a manual snapshot in S3.

You can provision an Elastic-managed deployment into AWS, GCP, or Azure for this part of the guide.

Step 1 - Size your deployment

The deployment you created on the Elasticsearch Service should have the same amount of resources as your AWS ES cluster. Use the sliders and increase the number of data nodes to reflect the size of your AWS ES cluster. Save your changes before proceeding.

Step 2 - Add a custom repository

In your Elastic-managed deployment (not your AWS ES cluster), open Kibana and go to “Dev Tools”.

Copy and paste the following API call into Dev Tools, replacing the five variables:

PUT /_snapshot/SNAPSHOT_REPO
{
  "type": "s3",
  "settings": {
    "bucket": "S3_BUCKET_NAME",
    "region": "S3_REGION_NAME",
    "access_key": "ACCESS_KEY",
    "secret_key": "SECRET_KEY",
    "compress": true
  }
}

Execute the request.

You should get the following response:

{
  "acknowledged": true
}
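If you’d like to confirm that your deployment can read the snapshot before restoring (an optional check, not strictly required), list the snapshots in the repository:

GET /_snapshot/SNAPSHOT_REPO/_all

You should see your SNAPSHOT_NAME (e.g., “my-snapshot”) in the response.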

You’re almost done.

Step 3 - Restore from S3

Finally, it’s time to restore from the snapshot repository we just registered.

Copy and paste the following API call into Dev Tools, replacing the two variables:

POST /_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_restore

You should get the following response:

{
  "accepted": true
}
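By default, this restores every index in the snapshot. If you only want a subset, or if an index with the same name is already open in the target deployment, you can restore specific indices and skip the cluster state. For example, using the testindex from part one:

POST /_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_restore
{
  "indices": "testindex",
  "include_global_state": false
}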

You can check the progress of your restore with the following:

GET /_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_status

If you see "state": "SUCCESS", your restore completed successfully:

{
  "snapshots": [
    {
      "snapshot": "my-snapshot",
      "repository": "my-snapshot-repo",
      "state": "SUCCESS",
      ...
    }
  ]
}

Congratulations on making it through a “lift-and-shift” migration from AWS ES to the Elasticsearch Service!

Wrapping up

Now that you’re on the Elasticsearch Service on Elastic Cloud, not only can you take advantage of features that aren’t available on AWS ES, but you can also rest easy knowing that your deployment is maintained by the experts who created the Elastic Stack. And if you run into any issues along the way, the experts on the Elastic support team are there to help. If you’re not on the Elasticsearch Service yet, take it for a spin with a free 14-day trial, and if you have any questions, feel free to contact us.

Editor’s Note (September 3, 2019): This blog has been updated to include Microsoft Azure as a cloud provider for Elasticsearch Service on Elastic Cloud. For more information, please read the announcement blog.