Migrate your Elasticsearch data
You might have switched to Elasticsearch Add-On for Heroku for any number of reasons, and you’re likely wondering how to get your existing Elasticsearch data into your new infrastructure. Along with being able to create as many new deployments with Elasticsearch clusters as you need, you have several options for moving your data over. Choose the option that works best for you:
- Index your data from the original source, which is the simplest method and provides the greatest flexibility for the Elasticsearch version and ingestion method.
- Reindex from a remote cluster, which rebuilds the index from scratch.
- Restore from a snapshot, which copies the existing indices.
One of the many advantages of Elasticsearch Add-On for Heroku is that you can spin up a deployment quickly, try out something, and then delete it if you don’t like it. This flexibility provides the freedom to experiment while your existing production cluster continues to work.
Before you begin
Depending on which option you choose, you might have limitations or need to do some preparation beforehand.
- Indexing from the source
- The new cluster must be the same size as your old one, or larger, to accommodate the data.
- Reindex from a remote cluster
- The new cluster must be the same size as your old one, or larger, to accommodate the data. Depending on your security settings for your old cluster, you might need to temporarily allow TCP traffic on port 9243 for this procedure.
- Restore from a snapshot
- The new cluster must be the same size as your old one, or larger, to accommodate the data. The new cluster must also be an Elasticsearch version that is compatible with the old cluster (check Elasticsearch snapshot version compatibility for details). If you have not already done so, you will need to set up snapshots for your old cluster using a repository that can be accessed from the new cluster.
- Migrating internal Elasticsearch indices
-
If you are migrating internal Elasticsearch indices from another cluster, specifically the .kibana index or the .security index, there are two options:
  - Use the steps on this page to reindex the internal indices from a remote cluster. The steps for reindexing internal indices and regular data indices are the same.
  - Check Migrating internal indices to restore the internal Elasticsearch indices from a snapshot.
Before you migrate your Elasticsearch data, define your index mappings on the new cluster. Index mappings cannot be migrated during reindex operations.
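For example, a minimal sketch of creating an index with explicit mappings on the new cluster might look like the following. The index name and field definitions are placeholders for illustration, not taken from your data, and the exact mapping syntax depends on your Elasticsearch version:

PUT INDEX_NAME
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "keyword" },
      "message":    { "type": "text" },
      "created_at": { "type": "date" }
    }
  }
}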
Index from the source
If you still have access to the original data source, outside of your old Elasticsearch cluster, you can load the data from there. This might be the simplest option, allowing you to choose the Elasticsearch version and take advantage of the latest features. You can use any ingestion method that you want: Logstash, Beats, the Elasticsearch clients, or whatever works best for you.
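As a hedged illustration, if your source data can be exported as JSON documents, you could load it into the new cluster with the bulk API. The index name, fields, and values below are placeholders:

POST _bulk
{ "index": { "_index": "INDEX_NAME" } }
{ "user_id": "u1", "message": "first document", "created_at": "2024-01-01T00:00:00Z" }
{ "index": { "_index": "INDEX_NAME" } }
{ "user_id": "u2", "message": "second document", "created_at": "2024-01-02T00:00:00Z" }

In practice you would typically generate requests like these from your source system with one of the ingestion tools mentioned above rather than by hand.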
If the original source isn’t available or has other issues that make it non-viable, there are still two more migration options: getting the data from a remote cluster or restoring from a snapshot.
Reindex from a remote cluster
Through the Elasticsearch reindex API, you can connect your new Elasticsearch Add-On for Heroku deployment remotely to your old Elasticsearch cluster. This pulls the data from your old cluster and indexes it into your new one. Reindexing essentially rebuilds the index from scratch, and it can be more resource-intensive to run.
- Log in to the Elasticsearch Add-On for Heroku console.
- Select a deployment or create one.
-
If the old Elasticsearch cluster is on a remote host (any type of host accessible over the internet), you need to make sure that the host can be accessed. Access is determined by the Elasticsearch reindex.remote.whitelist user setting.

Domains matching the pattern ["*.io:*", "*.com:*"] are allowed by default, so if your remote host URL matches that pattern you do not need to explicitly define reindex.remote.whitelist.

Otherwise, if your remote endpoint is not covered by the default settings, adjust the setting to add the remote Elasticsearch cluster as an allowed host:
- From your deployment menu, go to the Edit page.
- In the Elasticsearch section, select Manage user settings and extensions. For deployments with existing user settings, you may have to expand the Edit elasticsearch.yml caret for each node type instead.
-
Add the following reindex.remote.whitelist: [REMOTE_HOST:PORT] user setting, where REMOTE_HOST is a pattern matching the URL for the remote Elasticsearch host that you are reindexing from, and PORT is the host port number. Do not include the https:// prefix.

Note that if you override the parameter it replaces the defaults: ["*.io:*", "*.com:*"]. If you still want these patterns to be allowed you need to specify them explicitly in the value.

For example:

reindex.remote.whitelist: ["*.us-east-1.aws.found.io:9243", "*.com:*"]
- Save your changes.
- From the API Console or in the Kibana Console app, create the destination index on Elasticsearch Add-On for Heroku.
-
Copy the index from the remote cluster:
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://REMOTE_ELASTICSEARCH_ENDPOINT:PORT",
      "username": "USER",
      "password": "PASSWORD"
    },
    "index": "INDEX_NAME",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "INDEX_NAME"
  }
}
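For large indices, you can optionally run the reindex asynchronously and check on it later. This variant is a sketch of that option, not a required step:

POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "https://REMOTE_ELASTICSEARCH_ENDPOINT:PORT",
      "username": "USER",
      "password": "PASSWORD"
    },
    "index": "INDEX_NAME"
  },
  "dest": {
    "index": "INDEX_NAME"
  }
}

The response includes a task ID that you can poll with GET _tasks/TASK_ID until the operation completes.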
-
Verify that the new index is present:
GET INDEX_NAME/_search?pretty
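As an additional sanity check, you can compare document counts between the old and new clusters by running the following on each (INDEX_NAME is a placeholder):

GET INDEX_NAME/_count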
- You can remove the reindex.remote.whitelist user setting that you added previously.
Restore from a snapshot
If you cannot connect to a remote index for whatever reason, such as if it’s in a non-working state, you can try restoring from the most recent working snapshot.
-
On your old Elasticsearch cluster, choose an option to get the name of your snapshot repository bucket:
GET /_snapshot

GET /_snapshot/_all
-
Get the snapshot name:
GET /_snapshot/NEW-REPOSITORY-NAME/_all
The output for each entry provides a "snapshot": value, which is the snapshot name.

{
  "snapshots": [
    {
      "snapshot": "scheduled-1527616008-instance-0000000004",
      ...
    },
    ...
  ]
}
-
From the Elasticsearch Add-On for Heroku console of the new Elasticsearch cluster, add the snapshot repository. For details, check our guidelines for Amazon Web Services (AWS) Storage, Google Cloud Storage (GCS), or Azure Blob Storage.
If you’re migrating searchable snapshots, the repository name must be identical in the source and destination clusters.
If the source cluster is still writing to the repository, you need to set the destination cluster’s repository connection to readonly: true to avoid data corruption. Refer to backup a repository for details.
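For example, a minimal sketch of registering an existing S3 repository as read-only on the destination cluster through the API. The repository and bucket names are placeholders, and the exact settings depend on your storage provider:

PUT _snapshot/REPOSITORY_NAME
{
  "type": "s3",
  "settings": {
    "bucket": "BUCKET_NAME",
    "readonly": true
  }
}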
-
Start the Restore process.
- Open Kibana and go to Management > Snapshot and Restore.
- Under the Snapshots tab, you can find the available snapshots from your newly added snapshot repository. Select any snapshot to view its details, and from there you can choose to restore it.
- Select Restore.
- Select the indices you wish to restore.
- Configure any additional index settings.
- Select Restore snapshot to begin the process.
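If you prefer the API over the Kibana UI for this step, a minimal restore request might look like the following; the repository, snapshot, and index names are placeholders:

POST _snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore
{
  "indices": "INDEX_NAME"
}

Keep in mind that a restore fails if an open index with the same name already exists in the destination cluster, so restore into an empty deployment or rename the index as part of the restore.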
-
Verify that the new index is restored in your Elasticsearch Add-On for Heroku deployment with this query:
GET INDEX_NAME/_search?pretty