Elastic offers many instructor-led, in-person and virtual live trainings, as well as on-demand trainings. Our flagship courses are Elasticsearch Engineer, Data Analysis with Kibana, and Elastic Observability Engineer. All of these courses lead to certifications.
We recently released the latest version of Elasticsearch Engineer training in response to increased demand and new features. This course is designed for both new Elasticsearch users and Elasticsearch professionals. It begins with the basics for getting started with the Elastic Stack, then quickly dives deep into topics ranging from optimizing search performance to building efficient clusters. View the detailed course outline to find out more about what you’ll learn. All lessons include hands-on labs.
During the instructor-led “Elasticsearch Engineer” training, one of the most common questions we get while teaching snapshots is: “How is each snapshot logically independent?” In this blog post, I will explain this in detail.
What is a snapshot?
A snapshot is a backup of a running Elasticsearch cluster. You can use snapshots to:
- Regularly back up a cluster with no downtime
- Recover data after deletion or a hardware failure
- Transfer data between clusters
- Reduce your storage costs by using searchable snapshots in the hot, cold and frozen data tiers
Deduplication of snapshots
To back up an index, a snapshot makes a copy of the index’s segments and stores them in the snapshot repository.
Indices are made up of shards. Each Elasticsearch shard is a Lucene index. Each Lucene index is divided into smaller units called segments. When you add new documents to your index, Lucene creates a new segment and writes to it. From time to time, Lucene merges smaller segments into a larger one.
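The following toy Python model of a single shard makes the segment lifecycle concrete. It is a hypothetical sketch, not how Lucene is implemented:

- The `Segment` and `Shard` classes and their `index` and `merge` methods are invented for illustration.
- Every indexing batch becomes its own segment. Real Lucene buffers documents and writes segments on refresh or flush.
- Merges are triggered by hand rather than by a merge policy.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass(frozen=True)
class Segment:
    name: str
    docs: frozenset  # frozen: a segment's contents never change once written

@dataclass
class Shard:
    segments: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def _next_name(self) -> str:
        # Hypothetical naming scheme: A, B, C, ... for successive segments
        return chr(ord("A") + next(self._ids))

    def index(self, docs) -> Segment:
        """Write a batch of new documents into a brand-new segment."""
        seg = Segment(self._next_name(), frozenset(docs))
        self.segments.append(seg)
        return seg

    def merge(self, names) -> Segment:
        """Replace several existing segments with one new, larger segment."""
        to_merge = [s for s in self.segments if s.name in names]
        merged = Segment(self._next_name(), frozenset().union(*(s.docs for s in to_merge)))
        self.segments = [s for s in self.segments if s.name not in names] + [merged]
        return merged

shard = Shard()
shard.index({"doc1", "doc2"})   # segment A
shard.index({"doc3"})           # segment B
shard.index({"doc4"})           # segment C
shard.merge({"A", "B", "C"})    # new segment D replaces A, B and C
print([s.name for s in shard.segments])  # ['D']
```

The merge never edits A, B or C in place. It writes a new segment D and drops the old ones from the shard, which is what keeps segments immutable.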
Since segments are immutable, the snapshot only needs to copy any new segments created since the repository’s last snapshot.
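The incremental copy can be sketched as a set difference. The `files_to_copy` helper is hypothetical. It assumes that a segment's name is enough to identify its immutable contents. Elasticsearch's real bookkeeping for repository files is more detailed than this sketch.

```python
def files_to_copy(shard_segments, repository):
    """Return the segments a new snapshot must upload.

    Assumes (for illustration) that a segment name uniquely identifies
    immutable contents, so anything already in the repository is reused.
    """
    return [name for name in shard_segments if name not in repository]

repository = set()

# First snapshot: the shard holds A and B, and both are new to the repository
copied = files_to_copy(["A", "B"], repository)
repository.update(copied)
print(copied)  # ['A', 'B']

# Next snapshot: segment C has appeared, so only C is uploaded
copied = files_to_copy(["A", "B", "C"], repository)
repository.update(copied)
print(copied)  # ['C']
```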
Each snapshot is also logically independent. When you delete a snapshot, Elasticsearch only deletes the segments used exclusively by that snapshot. Elasticsearch doesn’t delete segments that are still used by other snapshots in the repository.
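Deletion works the other way around: a segment file can be removed only when no remaining snapshot references it. The sketch below assumes two hypothetical pieces:

- a `delete_snapshot` helper;
- a plain dictionary that maps each snapshot name to the set of segment names it uses.

```python
def delete_snapshot(snapshots, repository, victim):
    """Delete one snapshot and return the segment files physically removed.

    snapshots: dict mapping snapshot name -> set of referenced segment names
    repository: set of segment files currently stored (modified in place)
    """
    refs = snapshots.pop(victim)
    # Segments still needed by any surviving snapshot must be kept
    still_used = set().union(*snapshots.values())
    garbage = refs - still_used
    repository -= garbage  # in-place set difference on the caller's set
    return garbage

snapshots = {"s1": {"X", "Y"}, "s2": {"Y", "Z"}}
repository = {"X", "Y", "Z"}
removed = delete_snapshot(snapshots, repository, "s1")
print(sorted(removed))     # ['X']  (Y survives because s2 still uses it)
print(sorted(repository))  # ['Y', 'Z']
```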
Let’s go through this example to get a better understanding.
- Suppose we take a snapshot (snap1) of a simple index with one shard (shard0) containing two segments, A and B.
- Some time later, as new documents are indexed, a new segment C is created in shard0.
- A second snapshot (snap2) copies only the missing segment, C, to the repository.
- Some time later, segments A, B, and C are merged, creating a new segment D.
- When creating a new snapshot (snap3), the new segment D is copied to the repository.
- Deleting snap1 removes only the segments in the repository that are no longer referenced by any other snapshot.
- In this case, no segments are deleted from the repository, because snap2 still references segments A and B.
- Only after snap2 is deleted as well are segments A, B, and C removed from the repository, since snap3 references only segment D.
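The whole walkthrough can be replayed with a small toy model. The `Repository` class is a hypothetical illustration of the reference-tracking idea, not Elasticsearch's actual repository format. Segment names stand in for segment files.

```python
class Repository:
    """Toy snapshot repository: stored segment files plus per-snapshot references."""

    def __init__(self):
        self.files = set()    # segment files physically stored
        self.snapshots = {}   # snapshot name -> segment names it references

    def snapshot(self, name, shard_segments):
        new = set(shard_segments) - self.files  # copy only missing segments
        self.files |= new
        self.snapshots[name] = set(shard_segments)
        return new

    def delete(self, name):
        refs = self.snapshots.pop(name)
        in_use = set().union(*self.snapshots.values())
        removed = refs - in_use  # only segments no other snapshot needs
        self.files -= removed
        return removed


repo = Repository()
repo.snapshot("snap1", {"A", "B"})        # copies A and B
repo.snapshot("snap2", {"A", "B", "C"})   # copies only C
repo.snapshot("snap3", {"D"})             # A, B, C merged into D; copies D
print(sorted(repo.delete("snap1")))       # []  (snap2 still references A and B)
print(sorted(repo.delete("snap2")))       # ['A', 'B', 'C']
print(sorted(repo.files))                 # ['D']
```

Each entry in `snapshots` is a complete list of the segments needed to restore that snapshot. This is why every snapshot stays logically independent even though the files are shared.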
Summary
In this blog post, I used some graphics to explain how snapshots are automatically deduplicated while each snapshot remains logically independent. For more information, please feel free to read through the official documentation.
The Elastic Stack is versatile enough to tackle any use case. Want to learn how to harness the power of that versatility? Become an Elastic expert through free, paid, private, and training subscriptions. Our instructor-led virtual classes are offered globally, in time zones that make learning convenient for you. Enhance your professional visibility and push aside technical boundaries within your company by becoming Elastic certified.
Reach out to us at training@elastic.co with any questions.