Autosharding of data streams in Elasticsearch Serverless

In Elastic Cloud Serverless we spare our users from the need to fiddle with sharding by automatically configuring the optimal number of shards for data streams based on the indexing load.

Traditionally, users change the sharding configuration of data streams in order to deal with various workloads and make the best use of the available resources. In Elastic Cloud Serverless we've introduced autosharding of data streams, enabling them to be managed and scaled automatically based on indexing load. This post explores the mechanics of autosharding, its benefits, and its implications for users dealing with variable workloads. The autosharding philosophy is to increase the number of shards aggressively and reduce them very conservatively, such that an increase in shards is not followed prematurely by a reduction of shards due to a small period of reduced workload.

Autosharding of data streams in Serverless Elasticsearch

Imagine you have a large pizza that needs to be shared among your friends at a party. If you cut the pizza into only two slices for a group of six friends, each slice will need to serve multiple people. This will create a bottleneck, where one person hogs a whole slice while others wait, leading to a slow sharing process. Additionally, not everyone can enjoy the pizza at the same time; you can practically hear the sighs from the friends left waiting. If more friends show up unexpectedly, you’ll struggle to feed them with just two slices and find yourself scrambling to reshape those slices on the spot.

On the other hand, if you cut the pizza into 36 tiny slices for those same six friends, managing the sharing becomes tricky. Instead of enjoying the pizza, everyone spends more time figuring out how to grab their tiny portions. If the slices are too small, the pizza might even fall apart.

To ensure everyone enjoys the pizza efficiently, you’d aim to cut it into a number of slices that matches the number of friends. If you have six friends, cutting the pizza into 6 or 12 slices allows everyone to grab a slice without long waits. By finding the right balance in slicing your pizza, you’ll keep the party running smoothly and everyone happy.

You know it’s a good analogy when you immediately follow-up with the explanation; the pizza represents the data, the slices represent the index shards, and the friends are the Elasticsearch nodes in your cluster.

Traditionally, users of Elasticsearch had to anticipate their indexing throughput and manually configure the number of shards for each data stream. This approach relied heavily on predictive heuristics and required ongoing adjustments based on workload characteristics whilst also balancing data storage, search analytics, and application performance.

Businesses with seasonal traffic, like retail, often deal with spikes in data demands, while IoT applications can experience rapid load increases at specific times. Development and testing environments typically run only a few hours a week, making fixed shard configurations inefficient. New applications might struggle to estimate workload needs accurately, leading to potential over- or under-provisioning.

We've introduced autosharding of data streams in Elastic Cloud Serverless. Data streams in Serverless are managed and scaled automatically based on indexing load - automatically slicing your pizza as friends arrive to your party or finish eating.

The promise of autosharding

Autosharding addresses these challenges by automatically adjusting the number of shards in response to the current indexing load. This means that instead of users having to manually tweak configurations, Elasticsearch will dynamically manage shard counts for the data streams in your project based on real-time data traffic.

Elasticsearch keeps track of the indexing load for every index as part of a metric named write load, and exposes it for on-prem and ESS deployments as part of the index stats API under the indexing section.

The write_load represents the average number of write threads used while indexing documents.

For an index with one shard the maximum possible value of the write_load metric is the number of write threads available (e.g. all write threads are busy writing in the same shard).

For indices with multiple shards the maximum possible value for the write load is the number of write threads available in a node times the number of indexing nodes in the project. (e.g. all write threads on all the indexing nodes that host a shard for our index are busy writing in the shards belonging to our index, exclusively)

To get a sense of the values allowed for write_load let’s look at index logs with one shard running on one Elasticsearch machine with 2 allocated processors. The write thread pool will be sized to 2 threads. This means that if this Elasticsearch node is exclusively and constantly writing to the same index logs, the write_load we’ll report for index logs will be 2.0 (i.e. 2 write threads fully utilized for writing into index logs).

If logs has 2 primary shards and we’re now running on two Elasticsearch nodes, each with 2 allocated processors we’ll be able to get a maximum reported write_load of 4.0 if all write threads on both Elasticsearch nodes are exclusively writing into the logs index.

Serverless autoscaling

We just looked at how the write load capacity doubled when we increased the number of shards and Elasticsearch nodes. Elastic Cloud Serverless takes care automatically of both these operations using data stream autosharding and ingest autoscaling. Autoscaling refers to the process of dynamically adjusting resources - like memory, CPU, and disk - based on current demands.

In our serverless architecture, we start with a small 2GB memory server and use a step-function scaling approach to increase capacity efficiently. We scale up memory incrementally and then scale out by adding servers. This cycle continues, increasing memory per server incrementally up to 64GB while managing the number of servers.

Linking autoscaling and autosharding

The connection between autoscaling and autosharding is essential for optimizing performance. When calculating the optimal number of shards for a data stream, we consider the minimum and maximum number of available write threads per node in our scaling setup.

  • For small projects, the system will move from 1 to 2 shards when the data stream uses more than half the capacity of a node (i.e., more than one indexing thread).
  • For medium-sized projects, as the system scales across multiple nodes, it will not exceed 3 shards to avoid excessive overhead.
  • Once we reach the largest node sizes, further sharding is enabled to accommodate larger workloads.

Autosharding also enables autoscaling to increase resources as needed, preventing the system from staying at low capacity during high indexing workloads, by enabling projects to reach higher ingestion load values.

Auto sharding formula

To determine the number of shards needed, we use the following formula:

This equation balances the need for increasing shards based on write_load while capping the number of shards to prevent oversharding. The division by 2 reflects the strategy of increasing shards only after exceeding half the capacity of a node. The min/max write threads represent the minimum and maximum number of write threads available in the autoscaling step function (i.e. the number of write threads available on the smallest 2GB step and the number of write threads available on the largest server)

Let’s visualize the output of the formula:

On the Y axis we have the number of shards. And on the X axis we have the write load. We start with 1 shard and we get to 3 shards when the write load is just over 3.0. We remain with 3 shards for quite some time until the write load is about 48.0.

This covers us for the time we scale up through the nodes but haven’t really got to 2 or more or the largest servers, at which point we unlock auto sharding to more than 3 shards, as many as needed to ingest data.

While adding shards can improve indexing performance, excessive sharding in an Elasticsearch cluster can have negative repercussions - imagine that pizza with 56 slices being shared by only 7 friends. Each shard carries overhead costs, including maintenance and resource allocation. Our algorithm accounts for and avoids the peril of excessive sharding until we get to the largest workloads where adding more than 3 shards makes a material difference to indexing performance and throughput.

Implementing autosharding with rollovers

The implementation of autosharding relies on the concept of rollover. A rollover operation creates a new index within the data stream, promoting it to the write index while designating the previous index as a regular backing index, which no longer accepts writes. This transition can occur based on specific conditions, such as exceeding a shard size of 50GB. We take care of configuring the optimal rollover conditions for data streams in Serverless.

In Serverless alongside the usual rollover conditions that relate to maintaining healthy indices and shards we introduce a new condition that evaluates whether the current write load necessitates an increase in shard count. If this condition is met, a rollover will be triggered and the new resulting data stream write index will be configured with the optimal number of shards.

For downscaling, the system will monitor the workload and will not trigger a rollover solely for reducing shards. Instead, it will wait until a regular rollover condition, like the primary shard size, triggers the rollover. The resulting write index will be configured with a lower number of shards.

Cooldown periods for shard adjustments

To ensure stability during shard adjustments, we implement cooldown periods:

  • Increase shards cooldown: A minimum wait time of 4.5 minutes is enforced before increasing the number of shards since the last adjustment. The 4.5 minutes cooldown might seem peculiar but the interval has been chosen to make sure we can increase the number of shards every time data stream lifecycle checks if data streams should rollover (currently, every 5 minutes) but not more often than 5 minutes, covering for internal Elasticsearch cluster reconfiguration.
  • Decrease shards cooldown: We maintain a 3-day minimum wait time before reducing shards to ensure that the decision is based on sustained workload patterns rather than temporary fluctuations.

Conclusion

The data streams autosharding feature in Serverless Elasticsearch represents significant progress in managing data streams effectively. By automatically adjusting shard counts based on real-time indexing loads, this feature simplifies operations and enhances scalability.

With the added benefits of autoscaling, users can expect a more efficient and responsive experience, whether they are handling small projects or large-scale applications. As data workloads continue to evolve, the adaptability provided by auto sharding ensures that Elasticsearch remains a robust solution for managing diverse indexing needs.

Try out our Serverless Elasticsearch offering to take advantage of data streams auto sharding and observe the indexing throughput scaling seamlessly as your data ingestion load increases.

Your pizzas will be optimally sliced as more friends arrive at your party, keen to try those sourdough craft pizzas you prepared for them.

Learn more about Elastic Cloud Serverless, and start a 14-day free trial to test it out yourself.

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as your are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself