WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Optimistic Concurrency Control
Elasticsearch is distributed. When documents are created, updated, or deleted, the new version of the document has to be replicated to other nodes in the cluster. Elasticsearch is also asynchronous and concurrent, meaning that these replication requests are sent in parallel, and may arrive at their destination out of sequence. Elasticsearch needs a way of ensuring that an older version of a document never overwrites a newer version.
When we discussed index, get, and delete requests previously, we pointed out that every document has a _version number that is incremented whenever a document is changed. Elasticsearch uses this _version number to ensure that changes are applied in the correct order. If an older version of a document arrives after a new version, it can simply be ignored.
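The ordering rule above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how Elasticsearch implements replication; the ReplicaCopy class and its apply method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicaCopy:
    """Toy stand-in for one replica's view of its documents (hypothetical)."""
    docs: Dict[str, dict] = field(default_factory=dict)

    def apply(self, doc_id: str, version: int, source: dict) -> bool:
        """Apply a replicated write unless the same or a newer version is stored."""
        current = self.docs.get(doc_id)
        if current is not None and current["_version"] >= version:
            return False  # the older version arrived late, so it is ignored
        self.docs[doc_id] = {"_version": version, "_source": source}
        return True


replica = ReplicaCopy()
# Version 2 overtakes version 1 in transit and arrives first.
applied_v2 = replica.apply("1", 2, {"text": "second edit"})
applied_v1 = replica.apply("1", 1, {"text": "first edit"})
print(applied_v2, applied_v1, replica.docs["1"]["_version"])
```

In this model the late-arriving version 1 is dropped, so the replica still holds version 2.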
We can take advantage of the _version number to ensure that conflicting changes made by our application do not result in data loss. We do this by specifying the version number of the document that we wish to change. If that version is no longer current, our request fails.
Let’s create a new blog post:
PUT /website/blog/1/_create { "title": "My first blog entry", "text": "Just trying this out..." }
The response body tells us that this newly created document has _version number 1. Now imagine that we want to edit the document: we load its data into a web form, make our changes, and then save the new version.
First we retrieve the document:
GET /website/blog/1
The response body includes the same _version number of 1:
{ "_index" : "website", "_type" : "blog", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "title": "My first blog entry", "text": "Just trying this out..." } }
Now, when we try to save our changes by reindexing the document, we specify the version to which our changes should be applied:
PUT /website/blog/1?version=1 { "title": "My first blog entry", "text": "Starting to get the hang of this..." }
We want this update to succeed only if the current _version of this document in our index is 1.
This request succeeds, and the response body tells us that the _version has been incremented to 2:
{ "_index": "website", "_type": "blog", "_id": "1", "_version": 2, "created": false }
However, if we were to rerun the same index request, still specifying version=1, Elasticsearch would respond with a 409 Conflict HTTP response code, and a body like the following:
{ "error": { "root_cause": [ { "type": "version_conflict_engine_exception", "reason": "[blog][1]: version conflict, current [2], provided [1]", "index": "website", "shard": "3" } ], "type": "version_conflict_engine_exception", "reason": "[blog][1]: version conflict, current [2], provided [1]", "index": "website", "shard": "3" }, "status": 409 }
This tells us that the current _version number of the document in Elasticsearch is 2, but that we specified that we were updating version 1.
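The create, update, and conflict sequence above can be reproduced with a small in-memory model. This is a sketch under stated assumptions: ToyIndex and VersionConflict are hypothetical names, and the check is a simplification of what the real engine does, kept only to show the compare-then-write rule.

```python
class VersionConflict(Exception):
    """Raised when the provided version is not the stored one (think HTTP 409)."""


class ToyIndex:
    """In-memory stand-in for an index; not Elasticsearch's actual implementation."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]  # 0 means "does not exist"
        # Internal versioning: a supplied version must equal the current one.
        if version is not None and version != current:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, source)
        return {"_id": doc_id, "_version": current + 1, "created": current == 0}


idx = ToyIndex()
first = idx.index("1", {"text": "Just trying this out..."})
second = idx.index("1", {"text": "Starting to get the hang of this..."}, version=1)
try:
    # Rerunning the same request with version=1 no longer matches.
    idx.index("1", {"text": "Starting to get the hang of this..."}, version=1)
    conflict = None
except VersionConflict as exc:
    conflict = str(exc)
print(first["_version"], second["_version"], conflict)
```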
What we do now depends on our application requirements. We could tell the user that somebody else has already made changes to the document, and ask them to review those changes before trying to save again. Alternatively, as in the case of the widget stock_count previously, we could retrieve the latest document and try to reapply the change.
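The second option, retrieving the latest document and reapplying the change, might look like the following retry loop. It is a minimal sketch against a hypothetical in-memory ToyStore rather than a real client; update_with_retry, max_attempts, and the before_write hook are names invented here, and the hook exists only to inject a competing write for the demonstration.

```python
class VersionConflict(Exception):
    pass


class ToyStore:
    """Hypothetical single-document store with versioned reads and conditional writes."""

    def __init__(self, source):
        self.version, self.source = 1, dict(source)

    def get(self):
        return self.version, dict(self.source)

    def put(self, source, version):
        if version != self.version:
            raise VersionConflict(f"current [{self.version}], provided [{version}]")
        self.version, self.source = self.version + 1, dict(source)
        return self.version


def update_with_retry(store, change, max_attempts=3, before_write=None):
    """Re-read the document and reapply `change` until the versioned write succeeds."""
    for attempt in range(1, max_attempts + 1):
        version, source = store.get()
        new_source = change(source)
        if before_write:
            before_write(attempt)  # demo hook: lets us simulate a competing writer
        try:
            return store.put(new_source, version)
        except VersionConflict:
            continue  # someone else changed the document; fetch it again
    raise VersionConflict(f"gave up after {max_attempts} attempts")


store = ToyStore({"stock_count": 100})


def competing_writer(attempt):
    # Another process sells one widget during our first attempt only.
    if attempt == 1:
        v, s = store.get()
        store.put({"stock_count": s["stock_count"] - 1}, v)


final_version = update_with_retry(
    store,
    lambda doc: {"stock_count": doc["stock_count"] - 1},
    before_write=competing_writer,
)
print(final_version, store.source)
```

Because the change is recomputed from the freshly read document on each attempt, neither decrement is lost. Capping the number of attempts is a design choice that keeps a heavily contended document from causing an endless loop.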
All APIs that update or delete a document accept a version parameter, which allows you to apply optimistic concurrency control to just the parts of your code where it makes sense.
Using Versions from an External System
A common setup is to use some other database as the primary data store and Elasticsearch to make the data searchable, which means that all changes to the primary database need to be copied across to Elasticsearch as they happen. If multiple processes are responsible for this data synchronization, you may run into concurrency problems similar to those described previously.
If your main database already has version numbers (or a value such as timestamp that can be used as a version number), then you can reuse these same version numbers in Elasticsearch by adding version_type=external to the query string. Version numbers must be integers greater than zero and less than about 9.2e+18, which is the range of a positive long value in Java.
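As a rough illustration of that range, the check below accepts only positive integers up to the largest Java long (2^63 - 1). The helper name is_valid_external_version is hypothetical, and using epoch milliseconds as the version is just one assumed way to turn a timestamp into an integer.

```python
from datetime import datetime, timezone

JAVA_LONG_MAX = 2**63 - 1  # 9223372036854775807, roughly 9.2e+18


def is_valid_external_version(value):
    """Return True if `value` is an integer inside the range described in the text."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 < value <= JAVA_LONG_MAX


# Assumption: a timestamp column is converted to epoch milliseconds.
modified_at = datetime(2016, 1, 1, tzinfo=timezone.utc)
timestamp_version = int(modified_at.timestamp() * 1000)

print(is_valid_external_version(timestamp_version))
print(is_valid_external_version(0), is_valid_external_version(2**63))
```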
The way external version numbers are handled is a bit different from the internal version numbers we discussed previously. Instead of checking that the current _version is the same as the one specified in the request, Elasticsearch checks that the current _version is less than the specified version. If the request succeeds, the external version number is stored as the document’s new _version.
External version numbers can be specified not only on index and delete requests, but also when creating new documents.
For instance, to create a new blog post with an external version number of 5, we can do the following:
PUT /website/blog/2?version=5&version_type=external { "title": "My first external blog entry", "text": "Starting to get the hang of this..." }
In the response, we can see that the current _version number is 5:
{ "_index": "website", "_type": "blog", "_id": "2", "_version": 5, "created": true }
Now we update this document, specifying a new version number of 10:
PUT /website/blog/2?version=10&version_type=external { "title": "My first external blog entry", "text": "This is a piece of cake..." }
The request succeeds and sets the current _version to 10:
{ "_index": "website", "_type": "blog", "_id": "2", "_version": 10, "created": false }
If you were to rerun this request, it would fail with the same conflict error we saw before, because the specified external version number is not higher than the current version in Elasticsearch.
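To see why the "must be higher" rule helps when several processes synchronize data, here is a sketch that feeds changes from a primary database into a toy index out of order. ToyExternalIndex is a hypothetical in-memory model of the external-version check, not the real engine, and the intermediate version 7 edit is invented for the demonstration.

```python
class VersionConflict(Exception):
    pass


class ToyExternalIndex:
    """In-memory model of the external-version rule (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version):
        current = self._docs.get(doc_id, (None, None))[0]
        # External rule: the stored version must be lower than the supplied one.
        if current is not None and current >= version:
            raise VersionConflict(
                f"version conflict, current [{current}], provided [{version}]"
            )
        self._docs[doc_id] = (version, source)  # the supplied version is stored as-is
        return {"_id": doc_id, "_version": version, "created": current is None}

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return {"_version": version, "_source": source}


idx = ToyExternalIndex()
# Changes delivered by two sync workers; version 7 arrives late, version 10 is retried.
changes = [
    (5, {"text": "Starting to get the hang of this..."}),
    (10, {"text": "This is a piece of cake..."}),
    (7, {"text": "A stale intermediate edit"}),
    (10, {"text": "This is a piece of cake..."}),
]
outcomes = []
for version, source in changes:
    try:
        outcomes.append(idx.index("2", source, version)["_version"])
    except VersionConflict:
        outcomes.append("conflict")
print(outcomes, idx.get("2")["_version"])
```

In this model, both the stale write and the repeated write are rejected, so the index ends up holding the highest version regardless of arrival order.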