Reindex API
Reindex Request
A ReindexRequest can be used to copy documents from one or more indexes into a destination index.
It requires an existing source index and a target index, which may or may not exist before the request. Reindex does not attempt to set up the destination index, and it does not copy the settings of the source index. Set up the destination index before running a _reindex action, including mappings, shard counts, replicas, etc.
The simplest form of a ReindexRequest looks as follows:
// Create the request
ReindexRequest request = new ReindexRequest();
// Indices to copy from
request.setSourceIndices("source1", "source2");
// Index to copy into
request.setDestIndex("dest");
The dest element can be configured like the index API to control optimistic concurrency control. Leaving out versionType (as above) or setting it to internal causes Elasticsearch to blindly dump documents into the target.
Setting versionType to external causes Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index.
Setting opType to create causes _reindex to only create missing documents in the target index. All existing documents will cause a version conflict. The default opType is index.
By default, version conflicts abort the _reindex process, but you can just count them instead by setting conflicts to proceed on the request.
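The interaction between versionType, opType and conflict handling can be modeled in a few lines of plain Java. This is a toy model under stated assumptions, not Elasticsearch code: the class ReindexWriteModel, its Mode enum, and the id-to-version map standing in for the destination index are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of destination writes; all names here are hypothetical. */
class ReindexWriteModel {

    /** versionType internal, versionType external, and opType create. */
    enum Mode { INTERNAL, EXTERNAL, CREATE }

    /** Destination index modeled as document id -> version. */
    final Map<String, Long> dest = new HashMap<>();
    long created = 0;
    long updated = 0;
    long versionConflicts = 0;

    /** Applies one source document; returns false on a version conflict. */
    boolean write(String id, long sourceVersion, Mode mode) {
        Long existing = dest.get(id);
        if (existing == null) {
            // Missing documents are created in every mode.
            dest.put(id, mode == Mode.EXTERNAL ? sourceVersion : 1L);
            created++;
            return true;
        }
        switch (mode) {
            case INTERNAL:
                // Overwrite blindly; the destination assigns its own version.
                dest.put(id, existing + 1);
                updated++;
                return true;
            case EXTERNAL:
                // Update only if the destination copy is older than the source.
                if (existing < sourceVersion) {
                    dest.put(id, sourceVersion);
                    updated++;
                    return true;
                }
                break;
            case CREATE:
                // Any document that already exists is a conflict.
                break;
        }
        versionConflicts++;
        return false;
    }

    /**
     * Copies every source document. With proceed == false the first conflict
     * aborts the copy; with proceed == true conflicts are only counted.
     */
    boolean reindex(Map<String, Long> source, Mode mode, boolean proceed) {
        for (Map.Entry<String, Long> doc : source.entrySet()) {
            if (!write(doc.getKey(), doc.getValue(), mode) && !proceed) {
                return false; // aborted on the first conflict
            }
        }
        return true;
    }
}
```

In this model, reindexing with Mode.EXTERNAL and proceed set to true leaves newer destination documents untouched and only increments versionConflicts, while Mode.CREATE with proceed set to false stops at the first document that already exists.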
You can limit the documents by adding a type to the source or by adding a query.
It’s also possible to limit the number of processed documents by setting size.
By default, _reindex uses batches of 1000. You can change the batch size with sourceBatchSize.
Reindex can also use the ingest feature by specifying a pipeline.
If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more selective query to size and sort.
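The options described so far correspond to setters on the request. The following is a minimal sketch, assuming the 6.x/7.x high-level REST client is on the classpath. The class name ReindexOptionsSketch, the index and field names, and the pipeline name my_pipeline are placeholders, and the setters shown should be checked against the client version in use (for example, newer 7.x clients offer setMaxDocs in place of setSize).

```java
import org.elasticsearch.index.VersionType;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.index.reindex.ReindexRequest;
import org.elasticsearch.search.sort.SortOrder;

// Hypothetical helper class; only the ReindexRequest calls come from the client.
class ReindexOptionsSketch {
    static ReindexRequest build() {
        ReindexRequest request = new ReindexRequest();
        request.setSourceIndices("source1", "source2");
        request.setDestIndex("dest");

        // Preserve source versions: create missing docs, update older ones.
        request.setDestVersionType(VersionType.EXTERNAL);
        // Alternative to external versioning (not combined with it here):
        // request.setDestOpType("create");

        // Count version conflicts instead of aborting.
        request.setConflicts("proceed");

        // Limit the documents with a query.
        request.setSourceQuery(new TermQueryBuilder("user", "kimchy"));
        // Limit the number of processed documents.
        request.setSize(10);
        // Use batches of 100 instead of the default 1000.
        request.setSourceBatchSize(100);
        // Run documents through an ingest pipeline.
        request.setDestPipeline("my_pipeline");
        // Pick a particular set of documents by sorting the source.
        request.addSortField("field1", SortOrder.DESC);
        return request;
    }
}
```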
ReindexRequest also supports a script that modifies the document. It also allows you to change the document’s metadata. The following example illustrates that.
request.setScript(
    new Script(
        ScriptType.INLINE,
        "painless",
        "if (ctx._source.user == 'kimchy') {ctx._source.likes++;}",
        Collections.emptyMap()));
ReindexRequest supports reindexing from a remote Elasticsearch cluster. When using a remote cluster, the query should be specified inside the RemoteInfo object and not with setSourceQuery. If both the remote info and the source query are set, the request fails with a validation error. The reason for this is that the remote Elasticsearch may not understand queries built by the modern query builders. The remote cluster support works all the way back to Elasticsearch 0.90, and the query language has changed since then. When reaching older versions, it is safer to write the query by hand in JSON.
request.setRemoteInfo(
    new RemoteInfo(
        "https", "localhost", 9002, null,
        new BytesArray(new MatchAllQueryBuilder().toString()),
        "user", "pass", Collections.emptyMap(),
        new TimeValue(100, TimeUnit.MILLISECONDS),
        new TimeValue(100, TimeUnit.SECONDS)
    )
);
ReindexRequest can also parallelize the operation automatically, using sliced-scroll to slice on _uid. Use setSlices to specify the number of slices to use.
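The idea behind slicing on _uid can be sketched in plain Java: each document is assigned to exactly one slice by hashing its identifier, so the slices can be processed independently. This is an illustration only. The class SliceModel and its methods are hypothetical, and String.hashCode stands in for whatever hash function the server actually applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of slicing by document identifier; all names are hypothetical. */
class SliceModel {

    /** Maps an identifier to a slice number in the range [0, slices). */
    static int sliceOf(String uid, int slices) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(uid.hashCode(), slices);
    }

    /** Splits the identifiers into one list per slice. */
    static List<List<String>> partition(List<String> uids, int slices) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            out.add(new ArrayList<>());
        }
        for (String uid : uids) {
            out.get(sliceOf(uid, slices)).add(uid);
        }
        return out;
    }
}
```

Because every identifier lands in exactly one slice, the slices together cover the source exactly once, which is what makes running them in parallel safe.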
ReindexRequest uses the scroll parameter to control how long it keeps the "search context" alive.
Optional arguments
In addition to the options above, the following arguments can optionally be provided:
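Two such arguments are a timeout and a refresh flag. The sketch below assumes the 6.x/7.x high-level REST client. The class name ReindexOptionalArgsSketch is a placeholder, and the TimeValue import reflects its package in those client versions (it lives in a different package in later releases).

```java
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.reindex.ReindexRequest;

// Hypothetical helper class; only the ReindexRequest calls come from the client.
class ReindexOptionalArgsSketch {
    static void apply(ReindexRequest request) {
        // Timeout to wait for the reindex request to be performed.
        request.setTimeout(TimeValue.timeValueMinutes(2));
        // Refresh the index after the reindex completes.
        request.setRefresh(true);
    }
}
```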
Synchronous Execution
BulkByScrollResponse bulkResponse = client.reindex(request, RequestOptions.DEFAULT);
Asynchronous Execution
The asynchronous execution of a reindex request requires both the ReindexRequest instance and an ActionListener instance to be passed to the asynchronous method:
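A minimal sketch of that call, assuming a RestHighLevelClient from the 6.x/7.x client library. The wrapper class ReindexAsyncSketch and its submit method are hypothetical; only reindexAsync and RequestOptions.DEFAULT come from the client.

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.ReindexRequest;

// Hypothetical wrapper; it only forwards its arguments to the client.
class ReindexAsyncSketch {
    static void submit(RestHighLevelClient client,
                       ReindexRequest request,
                       ActionListener<BulkByScrollResponse> listener) {
        // Returns immediately; the listener is notified on completion or failure.
        client.reindexAsync(request, RequestOptions.DEFAULT, listener);
    }
}
```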
The asynchronous method does not block and returns immediately. Once it is completed, the ActionListener is called back using the onResponse method if the execution completed successfully, or using the onFailure method if it failed.
A typical listener for BulkByScrollResponse looks like:
ActionListener<BulkByScrollResponse> listener = new ActionListener<BulkByScrollResponse>() {
    @Override
    public void onResponse(BulkByScrollResponse bulkResponse) {
        // Called when the execution is successfully completed. The response is
        // provided as an argument and contains a list of individual results for
        // each operation that was executed. Note that one or more operations
        // might have failed while the others have been successfully executed.
    }

    @Override
    public void onFailure(Exception e) {
        // Called when the whole request fails.
    }
};
Reindex Response
The returned BulkByScrollResponse contains information about the executed operations and lets you inspect the results as follows:
// Get total time taken
TimeValue timeTaken = bulkResponse.getTook();
// Check if the request timed out
boolean timedOut = bulkResponse.isTimedOut();
// Get total number of docs processed
long totalDocs = bulkResponse.getTotal();
// Number of docs that were updated
long updatedDocs = bulkResponse.getUpdated();
// Number of docs that were created
long createdDocs = bulkResponse.getCreated();
// Number of docs that were deleted
long deletedDocs = bulkResponse.getDeleted();
// Number of batches that were executed
long batches = bulkResponse.getBatches();
// Number of skipped docs
long noops = bulkResponse.getNoops();
// Number of version conflicts
long versionConflicts = bulkResponse.getVersionConflicts();
// Number of times request had to retry bulk index operations
long bulkRetries = bulkResponse.getBulkRetries();
// Number of times request had to retry search operations
long searchRetries = bulkResponse.getSearchRetries();
// The total time this request has throttled itself, not including the
// current throttle time if it is currently sleeping
TimeValue throttledMillis = bulkResponse.getStatus().getThrottled();
// Remaining delay of any current throttle sleep, or 0 if not sleeping
TimeValue throttledUntilMillis = bulkResponse.getStatus().getThrottledUntil();
// Failures during search phase
List<ScrollableHitSource.SearchFailure> searchFailures = bulkResponse.getSearchFailures();
// Failures during bulk index operation
List<BulkItemResponse.Failure> bulkFailures = bulkResponse.getBulkFailures();