Delete By Query API

Delete By Query Request

A DeleteByQueryRequest can be used to delete documents from an index. It requires an existing index (or a set of indices) on which deletion is to be performed.

The simplest form of a DeleteByQueryRequest looks like:

DeleteByQueryRequest request = new DeleteByQueryRequest("source1", "source2"); 

Creates the DeleteByQueryRequest on a set of indices.

By default, version conflicts abort the DeleteByQueryRequest process, but you can count them instead by setting conflicts to proceed:

request.setConflicts("proceed"); 

Set proceed on version conflict

You can limit the documents by adding a query.

request.setQuery(new TermQueryBuilder("user", "kimchy")); 

Only delete documents which have field user set to kimchy

It’s also possible to limit the number of processed documents by setting size.

request.setSize(10); 

Only delete 10 documents

By default DeleteByQueryRequest uses batches of 1000. You can change the batch size with setBatchSize.

request.setBatchSize(100); 

Use batches of 100 documents
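
As a rough illustration of what the batch size changes, the number of batches needed for a given number of matching documents can be estimated with a ceiling division. This is a plain-Java sketch rather than part of the client API: the class and method names are hypothetical, and it assumes every batch except the last one is full.

```java
class BatchCountSketch {

    // Estimated number of batches: ceiling of matchingDocs / batchSize.
    // Assumption: every batch except the last one is completely filled.
    static long estimateBatches(long matchingDocs, int batchSize) {
        if (matchingDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("matchingDocs must be >= 0 and batchSize > 0");
        }
        return (matchingDocs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(estimateBatches(2500, 1000)); // default batch size, prints 3
        System.out.println(estimateBatches(2500, 100));  // setBatchSize(100), prints 25
    }
}
```

Smaller batches mean more round trips for the same number of documents, while each individual batch carries less work.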

DeleteByQueryRequest can also parallelize the operation automatically, using sliced scroll to slice on _uid. Use setSlices to specify the number of slices to use.

request.setSlices(2); 

Set the number of slices to use
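
Conceptually, slicing on _uid assigns every document to exactly one slice by hashing its identifier and taking the result modulo the number of slices, so the slices can be processed independently. The sketch below shows that idea in plain Java. The class and method names are hypothetical, and String.hashCode is only a stand-in for whatever hash function the server actually applies.

```java
import java.util.ArrayList;
import java.util.List;

class SliceAssignmentSketch {

    // Maps a document identifier to a slice in the range [0, slices).
    // Assumption: String.hashCode stands in for the server-side hash function.
    static int sliceFor(String uid, int slices) {
        return Math.floorMod(uid.hashCode(), slices);
    }

    // Groups identifiers by slice; every identifier lands in exactly one group.
    static List<List<String>> partition(List<String> uids, int slices) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < slices; i++) {
            result.add(new ArrayList<>());
        }
        for (String uid : uids) {
            result.get(sliceFor(uid, slices)).add(uid);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> uids = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6");
        List<List<String>> slices = partition(uids, 2);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("slice " + i + ": " + slices.get(i));
        }
    }
}
```

Because the assignment depends only on the identifier and the slice count, no document is handled by two slices.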

DeleteByQueryRequest uses the scroll parameter to control how long it keeps the "search context" alive.

request.setScroll(TimeValue.timeValueMinutes(10)); 

Set the scroll time

If you provide routing then the routing is copied to the scroll query, limiting the process to the shards that match that routing value.

request.setRouting("=cat"); 

Set the routing

Optional arguments

In addition to the options above, the following arguments can optionally be provided:

request.setTimeout(TimeValue.timeValueMinutes(2)); 

Timeout to wait for the delete by query request to be performed as a TimeValue

request.setRefresh(true); 

Refresh index after calling delete by query

request.setIndicesOptions(IndicesOptions.LENIENT_EXPAND_OPEN); 

Set indices options

Synchronous Execution

BulkByScrollResponse bulkResponse = client.deleteByQuery(request, RequestOptions.DEFAULT);

Asynchronous Execution

The asynchronous execution of a delete by query request requires both the DeleteByQueryRequest instance and an ActionListener instance to be passed to the asynchronous method:

client.deleteByQueryAsync(request, RequestOptions.DEFAULT, listener); 

The DeleteByQueryRequest to execute and the ActionListener to use when the execution completes

The asynchronous method does not block and returns immediately. Once it is completed the ActionListener is called back using the onResponse method if the execution successfully completed or using the onFailure method if it failed.

A typical listener for BulkByScrollResponse looks like:

ActionListener<BulkByScrollResponse> listener = new ActionListener<BulkByScrollResponse>() {
    @Override
    public void onResponse(BulkByScrollResponse bulkResponse) {
        
    }

    @Override
    public void onFailure(Exception e) {
        
    }
};

Called when the execution is successfully completed. The response is provided as an argument and contains counts and any failures for the operations that were executed. Note that one or more operations might have failed while the others have been successfully executed.

Called when the whole DeleteByQueryRequest fails. In this case the raised exception is provided as an argument and no operation has been executed.
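
Because the asynchronous method returns immediately, code that needs the result (a test or a command-line tool, for example) has to wait for one of the two callbacks. One common way to do that is a CountDownLatch. The sketch below shows the pattern with plain JDK types only: the Listener interface and fakeDeleteAsync are hypothetical stand-ins for ActionListener and the asynchronous client call, not client API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ListenerLatchSketch {

    // Hypothetical stand-in with the same two-callback shape as ActionListener.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical stand-in for an asynchronous call: it returns immediately
    // and invokes the listener from another thread.
    static void fakeDeleteAsync(long deletedDocs, Listener<Long> listener) {
        new Thread(() -> listener.onResponse(deletedDocs)).start();
    }

    // Blocks the caller until one of the callbacks has fired, then returns the result.
    static long awaitDeleted(long simulatedDeletedDocs) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<Long> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();

        fakeDeleteAsync(simulatedDeletedDocs, new Listener<Long>() {
            @Override
            public void onResponse(Long response) {
                result.set(response);
                done.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                failure.set(e);
                done.countDown();
            }
        });

        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("listener was not called in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("request failed", failure.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println("deleted: " + awaitDeleted(42L));
    }
}
```

Both callbacks count the latch down, so the waiting thread is released whether the call succeeds or fails.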

Delete By Query Response

The returned BulkByScrollResponse contains information about the executed operations, which can be read as follows:

TimeValue timeTaken = bulkResponse.getTook(); 
boolean timedOut = bulkResponse.isTimedOut(); 
long totalDocs = bulkResponse.getTotal(); 
long deletedDocs = bulkResponse.getDeleted(); 
long batches = bulkResponse.getBatches(); 
long noops = bulkResponse.getNoops(); 
long versionConflicts = bulkResponse.getVersionConflicts(); 
long bulkRetries = bulkResponse.getBulkRetries(); 
long searchRetries = bulkResponse.getSearchRetries(); 
TimeValue throttledMillis = bulkResponse.getStatus().getThrottled(); 
TimeValue throttledUntilMillis = bulkResponse.getStatus().getThrottledUntil(); 
List<ScrollableHitSource.SearchFailure> searchFailures = bulkResponse.getSearchFailures(); 
List<BulkItemResponse.Failure> bulkFailures = bulkResponse.getBulkFailures(); 

Get total time taken

Check if the request timed out

Get total number of docs processed

Number of docs that were deleted

Number of batches that were executed

Number of skipped docs

Number of version conflicts

Number of times request had to retry bulk index operations

Number of times request had to retry search operations

The total time this request has throttled itself not including the current throttle time if it is currently sleeping

Remaining delay of any current throttle sleep or 0 if not sleeping

Failures during search phase

Failures during bulk index operation