Task queue backlog
A backlogged task queue can prevent tasks from completing and put the cluster into an unhealthy state. Resource constraints, a large number of tasks being triggered at once, and long-running tasks can all contribute to a backlogged task queue.
Diagnose a task queue backlog
Check the thread pool status
A depleted thread pool can result in rejected requests.
You can use the cat thread pool API to see the number of active threads in each thread pool and how many tasks are queued, how many have been rejected, and how many have completed.
response = client.cat.thread_pool(
  v: true,
  s: 't,n',
  h: 'type,name,node_name,active,queue,rejected,completed'
)
puts response
GET /_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
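As a rough illustration of reading that output, the sketch below filters thread pool rows down to those with queued or rejected tasks. It assumes the rows were fetched with the cat API's `format=json` option, where numeric columns arrive as strings. The helper name `backlogged_pools` and the sample values are made up for this example and are not part of any client library.

```ruby
# Hypothetical helper: keep rows whose queue or rejected count is non-zero.
# `rows` is assumed to be the parsed JSON returned by the cat thread pool API
# when requested with format=json (numeric columns are strings).
def backlogged_pools(rows)
  rows.select { |row| row['queue'].to_i > 0 || row['rejected'].to_i > 0 }
end

# Sample rows in the assumed shape (invented values, for illustration only).
sample_rows = [
  { 'name' => 'write',  'node_name' => 'node-1', 'active' => '4', 'queue' => '120', 'rejected' => '7' },
  { 'name' => 'search', 'node_name' => 'node-1', 'active' => '1', 'queue' => '0',   'rejected' => '0' }
]

backlogged_pools(sample_rows).each do |row|
  puts "#{row['node_name']} #{row['name']}: queue=#{row['queue']} rejected=#{row['rejected']}"
end
```

Against a live cluster, the rows would presumably come from something like `client.cat.thread_pool(format: 'json', h: 'name,node_name,active,queue,rejected')` instead of the sample array.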
Inspect the hot threads on each node
If a particular thread pool queue is backed up, you can periodically poll the Nodes hot threads API to determine if the thread has sufficient resources to progress and gauge how quickly it is progressing.
response = client.nodes.hot_threads
puts response
GET /_nodes/hot_threads
Look for long-running tasks
Long-running tasks can also cause a backlog.
You can use the task management API to get information about the tasks that are running.
Check the running_time_in_nanos value to identify tasks that are taking an excessive amount of time to complete.
response = client.tasks.list(
  filter_path: 'nodes.*.tasks'
)
puts response
GET /_tasks?filter_path=nodes.*.tasks
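One way to scan that response for slow tasks is sketched below. It assumes the response has been parsed into a Hash shaped like `nodes -> <node id> -> tasks -> <task id> -> {...}`, which is the layout the `filter_path` above selects. The helper name `long_running_tasks`, the threshold, and the sample task IDs are all illustrative assumptions.

```ruby
NANOS_PER_SECOND = 1_000_000_000

# Hypothetical helper: return tasks that have run for at least `min_seconds`,
# longest first. `response` is assumed to be the parsed task management API
# response with string keys.
def long_running_tasks(response, min_seconds)
  threshold = min_seconds * NANOS_PER_SECOND
  found = []
  response.fetch('nodes', {}).each_value do |node|
    node.fetch('tasks', {}).each do |task_id, task|
      nanos = task['running_time_in_nanos'].to_i
      next unless nanos >= threshold
      found << { id: task_id, action: task['action'], seconds: nanos.to_f / NANOS_PER_SECOND }
    end
  end
  found.sort_by { |task| -task[:seconds] }
end

# Sample response in the assumed shape (invented values, for illustration only).
sample_response = {
  'nodes' => {
    'node-a' => {
      'tasks' => {
        'node-a:101' => { 'action' => 'indices:admin/forcemerge', 'running_time_in_nanos' => 95_000_000_000 },
        'node-a:102' => { 'action' => 'cluster:monitor/tasks/lists', 'running_time_in_nanos' => 2_000_000 }
      }
    }
  }
}

long_running_tasks(sample_response, 60).each do |task|
  puts "#{task[:id]} #{task[:action]} #{task[:seconds].round(1)}s"
end
```

What counts as "excessive" depends on the workload, so the 60-second threshold here is only a placeholder.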
Resolve a task queue backlog
Increase available resources
If tasks are progressing slowly and the queue is backing up, you might need to take steps to reduce CPU usage.
In some cases, increasing the thread pool size might help.
For example, the force_merge thread pool defaults to a single thread.
Increasing the size to 2 might help reduce a backlog of force merge requests.
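As a sketch of what that change could look like, the fragment below sets the pool size in a node's `elasticsearch.yml`. This assumes thread pool sizes are configured as static node settings, so the node would need a restart for the value to take effect. Check the thread pool settings reference for your version before applying it.

```yaml
# elasticsearch.yml (assumed location of static thread pool settings)
thread_pool:
  force_merge:
    size: 2
```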
Cancel stuck tasks
If you find the active task’s hot thread isn’t progressing and there’s a backlog, consider canceling the task.
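A minimal sketch of cancelling a task from Ruby follows. It assumes a client object that responds to `tasks.cancel(task_id:)`, as the official Ruby client does, and it skips tasks the task management API did not mark as `cancellable`. The `cancel_task` helper and the stand-in client classes are hypothetical and exist only so the example runs without a cluster.

```ruby
# Hypothetical helper: cancel one task, but only if the task management API
# reported it as cancellable. Returns true when a cancel request was sent.
def cancel_task(client, task_id, task)
  return false unless task['cancellable']
  client.tasks.cancel(task_id: task_id)
  true
end

# Stand-in for a real client: it only records which task IDs were cancelled.
class FakeTasks
  attr_reader :cancelled

  def initialize
    @cancelled = []
  end

  def cancel(task_id:)
    @cancelled << task_id
  end
end

FakeClient = Struct.new(:tasks)

fake_client = FakeClient.new(FakeTasks.new)
cancel_task(fake_client, 'node-a:101', { 'cancellable' => true })
cancel_task(fake_client, 'node-a:102', { 'cancellable' => false })
puts fake_client.tasks.cancelled.inspect
```

With a real client, pass it in place of `fake_client` and use the task ID and task entry taken from the task management API response.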