Index Sorting
When creating a new index in Elasticsearch, it is possible to configure how the Segments inside each Shard will be sorted. By default, Lucene does not apply any sort. The index.sort.* settings define which fields should be used to sort the documents inside each Segment.
Nested fields are not compatible with index sorting because they rely on the assumption that nested documents are stored in contiguous doc IDs, which index sorting can break. An error is thrown if index sorting is activated on an index that contains nested fields.
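Because the error only surfaces when the index is created, it can help to check a mapping for nested fields beforehand. The following is a minimal sketch, not part of any Elasticsearch client: `find_nested_fields` is a hypothetical helper, and the `comments` field in the sample mapping is made up for illustration.

```python
def find_nested_fields(properties, prefix=""):
    """Return the dotted paths of all fields mapped with type 'nested'."""
    found = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            found.append(path)
        # Object and nested fields list their sub-fields under "properties".
        found.extend(find_nested_fields(spec.get("properties", {}), prefix=f"{path}."))
    return found


# Hypothetical mapping used only to exercise the helper.
mappings = {
    "properties": {
        "date": {"type": "date"},
        "comments": {
            "type": "nested",
            "properties": {"author": {"type": "keyword"}},
        },
    }
}

nested = find_nested_fields(mappings["properties"])
if nested:
    print("Index sorting cannot be combined with nested fields:", nested)
```

If the returned list is non-empty, the index.sort.* settings should be left out of the create request for that mapping.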
For instance the following example shows how to define a sort on a single field:
resp = client.indices.create(
    index="my-index-000001",
    body={
        "settings": {
            "index": {"sort.field": "date", "sort.order": "desc"}
        },
        "mappings": {"properties": {"date": {"type": "date"}}},
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => 'date',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        date: {
          type: 'date'
        }
      }
    }
  }
)
puts response

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": "date",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}
It is also possible to sort the index by more than one field:
resp = client.indices.create(
    index="my-index-000001",
    body={
        "settings": {
            "index": {
                "sort.field": ["username", "date"],
                "sort.order": ["asc", "desc"],
            }
        },
        "mappings": {
            "properties": {
                "username": {"type": "keyword", "doc_values": True},
                "date": {"type": "date"},
            }
        },
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => [
          'username',
          'date'
        ],
        'sort.order' => [
          'asc',
          'desc'
        ]
      }
    },
    mappings: {
      properties: {
        username: {
          type: 'keyword',
          doc_values: true
        },
        date: {
          type: 'date'
        }
      }
    }
  }
)
puts response

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": [ "username", "date" ],
      "sort.order": [ "asc", "desc" ]
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "doc_values": true
      },
      "date": {
        "type": "date"
      }
    }
  }
}
This index is sorted by username first, then by date, in ascending order for the username field and in descending order for the date field.
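To make the resulting document order concrete, here is a small plain-Python illustration of a two-field sort (username ascending, then date descending). The sample documents are invented, and this only mimics the ordering; it says nothing about how Lucene performs the sort internally.

```python
# Invented sample documents; dates are ISO strings, which sort chronologically.
docs = [
    {"username": "bob", "date": "2024-01-05"},
    {"username": "alice", "date": "2024-01-01"},
    {"username": "alice", "date": "2024-03-01"},
    {"username": "bob", "date": "2024-02-10"},
]

# Python's sort is stable, so sort by the secondary key first (date, descending)
# and then by the primary key (username, ascending).
by_date_desc = sorted(docs, key=lambda d: d["date"], reverse=True)
ordered = sorted(by_date_desc, key=lambda d: d["username"])

for doc in ordered:
    print(doc["username"], doc["date"])
```

Documents with the same username end up adjacent, with the most recent date first within each group.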
Index sorting supports the following settings:
- index.sort.field: The list of fields used to sort the index. Only boolean, numeric, date and keyword fields with doc_values are allowed here.
- index.sort.order: The sort order to use for each field. The order option can have the following values:
  - asc: For ascending order.
  - desc: For descending order.
- index.sort.mode: Elasticsearch supports sorting by multi-valued fields. The mode option controls what value is picked to sort the document. The mode option can have the following values:
  - min: Pick the lowest value.
  - max: Pick the highest value.
- index.sort.missing: The missing parameter specifies how docs which are missing the field should be treated. The missing value can have the following values:
  - _last: Documents without value for the field are sorted last.
  - _first: Documents without value for the field are sorted first.
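The four settings above can be combined in a single settings body, with one entry per sort field in each list. The sketch below assembles such a body and rejects values outside the lists given above. `build_sort_settings` is a hypothetical helper written for this example, not a client API, and it deliberately requires every option to be spelled out instead of assuming defaults.

```python
ALLOWED_ORDER = {"asc", "desc"}
ALLOWED_MODE = {"min", "max"}
ALLOWED_MISSING = {"_last", "_first"}


def build_sort_settings(specs):
    """Build an index settings dict from a list of per-field sort specs.

    Each spec is a dict with the keys 'field', 'order', 'mode' and 'missing'.
    """
    fields, orders, modes, missing = [], [], [], []
    for spec in specs:
        if spec["order"] not in ALLOWED_ORDER:
            raise ValueError(f"invalid order: {spec['order']!r}")
        if spec["mode"] not in ALLOWED_MODE:
            raise ValueError(f"invalid mode: {spec['mode']!r}")
        if spec["missing"] not in ALLOWED_MISSING:
            raise ValueError(f"invalid missing: {spec['missing']!r}")
        fields.append(spec["field"])
        orders.append(spec["order"])
        modes.append(spec["mode"])
        missing.append(spec["missing"])
    return {
        "index": {
            "sort.field": fields,
            "sort.order": orders,
            "sort.mode": modes,
            "sort.missing": missing,
        }
    }


settings = build_sort_settings(
    [
        {"field": "username", "order": "asc", "mode": "min", "missing": "_last"},
        {"field": "date", "order": "desc", "mode": "max", "missing": "_first"},
    ]
)
print(settings)
```

The resulting dictionary could be passed as the "settings" part of the body in a create-index request like the ones shown earlier.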
Index sorting can be defined only once at index creation. It is not allowed to add or update a sort on an existing index. Index sorting also has a cost in terms of indexing throughput since documents must be sorted at flush and merge time. You should test the impact on your application before activating this feature.
Early termination of search request
By default in Elasticsearch, a search request must visit every document that matches a query to retrieve the top documents sorted by a specified sort. However, when the index sort and the search sort are the same, it is possible to limit the number of documents that should be visited per segment to retrieve the N top-ranked documents globally. For example, let’s say we have an index that contains events sorted by a timestamp field:
resp = client.indices.create(
    index="events",
    body={
        "settings": {
            "index": {"sort.field": "timestamp", "sort.order": "desc"}
        },
        "mappings": {"properties": {"timestamp": {"type": "date"}}},
    },
)
print(resp)

response = client.indices.create(
  index: 'events',
  body: {
    settings: {
      index: {
        'sort.field' => 'timestamp',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        timestamp: {
          type: 'date'
        }
      }
    }
  }
)
puts response

PUT events
{
  "settings": {
    "index": {
      "sort.field": "timestamp",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      }
    }
  }
}
You can search for the last 10 events with:
resp = client.search(
    index="events",
    body={"size": 10, "sort": [{"timestamp": "desc"}]},
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      {
        timestamp: 'desc'
      }
    ]
  }
)
puts response

GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ]
}
Elasticsearch will detect that the top docs of each segment are already sorted in the index and will only compare the first N documents per segment. The rest of the documents matching the query are collected to count the total number of results and to build aggregations.
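The reason only the first N documents per segment need to be compared can be shown with a small simulation. This is a simplified model, assuming every document matches the query and each segment is a plain list of timestamps already sorted in descending order; `top_n_sorted_segments` is a name invented for this sketch and does not reflect Lucene's actual collectors.

```python
import heapq
from itertools import islice


def top_n_sorted_segments(segments, n):
    """Return the n largest values across segments that are each sorted descending."""
    # Only the first n entries of a sorted segment can reach the global top n.
    candidates = [list(islice(segment, n)) for segment in segments]
    # Merge the descending per-segment prefixes and keep the first n results.
    merged = heapq.merge(*candidates, reverse=True)
    return list(islice(merged, n))


# Three invented segments, each already sorted by timestamp, descending.
segments = [[90, 70, 50, 30, 10], [95, 60, 20], [85, 80, 75, 5]]
top3 = top_n_sorted_segments(segments, 3)
print(top3)
```

The result matches what a full sort over all documents would return, even though at most three entries per segment were examined.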
If you’re only looking for the last 10 events and have no interest in
the total number of documents that match the query you can set track_total_hits
to false:
resp = client.search(
    index="events",
    body={
        "size": 10,
        "sort": [{"timestamp": "desc"}],
        "track_total_hits": False,
    },
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      {
        timestamp: 'desc'
      }
    ],
    track_total_hits: false
  }
)
puts response
The index sort will be used to rank the top documents and each segment will early terminate the collection after the first 10 matches.
This time, Elasticsearch will not try to count the number of documents and will be able to terminate the query as soon as N documents have been collected per segment.
Aggregations will collect all documents that match the query regardless of the value of track_total_hits.
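The effect of track_total_hits on the amount of work per segment can be sketched with a toy collector. This is an illustrative model only: `collect_segment` and its arguments are hypothetical, the segment is a plain list sorted in descending order, and aggregations are not modelled.

```python
def collect_segment(segment, matches, n, track_total_hits):
    """Collect the top n matching docs from one sorted segment.

    Returns (hits, visited, total), where total is None when it is not tracked.
    """
    hits, visited, total = [], 0, 0
    for doc in segment:
        visited += 1
        if not matches(doc):
            continue
        total += 1
        if len(hits) < n:
            hits.append(doc)
        # Without total-hit tracking, nothing more is needed once n hits are found.
        if len(hits) == n and not track_total_hits:
            break
    return hits, visited, (total if track_total_hits else None)


# Invented segment: values 100 down to 1; the "query" matches even values.
segment = list(range(100, 0, -1))


def is_even(doc):
    return doc % 2 == 0


tracked = collect_segment(segment, is_even, 3, track_total_hits=True)
untracked = collect_segment(segment, is_even, 3, track_total_hits=False)
print(tracked[1], untracked[1])
```

Both calls return the same top hits, but the second stops as soon as three matches have been collected instead of scanning the whole segment.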