Q: Can I delete only certain data from within indices?
editQ: Can I delete only certain data from within indices?
editA: It’s complicated
editTL;DR: No. Curator can only delete entire indices.
editFull answer:
editAs a thought exercise, think of Elasticsearch indices as being like databases,
or tablespaces within a database. If you had hundreds of millions of rows to
delete from your database, would you run a separate
DELETE from TABLE where date<YYYY.MM.dd
to assemble hundreds of millions of
individual delete operations every day, or would you partition your tables in a
way that you could simply run DROP table TABLENAME.YYYY.MM.dd
? The strain on
your database would be astronomical on the former and next to nothing on the
latter. Elasticsearch works much the same way. While Elasticsearch can
technically do both methods, for use-cases with time-series data (like logging),
we recommend dropping entire indices vs. the extremely I/O expensive search and
delete method. Curator was created to help fill that need.
While you can store different types within different indices (e.g. syslog-2014.05.05, apache-2015.05.06), this gets very expensive, very quickly in a totally different way. Each shard in Elasticsearch is a Lucene index. Each index requires a portion of the heap to exist and be kept current. If you have 3 daily indices with 5 primary shards each, you suddenly have reduced the available heap space for shard management by a factor of 3, having gone from 5 shards to 15, per index, not counting multiple indexes per day. The ways to mitigate this (if you pursue this route) include massive daily indexing boxes and using shard allocation/routing to move indices to specific members of the cluster where they can have less effect; keeping fewer days of information; having more nodes in your cluster, and so forth.
Conclusion:
editWhile it may be desirable to have different life-cycles for your data, sometimes it’s just easier and cheaper to store everything as long as the longest life-cycle you wish to maintain.
Post-script:
editEven though it is neither recommended [4], nor best practices, it is still possible to perform these search & delete operations yourself, using the Delete-by-Query API. Curator will not be modified to perform operations such as these, however. Curator is meant to manage at the index level, rather than the data level.