- Elasticsearch - The Definitive Guide:
- Foreword
- Preface
- Getting Started
- You Know, for Search…
- Installing and Running Elasticsearch
- Talking to Elasticsearch
- Document Oriented
- Finding Your Feet
- Indexing Employee Documents
- Retrieving a Document
- Search Lite
- Search with Query DSL
- More-Complicated Searches
- Full-Text Search
- Phrase Search
- Highlighting Our Searches
- Analytics
- Tutorial Conclusion
- Distributed Nature
- Next Steps
- Life Inside a Cluster
- Data In, Data Out
- What Is a Document?
- Document Metadata
- Indexing a Document
- Retrieving a Document
- Checking Whether a Document Exists
- Updating a Whole Document
- Creating a New Document
- Deleting a Document
- Dealing with Conflicts
- Optimistic Concurrency Control
- Partial Updates to Documents
- Retrieving Multiple Documents
- Cheaper in Bulk
- Distributed Document Store
- Searching—The Basic Tools
- Mapping and Analysis
- Full-Body Search
- Sorting and Relevance
- Distributed Search Execution
- Index Management
- Inside a Shard
- You Know, for Search…
- Search in Depth
- Structured Search
- Full-Text Search
- Multifield Search
- Proximity Matching
- Partial Matching
- Controlling Relevance
- Theory Behind Relevance Scoring
- Lucene’s Practical Scoring Function
- Query-Time Boosting
- Manipulating Relevance with Query Structure
- Not Quite Not
- Ignoring TF/IDF
- function_score Query
- Boosting by Popularity
- Boosting Filtered Subsets
- Random Scoring
- The Closer, The Better
- Understanding the price Clause
- Scoring with Scripts
- Pluggable Similarity Algorithms
- Changing Similarities
- Relevance Tuning Is the Last 10%
- Dealing with Human Language
- Aggregations
- Geolocation
- Modeling Your Data
- Administration, Monitoring, and Deployment
WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Looking at Time
editLooking at Time
editIf search is the most popular activity in Elasticsearch, building date histograms must be the second most popular. Why would you want to use a date histogram?
Imagine your data has a timestamp. It doesn’t matter what the data is—Apache log events, stock buy/sell transaction dates, baseball game times—anything with a timestamp can benefit from the date histogram. When you have a timestamp, you often want to build metrics that are expressed over time:
- How many cars sold each month this year?
- What was the price of this stock for the last 12 hours?
- What was the average latency of our website every hour in the last week?
While regular histograms are often represented as bar charts, date histograms
tend to be converted into line graphs representing time series. Many
companies use Elasticsearch solely for analytics over time series data. The date_histogram
bucket is their bread and butter.
The date_histogram
bucket works similarly to the regular histogram
. Rather
than building buckets based on a numeric field representing numeric ranges,
it builds buckets based on time ranges. Each bucket is therefore defined as a
certain calendar size (for example, 1 month
or 2.5 days
).
Our first example will build a simple line chart to answer this question: how many cars were sold each month?
GET /cars/transactions/_search { "size" : 0, "aggs": { "sales": { "date_histogram": { "field": "sold", "interval": "month", "format": "yyyy-MM-dd" } } } }
The interval is requested in calendar terminology (for example, one month per bucket). |
|
We provide a date format so that bucket keys are pretty. |
Our query has a single aggregation, which builds a bucket
per month. This will give us the number of cars sold in each month. An additional
format
parameter is provided so the buckets have "pretty" keys. Internally,
dates are simply represented as a numeric value. This tends to make UI designers
grumpy, however, so a prettier format can be specified using common date formatting.
The response is both expected and a little surprising (see if you can spot the surprise):
{ ... "aggregations": { "sales": { "buckets": [ { "key_as_string": "2014-01-01", "key": 1388534400000, "doc_count": 1 }, { "key_as_string": "2014-02-01", "key": 1391212800000, "doc_count": 1 }, { "key_as_string": "2014-05-01", "key": 1398902400000, "doc_count": 1 }, { "key_as_string": "2014-07-01", "key": 1404172800000, "doc_count": 1 }, { "key_as_string": "2014-08-01", "key": 1406851200000, "doc_count": 1 }, { "key_as_string": "2014-10-01", "key": 1412121600000, "doc_count": 1 }, { "key_as_string": "2014-11-01", "key": 1414800000000, "doc_count": 2 } ] ... }
The aggregation is represented in full. As you can see, we have buckets
that represent months, a count of docs in each month, and our pretty key_as_string
.