Tutorial: Analyze eCommerce data with aggregations using Query DSL

Elastic Stack Serverless

This hands-on tutorial shows you how to analyze eCommerce data using Elasticsearch aggregations with the _search API and Query DSL.

You’ll learn how to:

  • Calculate key business metrics such as average order value
  • Analyze sales patterns over time
  • Compare performance across product categories
  • Track moving averages and cumulative totals

You’ll need:

  1. A running instance of Elasticsearch, either on Elastic Cloud Serverless or together with Kibana on Elastic Cloud Hosted/Self Managed deployments.

    • If you don’t have a deployment, you can run the following command in your terminal to set up a local dev environment:

      curl -fsSL https://elastic.co/start-local | sh
      
  2. The sample eCommerce data loaded into Elasticsearch. To load the sample data, follow these steps in your UI:

    • Open the Integrations pages by searching in the global search field.
    • Search for sample data in the Integrations search field.
    • Open the Sample data page.
    • Select the Other sample data sets collapsible.
    • Add the Sample eCommerce orders data set. This will create and populate an index called kibana_sample_data_ecommerce.

Before we start analyzing the data, let’s examine the structure of the documents in our sample eCommerce index. Run this command to see the field mappings:

 GET kibana_sample_data_ecommerce/_mapping 

The response shows the field mappings for the kibana_sample_data_ecommerce index.
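For reference, here is an abbreviated sketch of that response, truncated to a few representative fields (run the command above to see the full mapping):

```json
{
  "kibana_sample_data_ecommerce": {
    "mappings": {
      "properties": {
        "category": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword" }
          }
        },
        "order_date": { "type": "date" },
        "taxful_total_price": { "type": "half_float" },
        "geoip": {
          "properties": {
            "location": { "type": "geo_point" }
          }
        }
      }
    }
  }
}
```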

The sample data includes the following field data types:

  • text and keyword for text fields
    • Most text fields have a .keyword subfield for exact matching using multi-fields
  • date for date fields
  • 3 numeric types:
    • integer for whole numbers
    • long for large whole numbers
    • half_float for floating-point numbers
  • geo_point for geographic coordinates
  • object for nested structures such as products, geoip, and event

Now that we understand the structure of our sample data, let’s start analyzing it, beginning with important metrics about orders and customers.

Calculate the average order value across all orders in the dataset using the avg aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "avg_order_value": {
      "avg": {
        "field": "taxful_total_price"
      }
    }
  }
}
  1. Set size to 0 to avoid returning matched documents in the response and return only the aggregation results
  2. A meaningful name that describes what this metric represents
  3. Configures an avg aggregation, which calculates a simple arithmetic mean
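Against the sample data, the aggregations section of the response has this shape (the value shown is illustrative, not an exact figure):

```json
{
  "aggregations": {
    "avg_order_value": {
      "value": 75.05
    }
  }
}
```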

Calculate multiple statistics about orders in one request using the stats aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "order_stats": {
      "stats": {
        "field": "taxful_total_price"
      }
    }
  }
}
  1. A descriptive name for this set of statistics
  2. stats returns count, min, max, avg, and sum at once
Tip

The stats aggregation is more efficient than running individual min, max, avg, and sum aggregations.
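All five statistics come back in a single object (figures illustrative):

```json
{
  "aggregations": {
    "order_stats": {
      "count": 4675,
      "min": 6.99,
      "max": 2250.0,
      "avg": 75.05,
      "sum": 350884.12
    }
  }
}
```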

Let’s group orders in different ways to understand sales patterns.

Group orders by category to see which product categories are most popular, using the terms aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 5,
        "order": { "_count": "desc" }
      }
    }
  }
}
  1. Name reflecting the business purpose of this breakdown
  2. terms aggregation groups documents by field values
  3. Use .keyword field for exact matching on text fields
  4. Limit to top 5 categories
  5. Order by number of orders (descending)
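The response contains one bucket per category, ordered by document count; sum_other_doc_count reports how many documents fell outside the top 5 (counts illustrative, bucket list truncated):

```json
{
  "aggregations": {
    "sales_by_category": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 572,
      "buckets": [
        { "key": "Men's Clothing", "doc_count": 2024 },
        { "key": "Women's Clothing", "doc_count": 1903 }
      ]
    }
  }
}
```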

Group orders by day to track daily sales patterns using the date_histogram aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "daily_orders": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0
      }
    }
  }
}
  1. Descriptive name for the time-series aggregation results.
  2. The date_histogram aggregation groups documents into time-based buckets, similar to terms aggregation but for dates.
  3. calendar_interval uses calendar-aware intervals, which correctly handle units of varying length such as months and daylight-saving days. "day" ensures consistent daily grouping.
  4. Formats dates in the response using date patterns (e.g. "yyyy-MM-dd"). Refer to the date format reference for additional patterns.
  5. When min_doc_count is 0, returns buckets for days with no orders, useful for continuous time series visualization.
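Each bucket carries the epoch-millisecond key plus key_as_string rendered with the format you specified (doc counts illustrative):

```json
{
  "aggregations": {
    "daily_orders": {
      "buckets": [
        { "key_as_string": "2025-06-01", "key": 1748736000000, "doc_count": 146 },
        { "key_as_string": "2025-06-02", "key": 1748822400000, "doc_count": 153 }
      ]
    }
  }
}
```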

Now let’s calculate metrics within each group to get deeper insights.

Calculate metrics within each category to compare performance across categories.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 5,
        "order": { "total_revenue": "desc" }
      },
      "aggs": {
        "total_revenue": {
          "sum": {
            "field": "taxful_total_price"
          }
        },
        "avg_order_value": {
          "avg": {
            "field": "taxful_total_price"
          }
        },
        "total_items": {
          "sum": {
            "field": "total_quantity"
          }
        }
      }
    }
  }
}
  1. Order categories by their total revenue instead of count
  2. Define metrics to calculate within each category
  3. Total revenue for the category
  4. Average order value in the category
  5. Total number of items sold
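Each category bucket now carries the three sub-aggregation values alongside its document count (figures illustrative, bucket list truncated):

```json
{
  "aggregations": {
    "categories": {
      "buckets": [
        {
          "key": "Men's Clothing",
          "doc_count": 2024,
          "total_revenue": { "value": 156729.45 },
          "avg_order_value": { "value": 77.43 },
          "total_items": { "value": 4407 }
        }
      ]
    }
  }
}
```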

Let’s combine metrics to track daily trends: daily revenue, unique customers, and average basket size.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "daily_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "revenue": {
          "sum": {
            "field": "taxful_total_price"
          }
        },
        "unique_customers": {
          "cardinality": {
            "field": "customer_id"
          }
        },
        "avg_basket_size": {
          "avg": {
            "field": "total_quantity"
          }
        }
      }
    }
  }
}
  1. Daily revenue
  2. Uses the cardinality aggregation to count unique customers per day
  3. Average number of items per order

You can use pipeline aggregations on the results of other aggregations. Let’s analyze how metrics change over time.

Moving averages help identify trends by reducing day-to-day noise in the data. Let’s observe sales trends more clearly by smoothing daily revenue variations, using the Moving Function aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "daily_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "field": "taxful_total_price"
          }
        },
        "smoothed_revenue": {
          "moving_fn": {
            "buckets_path": "daily_revenue",
            "window": 3,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}
  1. Calculate daily revenue first.
  2. Create a smoothed version of the daily revenue.
  3. Use moving_fn for moving window calculations.
  4. Reference the revenue from our date histogram.
  5. Use a 3-day window — use different window sizes to see trends at different time scales.
  6. Use the built-in unweighted average function in the moving_fn aggregation.
Tip

Notice how the smoothed values lag behind the actual values: the moving window is built from previous days' data. The first bucket will always be null when using moving averages, because there is no earlier data to average.
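You can see this in the response: the first bucket has no preceding data in its window, so its smoothed value is null, while later buckets average the preceding days (figures illustrative, bucket list truncated):

```json
{
  "aggregations": {
    "daily_sales": {
      "buckets": [
        {
          "key_as_string": "2025-06-01T00:00:00.000Z",
          "doc_count": 146,
          "daily_revenue": { "value": 10578.53 },
          "smoothed_revenue": { "value": null }
        },
        {
          "key_as_string": "2025-06-02T00:00:00.000Z",
          "doc_count": 153,
          "daily_revenue": { "value": 10448.00 },
          "smoothed_revenue": { "value": 10578.53 }
        }
      ]
    }
  }
}
```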

Track running totals over time using the cumulative_sum aggregation.

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "daily_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "revenue": {
          "sum": {
            "field": "taxful_total_price"
          }
        },
        "cumulative_revenue": {
          "cumulative_sum": {
            "buckets_path": "revenue"
          }
        }
      }
    }
  }
}
  1. Name for our running total
  2. cumulative_sum adds up values across buckets
  3. Reference the revenue we want to accumulate
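In the response, each bucket's cumulative_revenue is the running total of revenue up to and including that day (figures illustrative, bucket list truncated):

```json
{
  "aggregations": {
    "daily_sales": {
      "buckets": [
        {
          "key_as_string": "2025-06-01T00:00:00.000Z",
          "revenue": { "value": 10578.53 },
          "cumulative_revenue": { "value": 10578.53 }
        },
        {
          "key_as_string": "2025-06-02T00:00:00.000Z",
          "revenue": { "value": 10448.00 },
          "cumulative_revenue": { "value": 21026.53 }
        }
      ]
    }
  }
}
```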

Refer to the aggregations reference for more details on all available aggregation types.