IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Run downsampling manually

This simplified example shows how downsampling reduces the storage size of a time series index. It uses typical Kubernetes cluster monitoring data. To test out downsampling, follow these steps:

Prerequisites

Refer to time series data stream prerequisites.

This example requires a sample data file. Download sample-k8s-metrics.json (the file name used by the ingest command in this example) and save it in the local directory where you’re running Elasticsearch.

Create a time series index

This creates an index for a basic data stream. The available parameters for an index are described in detail in Set up a time series data stream.

The time series boundaries are set so that sampling data for the index begins at 2022-06-10T00:00:00Z and ends at 2022-06-30T23:59:59Z.

For simplicity, in the time series mapping all time_series_metric parameters are set to type gauge, but other values such as counter and histogram may also be used. The time_series_metric values determine the kind of statistical representations that are used during downsampling.

The index mapping includes a set of static time series dimensions: host, namespace, node, and pod. The time series dimensions are not changed by the downsampling process.

response = client.indices.create(
  index: 'sample-01',
  body: {
    settings: {
      index: {
        mode: 'time_series',
        time_series: {
          start_time: '2022-06-10T00:00:00Z',
          end_time: '2022-06-30T23:59:59Z'
        },
        routing_path: [
          'kubernetes.namespace',
          'kubernetes.host',
          'kubernetes.node',
          'kubernetes.pod'
        ],
        number_of_replicas: 0,
        number_of_shards: 2
      }
    },
    mappings: {
      properties: {
        "@timestamp": {
          type: 'date'
        },
        kubernetes: {
          properties: {
            container: {
              properties: {
                cpu: {
                  properties: {
                    usage: {
                      properties: {
                        core: {
                          properties: {
                            ns: {
                              type: 'long'
                            }
                          }
                        },
                        limit: {
                          properties: {
                            pct: {
                              type: 'float'
                            }
                          }
                        },
                        nanocores: {
                          type: 'long',
                          time_series_metric: 'gauge'
                        },
                        node: {
                          properties: {
                            pct: {
                              type: 'float'
                            }
                          }
                        }
                      }
                    }
                  }
                },
                memory: {
                  properties: {
                    available: {
                      properties: {
                        bytes: {
                          type: 'long',
                          time_series_metric: 'gauge'
                        }
                      }
                    },
                    majorpagefaults: {
                      type: 'long'
                    },
                    pagefaults: {
                      type: 'long',
                      time_series_metric: 'gauge'
                    },
                    rss: {
                      properties: {
                        bytes: {
                          type: 'long',
                          time_series_metric: 'gauge'
                        }
                      }
                    },
                    usage: {
                      properties: {
                        bytes: {
                          type: 'long',
                          time_series_metric: 'gauge'
                        },
                        limit: {
                          properties: {
                            pct: {
                              type: 'float'
                            }
                          }
                        },
                        node: {
                          properties: {
                            pct: {
                              type: 'float'
                            }
                          }
                        }
                      }
                    },
                    workingset: {
                      properties: {
                        bytes: {
                          type: 'long',
                          time_series_metric: 'gauge'
                        }
                      }
                    }
                  }
                },
                name: {
                  type: 'keyword'
                },
                start_time: {
                  type: 'date'
                }
              }
            },
            host: {
              type: 'keyword',
              time_series_dimension: true
            },
            namespace: {
              type: 'keyword',
              time_series_dimension: true
            },
            node: {
              type: 'keyword',
              time_series_dimension: true
            },
            pod: {
              type: 'keyword',
              time_series_dimension: true
            }
          }
        }
      }
    }
  }
)
puts response

PUT /sample-01
{
    "settings": {
        "index": {
            "mode": "time_series",
            "time_series": {
                "start_time": "2022-06-10T00:00:00Z",
                "end_time": "2022-06-30T23:59:59Z"
            },
            "routing_path": [
                "kubernetes.namespace",
                "kubernetes.host",
                "kubernetes.node",
                "kubernetes.pod"
            ],
            "number_of_replicas": 0,
            "number_of_shards": 2
        }
    },
    "mappings": {
        "properties": {
            "@timestamp": {
                "type": "date"
            },
            "kubernetes": {
                "properties": {
                    "container": {
                        "properties": {
                            "cpu": {
                                "properties": {
                                    "usage": {
                                        "properties": {
                                            "core": {
                                                "properties": {
                                                    "ns": {
                                                        "type": "long"
                                                    }
                                                }
                                            },
                                            "limit": {
                                                "properties": {
                                                    "pct": {
                                                        "type": "float"
                                                    }
                                                }
                                            },
                                            "nanocores": {
                                                "type": "long",
                                                "time_series_metric": "gauge"
                                            },
                                            "node": {
                                                "properties": {
                                                    "pct": {
                                                        "type": "float"
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            },
                            "memory": {
                                "properties": {
                                    "available": {
                                        "properties": {
                                            "bytes": {
                                                "type": "long",
                                                "time_series_metric": "gauge"
                                            }
                                        }
                                    },
                                    "majorpagefaults": {
                                        "type": "long"
                                    },
                                    "pagefaults": {
                                        "type": "long",
                                        "time_series_metric": "gauge"
                                    },
                                    "rss": {
                                        "properties": {
                                            "bytes": {
                                                "type": "long",
                                                "time_series_metric": "gauge"
                                            }
                                        }
                                    },
                                    "usage": {
                                        "properties": {
                                            "bytes": {
                                                "type": "long",
                                                "time_series_metric": "gauge"
                                            },
                                            "limit": {
                                                "properties": {
                                                    "pct": {
                                                        "type": "float"
                                                    }
                                                }
                                            },
                                            "node": {
                                                "properties": {
                                                    "pct": {
                                                        "type": "float"
                                                    }
                                                }
                                            }
                                        }
                                    },
                                    "workingset": {
                                        "properties": {
                                            "bytes": {
                                                "type": "long",
                                                "time_series_metric": "gauge"
                                            }
                                        }
                                    }
                                }
                            },
                            "name": {
                                "type": "keyword"
                            },
                            "start_time": {
                                "type": "date"
                            }
                        }
                    },
                    "host": {
                        "type": "keyword",
                        "time_series_dimension": true
                    },
                    "namespace": {
                        "type": "keyword",
                        "time_series_dimension": true
                    },
                    "node": {
                        "type": "keyword",
                        "time_series_dimension": true
                    },
                    "pod": {
                        "type": "keyword",
                        "time_series_dimension": true
                    }
                }
            }
        }
    }
}
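
The time boundaries in the settings above determine which timestamps belong to the index. As a rough illustration, the following plain-Ruby sketch checks whether a document timestamp falls inside the configured window. The helper `within_boundaries?` is hypothetical and not part of any Elasticsearch client. It only mirrors the two setting values and assumes both ends are inclusive, which may differ from how Elasticsearch treats `end_time`.

```ruby
require 'time'

# Boundaries copied from the index settings in this example.
START_TIME = Time.iso8601('2022-06-10T00:00:00Z')
END_TIME   = Time.iso8601('2022-06-30T23:59:59Z')

# Hypothetical helper (not an Elasticsearch API): returns true when the
# ISO 8601 timestamp lies within the window. Both ends are assumed inclusive.
def within_boundaries?(timestamp)
  t = Time.iso8601(timestamp)
  t >= START_TIME && t <= END_TIME
end

puts within_boundaries?('2022-06-20T23:59:40Z') # timestamp of the sample document
puts within_boundaries?('2022-07-01T00:00:00Z') # after the configured end time
```

A check like this can help spot documents in a sample file that lie outside the window before you try to ingest them.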

Ingest time series data

In a terminal window with Elasticsearch running, run the following curl command to load the documents from the downloaded sample data file:

curl -s -H "Content-Type: application/json" \
   -XPOST "http://<elasticsearch-node>/sample-01/_bulk?pretty" \
   --data-binary @sample-k8s-metrics.json
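
The `_bulk` endpoint accepts newline-delimited JSON in which each document line is preceded by an action line, and the body ends with a newline. The sketch below builds such a payload in plain Ruby. It assumes the sample file follows this layout with the `create` action, which may not match the actual file. The documents and values are made up, and the helper `bulk_body` is hypothetical.

```ruby
require 'json'

# Two made-up documents shaped like the Kubernetes metrics in this example.
docs = [
  { '@timestamp' => '2022-06-20T23:59:40Z',
    'kubernetes' => { 'host' => 'gke-apps-0', 'namespace' => 'namespace26',
                      'node' => 'gke-apps-0-1', 'pod' => 'gke-apps-0-1-0',
                      'container' => { 'memory' => { 'usage' => { 'bytes' => 139_548_672 } } } } },
  { '@timestamp' => '2022-06-20T23:59:50Z',
    'kubernetes' => { 'host' => 'gke-apps-0', 'namespace' => 'namespace26',
                      'node' => 'gke-apps-0-1', 'pod' => 'gke-apps-0-1-0',
                      'container' => { 'memory' => { 'usage' => { 'bytes' => 140_000_000 } } } } }
]

# Hypothetical helper: emits one action line and one source line per
# document, and terminates the body with a trailing newline.
def bulk_body(docs)
  lines = docs.flat_map do |doc|
    [JSON.generate({ 'create' => {} }), JSON.generate(doc)]
  end
  lines.join("\n") + "\n"
end

puts bulk_body(docs)
```

Because the target index is named in the request URL, the action lines in this sketch do not need an `_index` field.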

Approximately 18,000 documents are added. Check the search results for the newly ingested data:

response = client.search(
  index: 'sample-01*'
)
puts response

GET /sample-01*/_search

The query has at least 10,000 hits and returns the first 10. In each document you can see the time series dimensions (host, namespace, node, and pod) as well as the various CPU and memory time series metrics.

  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "sample-01",
        "_id": "WyHN6N6AwdaJByQWAAABgYOOweA",
        "_score": 1,
        "_source": {
          "@timestamp": "2022-06-20T23:59:40Z",
          "kubernetes": {
            "host": "gke-apps-0",
            "node": "gke-apps-0-1",
            "pod": "gke-apps-0-1-0",
            "container": {
              "cpu": {
                "usage": {
                  "nanocores": 80037,
                  "core": {
                    "ns": 12828317850
                  },
                  "node": {
                    "pct": 0.0000277905
                  },
                  "limit": {
                    "pct": 0.0000277905
                  }
                }
              },
              "memory": {
                "available": {
                  "bytes": 790830121
                },
                "usage": {
                  "bytes": 139548672,
                  "node": {
                    "pct": 0.01770037710617187
                  },
                  "limit": {
                    "pct": 0.00009923134671484496
                  }
                },
                "workingset": {
                  "bytes": 2248540
                },
                "rss": {
                  "bytes": 289260
                },
                "pagefaults": 74843,
                "majorpagefaults": 0
              },
              "start_time": "2021-03-30T07:59:06Z",
              "name": "container-name-44"
            },
            "namespace": "namespace26"
          }
        }
      }
...

Next, run a terms aggregation on the set of time series dimensions (_tsid), with a nested date histogram on a fixed interval of one day, to view the minimum, maximum, and average memory usage of each time series per day.

response = client.search(
  index: 'sample-01*',
  body: {
    size: 0,
    aggregations: {
      tsid: {
        terms: {
          field: '_tsid'
        },
        aggregations: {
          over_time: {
            date_histogram: {
              field: '@timestamp',
              fixed_interval: '1d'
            },
            aggregations: {
              min: {
                min: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              },
              max: {
                max: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              },
              avg: {
                avg: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              }
            }
          }
        }
      }
    }
  }
)
puts response

GET /sample-01*/_search
{
    "size": 0,
    "aggs": {
        "tsid": {
            "terms": {
                "field": "_tsid"
            },
            "aggs": {
                "over_time": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "fixed_interval": "1d"
                    },
                    "aggs": {
                        "min": {
                            "min": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        },
                        "max": {
                            "max": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        },
                        "avg": {
                            "avg": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        }
                    }
                }
            }
        }
    }
}
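
To make the shape of this aggregation concrete, the following plain-Ruby sketch performs the same grouping in memory: documents are grouped by time series, then by calendar day, and the minimum, maximum, and average are computed per group. It is only an illustration of the logic, not how Elasticsearch executes the query. The `series` key stands in for `_tsid`, the helper `daily_stats` is hypothetical, and the values are made up.

```ruby
require 'time'

# Made-up raw documents, flattened for brevity.
docs = [
  { series: 'pod-a', timestamp: '2022-06-20T10:00:00Z', bytes: 100 },
  { series: 'pod-a', timestamp: '2022-06-20T18:30:00Z', bytes: 300 },
  { series: 'pod-a', timestamp: '2022-06-21T01:00:00Z', bytes: 200 },
  { series: 'pod-b', timestamp: '2022-06-20T12:00:00Z', bytes: 50 }
]

# Hypothetical helper: groups by [series, UTC day] and computes the same
# three statistics that the aggregation requests.
def daily_stats(docs)
  grouped = docs.group_by do |d|
    [d[:series], Time.iso8601(d[:timestamp]).utc.strftime('%Y-%m-%d')]
  end
  grouped.transform_values do |group|
    values = group.map { |d| d[:bytes] }
    { min: values.min, max: values.max, avg: values.sum.to_f / values.size }
  end
end

daily_stats(docs).each do |(series, day), stats|
  puts "#{series} #{day} #{stats}"
end
```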

Run downsampling for the index

Before you run downsampling, make the index read-only by adding a write block:

response = client.indices.add_block(
  index: 'sample-01',
  block: 'write'
)
puts response

PUT /sample-01/_block/write

You can now use the downsample API to downsample the index, setting the time series interval to one hour:

response = client.indices.downsample(
  index: 'sample-01',
  target_index: 'sample-01-downsample',
  body: {
    fixed_interval: '1h'
  }
)
puts response

POST /sample-01/_downsample/sample-01-downsample
{
  "fixed_interval": "1h"
}
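
Conceptually, downsampling a gauge metric with a one-hour interval collapses all documents of a time series that fall in the same hour into a single summary. The plain-Ruby sketch below illustrates that idea under simplifying assumptions: one metric per document, a `series` key standing in for the set of dimensions, and made-up values. The helper `downsample` is hypothetical and is not the Elasticsearch implementation.

```ruby
require 'time'

INTERVAL_SECONDS = 3600 # mirrors fixed_interval: 1h

# Made-up raw gauge readings for a single time series.
docs = [
  { series: 'pod-a', timestamp: '2022-06-20T23:00:10Z', value: 10 },
  { series: 'pod-a', timestamp: '2022-06-20T23:30:00Z', value: 30 },
  { series: 'pod-a', timestamp: '2022-06-20T23:59:40Z', value: 20 },
  { series: 'pod-a', timestamp: '2022-06-21T00:00:05Z', value: 5 }
]

# Hypothetical helper: truncates each timestamp to the start of its
# interval, then summarizes every [series, interval] group.
def downsample(docs, interval = INTERVAL_SECONDS)
  buckets = docs.group_by do |d|
    epoch = Time.iso8601(d[:timestamp]).to_i
    [d[:series], Time.at(epoch - (epoch % interval)).utc.iso8601]
  end
  buckets.map do |(series, bucket_start), group|
    values = group.map { |d| d[:value] }
    { series: series, timestamp: bucket_start, doc_count: group.size,
      min: values.min, max: values.max, sum: values.sum, value_count: values.size }
  end
end

downsample(docs).each { |summary| puts summary }
```

The four raw readings collapse into two summaries, one for each hour that contains data.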

Finally, delete the original index:

response = client.indices.delete(
  index: 'sample-01'
)
puts response

DELETE /sample-01

View the results

Re-run the earlier search query, keeping in mind that querying downsampled indices involves a few nuances:

response = client.search(
  index: 'sample-01*'
)
puts response

GET /sample-01*/_search

In the query results, notice that the number of hits has been reduced to only 288 documents. In addition, statistical representations have been calculated for each time series metric: min, max, sum, and value_count.

  "hits": {
    "total": {
      "value": 288,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "sample-01-downsample",
        "_id": "WyHN6N6AwdaJByQWAAABgYNYIYA",
        "_score": 1,
        "_source": {
          "@timestamp": "2022-06-20T23:00:00.000Z",
          "_doc_count": 81,
          "kubernetes.host": "gke-apps-0",
          "kubernetes.namespace": "namespace26",
          "kubernetes.node": "gke-apps-0-1",
          "kubernetes.pod": "gke-apps-0-1-0",
          "kubernetes.container.cpu.usage.nanocores": {
            "min": 23344,
            "max": 163408,
            "sum": 7488985,
            "value_count": 81
          },
          "kubernetes.container.memory.available.bytes": {
            "min": 167751844,
            "max": 1182251090,
            "sum": 58169948901,
            "value_count": 81
          },
          "kubernetes.container.memory.rss.bytes": {
            "min": 54067,
            "max": 391987,
            "sum": 17550215,
            "value_count": 81
          },
          "kubernetes.container.memory.pagefaults": {
            "min": 69086,
            "max": 428910,
            "sum": 20239365,
            "value_count": 81
          },
          "kubernetes.container.memory.workingset.bytes": {
            "min": 323420,
            "max": 2279342,
            "sum": 104233700,
            "value_count": 81
          },
          "kubernetes.container.memory.usage.bytes": {
            "min": 61401416,
            "max": 413064069,
            "sum": 18557182404,
            "value_count": 81
          }
        }
      },
...
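
The downsampled document stores no average, but one can be derived from `sum` and `value_count`. A minimal Ruby sketch, using the `kubernetes.container.memory.usage.bytes` values from the document shown above (the helper name `average` is hypothetical):

```ruby
# Values copied from the downsampled document in this example.
usage = {
  'min' => 61_401_416,
  'max' => 413_064_069,
  'sum' => 18_557_182_404,
  'value_count' => 81
}

# Hypothetical helper: derives the mean of the original readings from the
# stored sum and the number of values that contributed to it.
def average(stats)
  stats['sum'].to_f / stats['value_count']
end

puts average(usage).round(2)
```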

You can now re-run the earlier aggregation. Even though the aggregation runs on the downsampled index, which contains only 288 documents, it returns the same results as the earlier aggregation on the original index.

response = client.search(
  index: 'sample-01*',
  body: {
    size: 0,
    aggregations: {
      tsid: {
        terms: {
          field: '_tsid'
        },
        aggregations: {
          over_time: {
            date_histogram: {
              field: '@timestamp',
              fixed_interval: '1d'
            },
            aggregations: {
              min: {
                min: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              },
              max: {
                max: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              },
              avg: {
                avg: {
                  field: 'kubernetes.container.memory.usage.bytes'
                }
              }
            }
          }
        }
      }
    }
  }
)
puts response

GET /sample-01*/_search
{
    "size": 0,
    "aggs": {
        "tsid": {
            "terms": {
                "field": "_tsid"
            },
            "aggs": {
                "over_time": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "fixed_interval": "1d"
                    },
                    "aggs": {
                        "min": {
                            "min": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        },
                        "max": {
                            "max": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        },
                        "avg": {
                            "avg": {
                                "field": "kubernetes.container.memory.usage.bytes"
                            }
                        }
                    }
                }
            }
        }
    }
}
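
One way to see why the results match: min, max, sum, and value_count can be combined across buckets without access to the raw values. The plain-Ruby sketch below merges made-up hourly summaries into a single daily summary and compares it with the same statistics computed directly from the raw readings. The helpers `summarize` and `merge` are hypothetical and only illustrate the arithmetic, not the Elasticsearch implementation.

```ruby
# Made-up raw readings for one time series, keyed by the hour they fall in.
raw_by_hour = {
  '2022-06-20T10:00:00Z' => [100, 140, 120],
  '2022-06-20T11:00:00Z' => [300, 260],
  '2022-06-20T12:00:00Z' => [80]
}

# Hypothetical helper: the statistics a downsampled document keeps.
def summarize(values)
  { min: values.min, max: values.max, sum: values.sum, value_count: values.size }
end

# Hypothetical helper: combines several summaries into one.
def merge(summaries)
  {
    min: summaries.map { |s| s[:min] }.min,
    max: summaries.map { |s| s[:max] }.max,
    sum: summaries.sum { |s| s[:sum] },
    value_count: summaries.sum { |s| s[:value_count] }
  }
end

hourly = raw_by_hour.values.map { |values| summarize(values) }
from_downsampled = merge(hourly)
from_raw = summarize(raw_by_hour.values.flatten)

puts from_downsampled == from_raw
puts from_downsampled[:sum].to_f / from_downsampled[:value_count]
```

The same merge step also suggests why already downsampled data can be downsampled again to a coarser interval.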

This example demonstrates how downsampling can dramatically reduce the number of records stored for time series data, within whatever time boundaries you choose. It’s also possible to perform downsampling on already downsampled data, to further reduce storage and associated costs, as the time series data ages and the data resolution becomes less critical.

Downsampling can also be integrated into an ILM policy. To learn more, try the Run downsampling with ILM example.
