
Downsample time series data

Serverless Stack

To downsample a time series data stream (TSDS), you can use index lifecycle management (ILM) or a data stream lifecycle. (You can also use the downsample API with an individual time series index, but most users don't need to use the API.)
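For reference, a direct call to the downsample API takes a source index and a target index, plus the aggregation interval (the index names here are illustrative; the source index must be made read-only first):

```console
POST /my-time-series-index/_downsample/my-downsampled-index
{
  "fixed_interval": "1h"
}
```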

Before you begin, review the Downsampling concepts.

Important: Downsampling requires read-only data.
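ILM and the data stream lifecycle make backing indices read-only for you. If you call the downsample API manually, you can add a write block to the source index first (the index name is illustrative):

```console
PUT /my-time-series-index/_block/write
```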

In most cases, you can choose the data stream lifecycle option. If you're using data tiers in Elastic Stack, choose the index lifecycle option.

Serverless Stack

To downsample a time series via a data stream lifecycle, add a downsampling section to the data stream lifecycle (for existing data streams) or the index template (for new data streams).

  • Set fixed_interval to your preferred level of granularity. The original time series data will be aggregated at this interval.
  • Set after to the minimum amount of time to wait after an index rollover before downsampling runs.
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "7d",
  "downsampling": [
    {
      "after": "1m",
      "fixed_interval": "10m"
    },
    {
      "after": "1d",
      "fixed_interval": "1h"
    }
  ]
}

The downsampling action runs after the index time series end time has passed.
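If you want to check when a backing index's time series window closes, you can inspect its `index.time_series.end_time` setting (the backing index name below is illustrative):

```console
GET /.ds-my-data-stream-2025.01.01-000001/_settings/index.time_series.end_time
```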

Stack (unavailable in Serverless)

To downsample time series data as part of index lifecycle management (ILM), include downsample actions in your ILM policy. You can configure multiple downsampling actions across different phases to progressively reduce data granularity over time.

This example shows a policy with rollover and two downsampling actions: one in the hot phase for initial aggregation at 5-minute intervals, and another in the warm phase for further aggregation at 1-hour intervals:

PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "5m"
          },
          "downsample": {
            "fixed_interval": "5m"
          }
        }
      },
      "warm": {
        "actions": {
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}

Set fixed_interval to your preferred level of granularity. The original time series data will be aggregated at this interval. The downsample action runs after the index is rolled over and the index time series end time has passed.

This section provides some best practices for downsampling.

When choosing the downsampling interval, make sure to consider the original sampling rate of your measurements. Use an interval that reduces the number of documents by a significant percentage. For example, if a sensor sends data every 10 seconds, downsampling to 1 minute would reduce the number of documents by 83%. Downsampling to 5 minutes instead would reduce the number by 96%.

The same applies when downsampling already downsampled data.

When using index lifecycle management (ILM), you can define at most one downsampling round in each of the following phases:

  • hot
  • warm
  • cold

Phases don't require matching tiers. If a matching tier exists for the phase, ILM automatically migrates the data to the respective tier. To prevent this, add a migrate action and specify enabled: false.

If you leave the default migrate action enabled, downsampling runs on the tier of the source index, which typically has more resources. The smaller, downsampled data is then migrated to the next tier.
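For example, a warm phase that disables the automatic migrate action before downsampling might look like the following sketch (the policy name is illustrative):

```console
PUT _ilm/policy/my-downsample-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "migrate": {
            "enabled": false
          },
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}
```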

Because the downsampling operation processes an entire index at once, it can increase the load on the cluster. Smaller indices improve task distribution, which helps minimize the impact of downsampling on cluster performance.

To reduce the index size:

  • Limit the number of primary shards, or
  • (ILM only) use max_primary_shard_docs in the rollover action of the hot phase to cap the number of documents per shard. Specify a value lower than the default of 200 million to help prevent load spikes caused by downsampling.
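As a sketch, a hot phase rollover action that caps documents per shard below the default might look like this (the policy name and threshold are illustrative):

```console
PUT _ilm/policy/my-downsample-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_docs": 100000000
          },
          "downsample": {
            "fixed_interval": "5m"
          }
        }
      }
    }
  }
}
```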