Downsample time series data
To downsample a time series data stream (TSDS), you can use index lifecycle management (ILM) or a data stream lifecycle. (You can also use the downsample API with an individual time series index, but most users don't need to use the API.)
Before you begin, review the Downsampling concepts.
Downsampling requires read-only data.
In most cases, you can choose the data stream lifecycle option. If you're using data tiers in Elastic Stack, choose the index lifecycle option.
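If you do need the standalone downsample API for a single index, the flow is to mark the index read-only and then downsample it into a new index. A minimal sketch, where the index names and the 1-hour interval are placeholders:

# Mark the index read-only before downsampling
PUT /my-time-series-index/_block/write

# Downsample into a new index at 1-hour granularity
POST /my-time-series-index/_downsample/my-time-series-index-downsampled
{
  "fixed_interval": "1h"
}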
Data stream lifecycle
To downsample a time series via a data stream lifecycle, add a downsampling section to the data stream lifecycle (for existing data streams) or the index template (for new data streams).
- Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval.
- Set `after` to the minimum time to wait after an index rollover before running downsampling.
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "7d",
  "downsampling": [
    {
      "after": "1m",
      "fixed_interval": "10m"
    },
    {
      "after": "1d",
      "fixed_interval": "1h"
    }
  ]
}
The downsampling action runs after the index time series end time has passed.
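For a new data stream, the equivalent downsampling configuration goes in the index template that matches the stream. The following is only a sketch: the template name, index pattern, and the dimension and metric fields are placeholders, and your template needs whatever TSDS settings and mappings your data actually uses.

# Illustrative index template with downsampling in the data stream lifecycle
PUT _index_template/my-data-stream-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["sensor_id"]
    },
    "mappings": {
      "properties": {
        "sensor_id": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "temperature": {
          "type": "double",
          "time_series_metric": "gauge"
        }
      }
    },
    "lifecycle": {
      "data_retention": "7d",
      "downsampling": [
        {
          "after": "1m",
          "fixed_interval": "10m"
        },
        {
          "after": "1d",
          "fixed_interval": "1h"
        }
      ]
    }
  }
}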
Index lifecycle management (ILM)
To downsample time series data as part of index lifecycle management (ILM), include downsample actions in your ILM policy. You can configure multiple downsampling actions across different phases to progressively reduce data granularity over time.
This example shows a policy with rollover and two downsampling actions: one in the hot phase for initial aggregation at 5-minute intervals, and another in the warm phase for further aggregation at 1-hour intervals:
PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "5m"
          },
          "downsample": {
            "fixed_interval": "5m"
          }
        }
      },
      "warm": {
        "actions": {
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}
Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval. The downsample action runs after the index is rolled over and the index time series end time has passed.
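The policy only takes effect once the index template that matches your TSDS references it through index.lifecycle.name. The following sketch reuses the datastream_policy name from the example above; the template name, index pattern, and mappings are placeholders:

# Illustrative index template that attaches the ILM policy to a matching TSDS
PUT _index_template/datastream_template
{
  "index_patterns": ["datastream*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.routing_path": ["sensor_id"],
      "index.lifecycle.name": "datastream_policy"
    },
    "mappings": {
      "properties": {
        "sensor_id": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "temperature": {
          "type": "double",
          "time_series_metric": "gauge"
        }
      }
    }
  }
}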
Best practices
This section provides some best practices for downsampling.
When choosing the downsampling interval, make sure to consider the original sampling rate of your measurements. Use an interval that reduces the number of documents by a significant percentage. For example, if a sensor sends data every 10 seconds, downsampling to 1 minute would reduce the number of documents by 83%. Downsampling to 5 minutes instead would reduce the number by 96%.
The same applies when downsampling already downsampled data.
When using index lifecycle management (ILM), you can define at most one downsampling round in each of the following phases:
- `hot` phase: Runs after the index time series end time passes.
- `warm` phase: Runs after the `min_age` time (starting the count after the rollover and respecting the index time series end time).
- `cold` phase: Runs after the `min_age` time (starting the count after the rollover and respecting the index time series end time).
Phases don't require matching tiers. If a matching tier exists for the phase, ILM automatically migrates the data to the respective tier. To prevent this, add a migrate action and set `enabled: false`.
If you leave the default migrate action enabled, downsampling runs on the tier of the source index, which typically has more resources. The smaller, downsampled data is then migrated to the next tier.
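For example, extending the example policy above so that the warm-phase downsampling runs without moving the data to the warm tier could look like this sketch:

# Warm phase downsamples in place; migrate is disabled so no tier move happens
PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "5m"
          },
          "downsample": {
            "fixed_interval": "5m"
          }
        }
      },
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}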
Because the downsampling operation processes an entire index at once, it can increase the load on the cluster. Smaller indices improve task distribution, which helps minimize the impact of downsampling on a cluster's performance.
To reduce the index size:
- limit the number of primary shards, or
- (ILM only) use `max_primary_shard_docs` in the rollover action of the `hot` phase to cap documents per shard, as shown below. Specify a value lower than the default of 200 million to help prevent load spikes due to downsampling.
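For example, the hot phase of the example policy could cap documents per shard like the following sketch (the 100 million threshold is illustrative, not a recommendation):

# Rollover caps documents per primary shard to keep downsampling tasks smaller
PUT _ilm/policy/datastream_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "5m",
            "max_primary_shard_docs": 100000000
          },
          "downsample": {
            "fixed_interval": "5m"
          }
        }
      }
    }
  }
}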