Time Series Aggregation - Custom Three Months Aggregates from Monthly tsibble
Introduction
When working with time series data, it’s not uncommon to need aggregates over custom intervals. In this post, we’ll explore how to build custom three-month aggregates from a monthly tsibble and walk through each step needed to create them.
Background
A tsibble is a type of time series data structure in R that combines the benefits of data frames and time series objects. It provides a convenient way to work with time series data, including indexing and aggregating the data.
In this example, we have a monthly tsibble named data, indexed by date (assumed here to be a yearmonth column), with two variables: geo (geographic location, assumed to be the key) and value (a measure of something). We want to aggregate the data into custom three-month intervals, which are not the usual calendar quarters but specific periods made up of three consecutive months, for example December to February.
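To make the setup concrete, here is a minimal sketch of what such a tsibble might look like. The region names, the date range, and the random values are all hypothetical; only the column names date, geo, and value come from the description above.

```r
library(tsibble)

# Hypothetical sample data: two regions with 24 monthly observations each
set.seed(1)
n_months <- 24
data <- tsibble(
  date  = yearmonth("2020 Jan") + rep(0:(n_months - 1), times = 2), # yearmonth index
  geo   = rep(c("North", "South"), each = n_months),                # key
  value = round(runif(2 * n_months, min = 50, max = 150)),          # made-up measure
  key   = geo,
  index = date
)
data
```

Printing the object shows the detected monthly interval and the key, which is a quick way to confirm that the index really is monthly before aggregating.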
Solution
To achieve this, we’ll combine mutate and summarise from the dplyr package with index_by, group_by_key, and yearquarter from the tsibble package. Here’s how you can do it:
library(dplyr)
library(tsibble)

data.q <- data |>
  mutate(date = date + 1) |>              # Shift the yearmonth index forward by one month
  index_by(quarter = ~ yearquarter(.)) |> # Re-index by the calendar quarter of the shifted date
  group_by_key() |>                       # Group by the key (geo), so each location is aggregated separately
  summarise(value = sum(value))           # Sum the values per location and three-month window
The same pipeline can also be written one step at a time, which makes it easier to inspect each intermediate result:
# Shift the yearmonth index forward by one month
data.q <- data |>
  mutate(date = date + 1)
# Re-index by the calendar quarter of the shifted date
data.q <- data.q |>
  index_by(quarter = ~ yearquarter(.))
# Group by the key (geo)
data.q <- data.q |>
  group_by_key()
# Sum the values within each location and three-month window
data.q <- data.q |>
  summarise(value = sum(value))
Explanation
Let’s break down what each line does:
The mutate step shifts every date forward by one month. This is a one-month shift, not a full quarter: with a yearmonth index, adding 1 moves each observation to the following month, so December, January, and February all land in the first calendar quarter.
The index_by step, index_by(quarter = ~ yearquarter(.)), re-indexes the data by quarter. The lambda ~ yearquarter(.) converts each (shifted) date to its year and quarter.
The group_by_key() step groups the data by the tsibble’s key, not by its index. With geo as the key, each location is aggregated separately.
The summarise step computes sum(value) for every combination of key and quarter. There is no need to carry geo through with first(geo): it is a grouping column and is kept automatically.
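To see why a one-month shift is enough, the mapping can be worked out with plain month numbers. This is a base R sketch that does not depend on tsibble; the variable names are illustrative only.

```r
# Calendar months 1 to 12
month_num <- 1:12

# Shift each month forward by one, wrapping December (12) round to January (1)
shifted <- month_num %% 12 + 1

# Calendar quarter of the shifted month
custom_q <- (shifted - 1) %/% 3 + 1

# Which custom quarter does each original month fall into?
mapping <- data.frame(month = month.abb, custom_quarter = custom_q)
mapping
```

December, January, and February share quarter 1, March to May share quarter 2, and so on. Note that December is counted with the following year’s first quarter, which is also how the shifted tsibble index labels it.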
Example Use Case
Suppose you have a tsibble with daily temperature data for various locations around the world. You want to calculate the average temperature for each three-month period. Using this approach, you can easily create custom aggregates from your time series data.
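library(dplyr)
library(tsibble)

# Create a sample tsibble with daily temperature data for three locations
set.seed(42)
dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")
locations <- c("New York", "Los Angeles", "Chicago")
data_temp <- tsibble(
  date = rep(dates, times = length(locations)),
  location = rep(locations, each = length(dates)),
  temp = rnorm(length(dates) * length(locations), mean = 20, sd = 10),
  key = location,
  index = date
)
# Apply the custom aggregation. The one-month shift is applied to each day's
# yearmonth inside index_by, because shifting daily dates by a calendar month
# would map several days onto the same date (e.g. at the end of January).
data_temp.q <- data_temp |>
  index_by(quarter = ~ yearquarter(yearmonth(.) + 1)) |>
  group_by_key() |>
  summarise(temp_avg = mean(temp)) # Average temperature per location and period
# Print the aggregated data
data_temp.q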
This code creates a new tsibble with one row per location and three-month period, holding the average temperature for that period.
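As a possible alternative to shifting the dates, yearquarter() accepts a fiscal_start argument that sets the month in which the first quarter begins. The following is a minimal sketch, assuming a hypothetical single-series monthly tsibble named monthly and quarters that start in December. The resulting labels refer to fiscal quarters, so check them against your own naming convention.

```r
library(dplyr)
library(tsibble)

# Hypothetical monthly series covering December 2019 to May 2020
monthly <- tsibble(
  date  = yearmonth("2019 Dec") + 0:5,
  value = 1:6,
  index = date
)

# Quarters start in December, so Dec-Feb and Mar-May each form one group
agg <- monthly |>
  index_by(quarter = ~ yearquarter(as.Date(.), fiscal_start = 12)) |>
  summarise(value = sum(value), n_months = n())
agg
```

This avoids modifying the index column, at the cost of labels that follow the fiscal year instead of the calendar year.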
Last modified on 2023-07-12