Here are a few solutions:
1) zoo Using ave
, for each group create a monthly series, m
, by merging the original series, z
, with a grid, g
. Then calculate the rolling sum and retain only the original time points:
library(zoo)
f <- function(i) {
z <- with(df[i, ], zoo(count, t))
g <- zoo(, seq(start(z), end(z), by = "month"))
m <- merge(z, g)
window(rollapplyr(m, 4, sum, na.rm = TRUE, partial = TRUE), time(z))
}
df$desired <- ave(1:nrow(df), df$id, df$class, FUN = f)
which gives:
> df
id class t count desired
1 1 A 2010-01-15 1 1
2 1 A 2010-02-15 2 3
3 1 B 2010-04-15 3 3
4 1 B 2010-09-15 4 4
5 2 A 2010-01-15 5 5
6 2 B 2010-06-15 6 6
7 2 B 2010-08-15 7 13
8 2 B 2010-09-15 8 21
Note We have assumed the times are ordered within each group (as in the question). If that is not so then sort df
first.
2) sqldf
library(sqldf)
sqldf("select id, class, a.t, a.'count', sum(b.'count') desired
from df a join df b
using(id, class)
where a.t - b.t between 0 and 100
group by id, class, a.t")
which gives:
id class t count desired
1 1 A 2010-01-15 1 1
2 1 A 2010-02-15 2 3
3 1 B 2010-04-15 3 3
4 1 B 2010-09-15 4 4
5 2 A 2010-01-15 5 5
6 2 B 2010-06-15 6 6
7 2 B 2010-08-15 7 13
8 2 B 2010-09-15 8 21
Note: If the merge should be too large to fit into memory then use sqldf("...", dbname = tempfile())
to cause the intermediate results to be stored in a database which it creates on the fly and automatically destroys afterwards.
3) Base R The sqldf solution motivates this base R solution which just translates the SQL into R:
m <- merge(df, df, by = 1:2)
s <- subset(m, t.x - t.y >= 0 & t.x - t.y <= 100)
ag <- aggregate(count.y ~ t.x + class + id, s, sum)
names(ag) <- c("t", "class", "id", "count", "desired")
The result is:
> ag
t class id count desired
1 2010-01-15 A 1 1 1
2 2010-02-15 A 1 2 3
3 2010-04-15 B 1 3 3
4 2010-09-15 B 1 4 4
5 2010-01-15 A 2 5 5
6 2010-06-15 B 2 6 6
7 2010-08-15 B 2 7 13
8 2010-09-15 B 2 8 21
Note: This does do a merge in memory which might be a problem if the data set is very large.
UPDATE: Minor simplifications of first solution and also added second solution.
UPDATE 2: Added third solution.