Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
624 views
in Technique[技术] by (71.8m points)

r - sum over past window-size dates per group

The problem is similar to How do I do a conditional sum which only looks between certain date criteria but slightly different and the answer from that does not fit into current problem. The main difference is that the date column based on each group may not necessarily be complete (i.e., certain date may be missing)

Input:

input <- read.table(text="
2017-04-01     A     1
2017-04-02     B     2
2017-04-02     B     2
2017-04-02     C     2
2017-04-02     A     2
2017-04-03     C     3
2017-04-04     A     4
2017-04-05     B     5
2017-04-06     C     6
2017-04-07     A     7
2017-04-08     B     8
2017-04-09     C     9")
colnames(input) <- c("Date","Group","Score")

Rule: for each group at each date, looking back 3 calendar dates (include current date). calculate the sum.

Expected output:

    Date Group 3DaysSumPerGroup
    2017-04-01     A                1 #1  previous two dates are not available. partial is allowed
    2017-04-02     A                3 #2+1 both 4-01 and 4-02 are in the range
    2017-04-04     A                6 #4+2
    2017-04-07     A                7 #7
    2017-04-02     B                4 # 2+2 at the same day
    2017-04-05     B                5
    2017-04-08     B                8
    2017-04-02     C                2
    2017-04-03     C                5
    2017-04-06     C                6
    2017-04-09     C                9

I tried to use rollapply with partial=T, but result doesn't seem correct.

 input %>% 
     group_by(Group) %>% 
     arrange(Date) %>% mutate("3DaysSumPerGroup"=rollapply(data=Score,width=3,align="right",FUN=sum,partial=T,fill=NA,rm.na=T))
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a (supposedly efficient) solution using the new non-equi joins and the by = .EACHI features in data.table (v1.9.8+)

library(data.table) #v1.10.4

## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
  Date = as.IDate(Date)
  Date2 = Date - 2L
  .(Date, Date2)
}]

## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column 

input[unique(input, by = c("Date", "Group")), ## This removes the dupes
      on = .(Group, Date <= Date, Date >= Date2), ## The join condition
      .(Score = sum(Score)), ## sum the scores
      keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))

#     Group       Date       Date Score
#  1:     A 2017-04-01 2017-03-30     1
#  2:     A 2017-04-02 2017-03-31     3
#  3:     A 2017-04-04 2017-04-02     6
#  4:     A 2017-04-07 2017-04-05     7
#  5:     B 2017-04-02 2017-03-31     4
#  6:     B 2017-04-05 2017-04-03     5
#  7:     B 2017-04-08 2017-04-06     8
#  8:     C 2017-04-02 2017-03-31     2
#  9:     C 2017-04-03 2017-04-01     5
# 10:     C 2017-04-06 2017-04-04     6
# 11:     C 2017-04-09 2017-04-07     9

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...