Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


time series - Calculate days since last event in R

My question is how to calculate the number of days since an event last occurred in R. Below is a minimal example of the data:

df <- data.frame(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","23/05/2001","26/08/2001"), "%d/%m/%Y"), 
event=c(0,0,1,0,1,1,0))
        date event
1 2000-07-06     0
2 2000-09-15     0
3 2000-10-15     1
4 2001-01-03     0
5 2001-03-17     1
6 2001-05-23     1
7 2001-08-26     0

The binary variable event is 1 when the event occurred and 0 otherwise. Observations are repeated at different times (date). The expected output, with the days since the last event (tae), is as follows:

        date event       tae
1 2000-07-06     0        NA
2 2000-09-15     0        NA
3 2000-10-15     1         0
4 2001-01-03     0        80
5 2001-03-17     1       153
6 2001-05-23     1        67
7 2001-08-26     0        95
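
To spell out how these values arise, here is a small check of the arithmetic. It assumes that on a row where the event occurs, tae counts the days since the previous event, and that the first event is defined as 0. The name tae_check is only for illustration.

```r
# Dates of the three events in the example data
event_dates <- as.Date(c("2000-10-15", "2001-03-17", "2001-05-23"))

# Days since the most recent earlier event, for rows 4 to 7
tae_check <- c(
  as.numeric(as.Date("2001-01-03") - event_dates[1]),  # row 4: 80
  as.numeric(event_dates[2] - event_dates[1]),         # row 5: 153
  as.numeric(event_dates[3] - event_dates[2]),         # row 6: 67
  as.numeric(as.Date("2001-08-26") - event_dates[3])   # row 7: 95
)
tae_check
```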

I have looked around for answers to similar problems, but they don't address my specific case. I tried to adapt ideas from a similar post (Calculate elapsed time since last event), and below is the closest I got to a solution:

library(dplyr)
df %>%
  mutate(tmp_a = c(0, diff(date)) * !event,
         tae = cumsum(tmp_a))

This yields the output below, which is not quite what I expect:

        date event tmp_a tae
1 2000-07-06     0     0   0
2 2000-09-15     0    71  71
3 2000-10-15     1     0  71
4 2001-01-03     0    80 151
5 2001-03-17     1     0 151
6 2001-05-23     1     0 151
7 2001-08-26     0    95 246

Any assistance on how to fine-tune this, or a different approach, would be greatly appreciated.


1 Reply


You could try something like this:

# index of the most recent event at each row (1 means no event yet)
last_event_index <- cumsum(df$event) + 1

# shift it one position to the right, so that an event row
# refers to the previous event rather than to itself
last_event_index <- c(1, head(last_event_index, -1))

# take the dates of the events and index them with last_event_index;
# the leading NA stands for "no event has occurred yet"
last_event_date <- c(as.Date(NA), df[which(df$event==1), "date"])[last_event_index]

# subtract the date of the last event from each row's date
df$tae <- df$date - last_event_date
df

#        date event      tae
#1 2000-07-06     0  NA days
#2 2000-09-15     0  NA days
#3 2000-10-15     1  NA days
#4 2001-01-03     0  80 days
#5 2001-03-17     1 153 days
#6 2001-05-23     1  67 days
#7 2001-08-26     0  95 days
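
Note that row 3 (the first event) comes out as NA here, whereas the expected output in the question has 0. If you want the 0, one possible sketch is to wrap the same idea in a small function and set the first event row explicitly. The function name days_since_last_event is made up for illustration, and the code assumes the rows are already sorted by date and that event contains only 0 and 1.

```r
# Example data from the question
df <- data.frame(
  date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                   "17/03/2001", "23/05/2001", "26/08/2001"), "%d/%m/%Y"),
  event = c(0, 0, 1, 0, 1, 1, 0)
)

# Hypothetical helper: days since the most recent earlier event.
# Assumes `date` is sorted ascending and `event` is 0/1 without NAs.
days_since_last_event <- function(date, event) {
  is_event <- event == 1
  # number of events strictly before each row
  n_prev <- cumsum(is_event) - is_event
  # event dates, padded with a leading NA for "no earlier event"
  event_dates <- c(as.Date(NA), date[is_event])
  tae <- as.numeric(date - event_dates[n_prev + 1])
  # assumption taken from the question: the first event counts as 0
  first_event <- which(is_event)[1]
  if (!is.na(first_event)) tae[first_event] <- 0
  tae
}

df$tae <- days_since_last_event(df$date, df$event)
df
```

This returns a plain numeric column rather than a difftime, so the values print without the "days" suffix.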
