Within each id, I would like to keep rows that are at least 91 days apart. In my dataframe df below, id=1 has 5 rows and id=2 has 1 row.
For id=1, I would like to keep only the 1st, 3rd and 5th rows.
This is because the 1st and 2nd dates differ by 32 days, so we remove the 2nd date. We then compare the 1st and 3rd dates, which differ by 152 days, so we keep the 3rd date.
Now, instead of using the 1st date as the reference, we use the 3rd date. The 3rd and 4th dates differ by 61 days, so we remove the 4th date. We then compare the 3rd and 5th dates, which differ by 547 days, so we keep the 5th date.
In the end, the dates we keep are the 1st, 3rd and 5th. As for id=2, there is only one row, so we keep that. The desired result is shown in dfnew.
df <- read.table(header = TRUE, text = "
id var1 date
1 A 2006-01-01
1 B 2006-02-02
1 C 2006-06-02
1 D 2006-08-02
1 E 2007-12-01
2 F 2007-04-20
",stringsAsFactors=FALSE)
dfnew <- read.table(header = TRUE, text = "
id var1 date
1 A 2006-01-01
1 C 2006-06-02
1 E 2007-12-01
2 F 2007-04-20
",stringsAsFactors=FALSE)
I can only think of starting by grouping df by id, as follows:
library(dplyr)
dfnew <- df %>% group_by(id)
However, I am not sure how to continue from here. Should I proceed with the filter function or with slice? If so, how?
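To make the rule concrete, here is a minimal sketch of the sequential comparison described above. It assumes the dates are sorted within each id and that "at least 91 days apart" means a gap of 91 days or more from the most recently kept date. The helper keep_spaced and the name result are hypothetical names chosen for illustration; they are not part of dplyr.

```r
library(dplyr)

df <- read.table(header = TRUE, text = "
id var1 date
1 A 2006-01-01
1 B 2006-02-02
1 C 2006-06-02
1 D 2006-08-02
1 E 2007-12-01
2 F 2007-04-20
", stringsAsFactors = FALSE)

# Hypothetical helper (not a dplyr function): returns TRUE for every date
# that lies at least `min_gap` days after the most recently kept date.
# Assumes `dates` is already sorted in ascending order.
keep_spaced <- function(dates, min_gap = 91) {
  dates <- as.Date(dates)
  keep <- logical(length(dates))
  ref <- as.Date(NA)  # no reference date yet, so the first row is kept
  for (i in seq_along(dates)) {
    if (is.na(ref) || as.numeric(dates[i] - ref) >= min_gap) {
      keep[i] <- TRUE
      ref <- dates[i]  # the kept date becomes the new reference
    }
  }
  keep
}

result <- df %>%
  arrange(id, date) %>%        # ISO-formatted date strings sort chronologically
  group_by(id) %>%
  filter(keep_spaced(date)) %>%
  ungroup()
```

A loop is used because each comparison depends on which earlier row was kept, so a plain lag() or diff() over consecutive rows would not apply the same rule. The output is stored in result rather than dfnew so that the expected table is not overwritten.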