I have a vector of numbers in a data.frame such as below.
df <- data.frame(a = c(1,2,3,4,2,3,4,5,8,9,10,1,2,1))
I need to create a new column which gives a running count of entries that are greater than their predecessor. The resulting column vector should be this:
0,1,2,3,0,1,2,3,4,5,6,0,1,0
My attempt is to create a "flag" column of diffs to mark when the values are greater.
df$flag <- c(0,diff(df$a)>0)
> df$flag
[1] 0 1 1 1 0 1 1 1 1 1 1 0 1 0
Then I can apply some dplyr group/sum magic to almost get the right answer, except that the sum doesn't reset when flag == 0:
df %>% group_by(flag) %>% mutate(run=cumsum(flag))
a flag run
1 1 0 0
2 2 1 1
3 3 1 2
4 4 1 3
5 2 0 0
6 3 1 4
7 4 1 5
8 5 1 6
9 8 1 7
10 9 1 8
11 10 1 9
12 1 0 0
13 2 1 10
14 1 0 0
I don't want to have to resort to a for() loop because I have several of these running sums to compute with several hundred thousand rows in a data.frame.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…