Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
161 views
in Technique[技术] by (71.8m points)

r - Finding running maximum by group

I need to find a running maximum of a variable by group using R. The variable is sorted by time within group using df[order(df$group, df$time),].

My variable has some NA's but I can deal with it by replacing them with zeros for this computation.

this is how the data frame df looks like:

(df <- structure(list(var = c(5L, 2L, 3L, 4L, 0L, 3L, 6L, 4L, 8L, 4L),
               group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
                                 .Label = c("a", "b"), class = "factor"),
               time = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)),
          .Names = c("var", "group","time"),
          class = "data.frame", row.names = c(NA, -10L)))

#    var group time
# 1    5     a    1
# 2    2     a    2
# 3    3     a    3
# 4    4     a    4
# 5    0     a    5
# 6    3     b    1
# 7    6     b    2
# 8    4     b    3
# 9    8     b    4
# 10   4     b    5

And I want a variable curMax as:

var  |  group  |  time  |  curMax
5       a         1         5
2       a         2         5
3       a         3         5
4       a         4         5
0       a         5         5
3       b         1         3
6       b         2         6
4       b         3         6
8       b         4         8
4       b         5         8

Please let me know if you have any idea how to implement it in R.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We can try data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'group' , we get the cummax of 'var' and assign (:=) it to a new variable ('curMax')

library(data.table)
setDT(df1)[, curMax := cummax(var), by = group]

As commented by @Michael Chirico, if the data is not ordered by 'time', we can do that in the 'i'

setDT(df1)[order(time), curMax:=cummax(var), by = group]

Or with dplyr

library(dplyr)
df1 %>% 
    group_by(group) %>%
    mutate(curMax = cummax(var)) 

If df1 is tbl_sql explicit ordering might be required, using arrange

df1 %>% 
    group_by(group) %>%
    arrange(time, .by_group=TRUE) %>%
    mutate(curMax = cummax(var)) 

or dbplyr::window_order

library(dbplyr)

df1 %>% 
    group_by(group) %>%
    window_order(time) %>%
    mutate(curMax = cummax(var)) 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...