r - dplyr - filter by group size

Question



posted Oct 17, 2021 in Technique by 深蓝


What is the best way to filter a data.frame so that only the groups of a given size (say, 5 rows) remain?

My data looks as follows:

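library(dplyr)
n <- 1e5
x <- rnorm(n)
# Each category has between 1 and 5 rows. Drawing one size for each of n
# candidate categories guarantees at least n values, so truncating to n
# never pads the vector with NA.
cat <- rep(seq_len(n), sample(1:5, n, replace = TRUE))[seq_len(n)]

dat <- data.frame(x = x, cat = cat)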
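To sanity-check the simulated data, here is a small base R sketch that tabulates how many categories have each size. The seed is an arbitrary assumption for reproducibility; the last category may be cut short by the final subsetting, but every size stays between 1 and 5.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
n <- 1e5
x <- rnorm(n)
cat <- rep(seq_len(n), sample(1:5, n, replace = TRUE))[seq_len(n)]
dat <- data.frame(x = x, cat = cat)

# Inner table(): rows per category; outer table(): categories per size
size_dist <- table(table(dat$cat))
print(size_dist)
```

Multiplying each size by its category count and summing should give back n.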

The dplyr approach I came up with was:

dat <- group_by(dat, cat)

system.time({
  out1 <- dat %>% filter(n() == 5L)
})
#    user  system elapsed 
#   1.157   0.218   1.497

But this is very slow. Is there a faster way in dplyr?

So far, my workaround looks as follows:
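One dplyr-only alternative, sketched here and not benchmarked, is to count the rows per category once with count(), keep the categories of size 5, and then keep the matching rows with semi_join(). The variable n_rows is a hypothetical rename of n, chosen so it cannot be confused with the `n` column that count() creates.

```r
library(dplyr)

set.seed(1)  # arbitrary seed
n_rows <- 1e5
dat <- data.frame(
  x = rnorm(n_rows),
  cat = rep(seq_len(n_rows), sample(1:5, n_rows, replace = TRUE))[seq_len(n_rows)]
)

# One row per category with its row count in column `n`,
# restricted to the categories that have exactly 5 rows
size5 <- dat %>%
  count(cat) %>%
  filter(n == 5L)

# Keep only the rows of dat whose category appears in size5
out_semi <- semi_join(dat, size5, by = "cat")
```

The result is an ungrouped data frame containing the same rows as the filter(n() == 5L) version.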

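system.time({
  # Group id for every row; this assumes rows are sorted by cat, as they are here
  all_ind <- rep(seq_len(n_groups(dat)), group_size(dat))
  # Ids of the groups that have exactly 5 rows
  take_only <- which(group_size(dat) == 5L)
  out2 <- dat[all_ind %in% take_only, ]
})
#    user  system elapsed 
#   0.026   0.008   0.036
all.equal(out1, out2) # TRUE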

But this doesn't feel very dplyr-like.
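The workaround builds each row's group id with rep(), which is only correct when the rows are sorted by cat (true for the simulated data). A minimal sketch of an order-independent variant, using dplyr's group_indices() and group_size() grouping helpers, is below; the row shuffle is a hypothetical step added only to show that it still works on unsorted data.

```r
library(dplyr)

set.seed(1)  # arbitrary seed
n_rows <- 1e5
dat <- data.frame(
  x = rnorm(n_rows),
  cat = rep(seq_len(n_rows), sample(1:5, n_rows, replace = TRUE))[seq_len(n_rows)]
)
dat <- dat[sample(n_rows), ]  # shuffle so groups are no longer contiguous
dat <- group_by(dat, cat)

# group_indices(): the group id of every row, whatever the row order
# group_size():    the number of rows in each group, indexed by group id
keep <- group_size(dat)[group_indices(dat)] == 5L
out3 <- dat[keep, ]
```

Indexing the size vector by each row's group id gives a per-row size, so no assumption about row order is needed.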


深蓝 · Answer 1 · 2021-10-17T00:08:33+0000


You can write the same filter more concisely as a single pipeline with n():

library(dplyr)
dat %>% group_by(cat) %>% filter(n() == 5)
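The pipeline above evaluates n() within each group and returns a grouped result. An equivalent formulation, sketched here without any timing claims, uses add_count() to attach each category's row count as a column named `n`, filters on that column, and drops it again; since the input is not grouped, the output is not grouped either.

```r
library(dplyr)

set.seed(1)  # arbitrary seed
n_rows <- 1e5
dat <- data.frame(
  x = rnorm(n_rows),
  cat = rep(seq_len(n_rows), sample(1:5, n_rows, replace = TRUE))[seq_len(n_rows)]
)

out4 <- dat %>%
  add_count(cat) %>%   # new column `n`: number of rows in each row's category
  filter(n == 5L) %>%  # keep categories with exactly 5 rows
  select(-n)           # drop the helper column
```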
