Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Subset data frame to include only levels of one factor that have values in both levels of another factor

I am working with a data frame of numeric measurements. Some individuals have been measured several times, as juveniles, as adults, or at both stages. A reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

My goal is to subset that data frame to the IDs that appear at least once as a juvenile and at least once as an adult, but I'm not sure how to do that.

The resulting dataframe would contain all measurements for individuals a1, a2 and a3, but would exclude a4, a5 and a6, as they were not measured at both stages.

A similar question was asked 7 months ago but never received an answer (Subset data frame to include only levels one factor that have values in both levels of another factor).

Thanks!

1 Reply

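With dplyr, you can group by ID and then filter. On a grouped data frame, filter() evaluates the condition once per group, so a single TRUE or FALSE keeps or drops all rows of that ID. Here an ID is kept only if its age values include both "juvenile" and "adult":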

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160
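
If you would rather not load dplyr, the same idea can be sketched in base R: collect the IDs seen at each stage, intersect them, and keep the rows whose ID is in that intersection. This is only a minimal sketch on the example data from the question. The names juvenile_ids, adult_ids, both_ids and d_both are made up for illustration, and the sizes come from rnorm(), so the values will differ from the output above.

```r
# Rebuild the example data from the question (sizes are random on each run).
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# IDs measured at each stage; as.character() guards against factor columns.
juvenile_ids <- as.character(d$ID[d$age == "juvenile"])
adult_ids <- as.character(d$ID[d$age == "adult"])

# IDs that appear at least once at both stages (hypothetical helper name).
both_ids <- intersect(juvenile_ids, adult_ids)

# Keep every measurement belonging to one of those IDs.
d_both <- d[d$ID %in% both_ids, ]
d_both
```

On this data both_ids holds a1, a2 and a3, and d_both keeps their 7 rows in the original row order, matching the dplyr result.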
