Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How do I use tidyr to fill in completed rows within each value of a grouping variable?

Say I have data on people who choose between several options. I have one row per person, and I want to have one row per person and choice option. So, if I have 10 people who have 3 choices, right now I have 10 rows, and I want to have 30.

All of the other variables should be copied to each of the new rows. So, for example, if I have a variable for gender, that should be constant within ID. (I am setting my data up this way to analyze with mnlogit.)

This seems like the situation that two tidyr functions, complete and fill, were designed for. To use a simple example:

library(lubridate)
library(tidyr)
dat <- data.frame(
    id = 1:3,
    choice = 5:7,
    c = c(9, NA, 11),
    d = ymd(NA, "2015-09-30", "2015-09-29")
    )

dat %>% 
  complete(id, choice) %>%
  fill(everything())

# Source: local data frame [9 x 4]
# 
#      id choice     c          d
#   (int)  (int) (dbl)     (time)
# 1     1      5     9       <NA>
# 2     1      6     9       <NA>
# 3     1      7     9       <NA>
# 4     2      5     9       <NA>
# 5     2      6     9 2015-09-30
# 6     2      7     9 2015-09-30
# 7     3      5     9 2015-09-30
# 8     3      6     9 2015-09-30
# 9     3      7    11 2015-09-29

But this has some problems: fill() carries values forward across ID boundaries. The value of c from ID 1 replaced the (correct) NA values for ID 2, and in d the first row for ID 2 is still NA while two rows for ID 3 picked up the date from ID 2.
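To see where this comes from, it may help to look at the result of complete() on its own, before fill() runs. The sketch below rebuilds the same toy data (using base as.Date() in place of lubridate::ymd(), which is an assumption made only to keep the example dependency-free) and counts how many observed values are left after completion.

```r
library(tidyr)

# Same toy data as above; as.Date() stands in for lubridate::ymd()
dat <- data.frame(
  id = 1:3,
  choice = 5:7,
  c = c(9, NA, 11),
  d = as.Date(c(NA, "2015-09-30", "2015-09-29"))
)

# complete() adds every missing id/choice pair and sets the
# remaining columns (c and d) to NA in those new rows
completed <- dat %>%
  complete(id, choice)

completed

# Count the observed (non-NA) values that survive in each column
n_obs_c <- sum(!is.na(completed$c))
n_obs_d <- sum(!is.na(completed$d))
```

After this step the genuinely missing c for ID 2 looks the same as the placeholder NAs in the new rows, and an ungrouped fill() has no notion of id, so it fills all of them from the nearest non-missing row above.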

I could try a workaround, like replacing all of the missing values with 999, running complete and fill, and then replacing 999 with NA. (I think I would have to convert the date variables to character variables and then convert them back again if I go this route.) But maybe someone on here knows of a tidy way to do this with tidyr?

Edit: the desired output here is:

# Source: local data frame [9 x 4]
# 
#     id     c          d choice
#  (int) (dbl)     (time)  (int)
# 1     1     9       <NA>      5
# 2     1     9       <NA>      6
# 3     1     9       <NA>      7
# 4     2    NA 2015-09-30      5
# 5     2    NA 2015-09-30      6
# 6     2    NA 2015-09-30      7
# 7     3    11 2015-09-29      5
# 8     3    11 2015-09-29      6
# 9     3    11 2015-09-29      7

1 Reply


As an update to @jeremycg's answer: from tidyr 0.5.1 (or maybe even version 0.4.0) onwards, c() no longer works for this. Use nesting() instead:

dat %>%
  complete(nesting(id, c, d), choice)
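As a rough check against the desired output in the question, here is a self-contained version of the same call. It assumes the toy data from the question, with base as.Date() swapped in for lubridate::ymd() purely to avoid the extra dependency. nesting(id, c, d) restricts the expansion to the combinations of id, c and d that actually occur in dat, and complete() then pairs each of them with every value of choice, so no fill() step is needed.

```r
library(tidyr)

# Toy data from the question; as.Date() stands in for lubridate::ymd()
dat <- data.frame(
  id = 1:3,
  choice = 5:7,
  c = c(9, NA, 11),
  d = as.Date(c(NA, "2015-09-30", "2015-09-29"))
)

# Only the observed id/c/d triples are kept, each crossed with
# all three values of choice: 3 triples x 3 choices = 9 rows
result <- dat %>%
  complete(nesting(id, c, d), choice)

result
```

Because c and d travel together with id inside nesting(), the NA for c in ID 2 and the NA for d in ID 1 are treated as part of the observed data and are repeated as-is rather than overwritten.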

Note: I tried to edit @jeremycg's answer, since it was correct at the time it was written (so a new answer is not really necessary), but unfortunately the edit was rejected.

