r - Add missing rows within combinations of factors


Posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I have a data frame that's maybe best approximated as:

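library(data.table)
z <- rep("z", 5)
y <- c(rep("st", 2), rep("co", 2), "fu")
var1 <- c(rep("a", 2), rep("b", 2), "c")
var2 <- c("y", "y", "y", "z", "x")
transp <- c("bus", "plane", "train", "bus", "bus")
sample1 <- sample(1:10, 5)  # random draws, so the values differ between runs
sample2 <- sample(1:10, 5)
# Note: cbind() builds a character matrix, so sample1 and sample2 end up as character columns
df <- cbind(z, y, var1, var2, transp, sample1, sample2)
df <- as.data.table(df)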
> df
   z  y var1 var2 transp sample1 sample2
1: z st    a    y    bus       4       3
2: z st    a    y  plane      10       7
3: z co    b    y  train       8       9
4: z co    b    z    bus       1       5
5: z fu    c    x    bus       6       4

All unique combinations of var1 and var2 already exist in the table. I want to expand the table so that all combinations of var1/var2 include all transp options found in a list:

transtype <- c("bus","train")

Notice "plane" is an option in df but not in transtype. I would like to keep the row that includes transp="plane" but not expand by adding rows with "plane". The columns z and y need to be filled in with the appropriate value and sample1 and sample2 should be NA. Result should be:

> result
   z  y var1 var2 transp sample1 sample2
1: z st    a    y    bus       4       3
2: z st    a    y  plane      10       7
3: z st    a    y  train      NA      NA
4: z co    b    y  train       8       9
5: z co    b    y    bus      NA      NA
6: z co    b    z    bus       1       5
7: z co    b    z  train      NA      NA
8: z fu    c    x    bus       6       4
9: z fu    c    x  train      NA      NA

The data.table options I've come up with, based on "Fastest way to add rows for missing values in a data.frame?" and "Data.table: Add rows for missing combinations of 2 factors without losing associated descriptive factors", end up expanding all unique combinations of var1 and var2, not just the combinations that already exist in the table. I also don't know how to keep the values of z and y. For example:

setkey(df, var1, var2, transp)
x <- df[CJ(var1, var2, transp, unique = TRUE)]

Maybe I should be using dplyr? Or maybe I'm missing something simple? I went through the data.table documentation and can't come up with a solution.


1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:27+0000

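Here is a solution using dplyr and tidyr, in particular tidyr::complete and tidyr::nesting. nesting() restricts the completion to the combinations of z, y, var1 and var2 that already occur in the dataset, whereas complete() applied to the bare columns would generate every possible combination of their values. The "plane" row is filtered out before completing, so it is not expanded, and then added back with union().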

library(dplyr)
library(tidyr)
df %>% 
  filter(transp %in% transtype)  %>%
  complete(nesting(z, y, var1, var2), transp) %>%
  union(df)
# A tibble: 9 × 7
      z     y  var1  var2 transp sample1 sample2
  <chr> <chr> <chr> <chr>  <chr>   <chr>   <chr>
1     z    st     a     y  plane      10      10
2     z    st     a     y  train    <NA>    <NA>
3     z    st     a     y    bus       1       9
4     z    fu     c     x  train    <NA>    <NA>
5     z    fu     c     x    bus       5       3
6     z    co     b     z  train    <NA>    <NA>
7     z    co     b     z    bus       6       6
8     z    co     b     y  train       3       2
9     z    co     b     y    bus    <NA>    <NA>
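
Since the question asked about data.table, the same idea can probably be expressed there as well. The following is only a minimal sketch and is not part of the answer above. It assumes fixed sample values (taken from the question's printed table) instead of random ones, builds df with data.table() rather than cbind() so the sample columns stay integer, and uses the made-up name `wanted` for the helper table. It repeats each existing z/y/var1/var2 combination once per entry of transtype, then full-joins that onto df so the "plane" row is kept but never expanded.

```r
library(data.table)

# Toy data mirroring the question; sample values are fixed here (an assumption)
df <- data.table(
  z = "z",
  y = c("st", "st", "co", "co", "fu"),
  var1 = c("a", "a", "b", "b", "c"),
  var2 = c("y", "y", "y", "z", "x"),
  transp = c("bus", "plane", "train", "bus", "bus"),
  sample1 = c(4L, 10L, 8L, 1L, 6L),
  sample2 = c(3L, 7L, 9L, 5L, 4L)
)
transtype <- c("bus", "train")

# For each existing (z, y, var1, var2) group, emit one row per transtype entry.
# Only combinations already present in df are used, so nothing is cross-joined.
wanted <- df[, .(transp = transtype), by = .(z, y, var1, var2)]

# Full outer join: rows only in `wanted` get NA for sample1/sample2,
# rows only in df (the "plane" row) are kept unchanged.
result <- merge(df, wanted,
                by = c("z", "y", "var1", "var2", "transp"),
                all = TRUE)

print(result)  # 9 rows: the 5 original rows plus 4 new bus/train rows
```

Note that merge() sorts the result by the join columns, so the row order differs from the order shown in the question.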
