Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


dplyr - Check whether all elements of a vector are present in each group of an R data frame

I have a data frame in R:

df <- data.frame("location" = c("IND","IND","IND","US","US","US"), type = c("butter","milk","cheese","milk","cheese","yogurt"), quantity = c(2,3,4,5,6,7))

I also have a vector:

typeVector <- c("butter","milk","cheese","yogurt")

I need to check whether all four types in the vector are present in the data frame for each location group. If a type is missing from a group, I need to add a row to the data frame with that type, the corresponding location, and a quantity of 0.

This is my expected output:

dfOutput <- data.frame("location" = c("IND","IND","IND","IND","US","US","US","US"), type = c("butter","milk","cheese","yogurt","butter","milk","cheese","yogurt"), quantity = c(2,3,4,0,0,5,6,7))

How can I achieve this in R using the dplyr package?



1 Reply

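library(dplyr)
distinct(df, location) %>%                          # one row per location
  tidyr::crossing(type = typeVector) %>%            # every location/type pair
  full_join(df, ., by = c("location", "type")) %>%  # pairs not in df get quantity NA
  ungroup() %>%                                     # no-op here; see step 4
  mutate(quantity = coalesce(quantity, 0))          # replace NA with 0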
#   location   type quantity
# 1      IND butter        2
# 2      IND   milk        3
# 3      IND cheese        4
# 4       US   milk        5
# 5       US cheese        6
# 6       US yogurt        7
# 7      IND yogurt        0
# 8       US butter        0
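
For comparison, tidyr also offers complete(), which can fill in the missing location/type pairs in a single call. This is only a sketch, not a replacement for the pipeline above: it assumes tidyr is installed and that strings are kept as character (hence stringsAsFactors = FALSE), and the name completed is purely illustrative. The row order may differ from the output above, and the steps that follow still describe the crossing()/full_join() pipeline.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps type as character
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

# Expand to every combination of the locations found in df and the values in
# typeVector; combinations that were not in df get quantity = 0.
completed <- df %>%
  tidyr::complete(location, type = typeVector, fill = list(quantity = 0))

completed
```

As with coalesce(), the default behaviour of fill here also replaces any NA that was already in quantity.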

Steps:

  1. Create a temporary frame containing every combination of location and the types in typeVector:

    distinct(df, location) %>%
      tidyr::crossing(type = typeVector)
    # # A tibble: 8 x 2
    #   location type  
    #   <chr>    <chr> 
    # 1 IND      butter
    # 2 IND      cheese
    # 3 IND      milk  
    # 4 IND      yogurt
    # 5 US       butter
    # 6 US       cheese
    # 7 US       milk  
    # 8 US       yogurt
    
  2. Join this back onto the original data. The combinations that were not in the original data become new rows with NA for quantity:

    ... %>%
      full_join(df, ., by = c("location", "type"))
    #   location   type quantity
    # 1      IND butter        2
    # 2      IND   milk        3
    # 3      IND cheese        4
    # 4       US   milk        5
    # 5       US cheese        6
    # 6       US yogurt        7
    # 7      IND yogurt       NA
    # 8       US butter       NA
    
  3. Replace the NA values in the new rows with 0 using mutate() and coalesce(). (Note: this also overwrites any NA that was already in quantity. If you have previously-existing NA values and want to keep them, this process needs to be adjusted.)

  4. I tend to call ungroup() at the end of any grouped pipeline. It is not necessary for this task, since nothing here is grouped, but if you forget that a frame is still grouped and do further work on it, you may get different results, or at least it will be slightly less efficient.
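
If quantity can already contain NA values that should stay NA (the caveat in step 3), one possible adjustment is to add only the missing rows instead of coalescing the whole column. The sketch below assumes a hypothetical NA for IND cheese, which is not in the original data; missing_rows and result are illustrative names.

```r
library(dplyr)

# Question data, but with a hypothetical pre-existing NA (IND / cheese)
df <- data.frame(
  location = c("IND", "IND", "IND", "US", "US", "US"),
  type     = c("butter", "milk", "cheese", "milk", "cheese", "yogurt"),
  quantity = c(2, 3, NA, 5, 6, 7),
  stringsAsFactors = FALSE
)
typeVector <- c("butter", "milk", "cheese", "yogurt")

# Build every location/type pair, then keep only the pairs absent from df
missing_rows <- distinct(df, location) %>%
  tidyr::crossing(type = typeVector) %>%
  anti_join(df, by = c("location", "type")) %>%
  mutate(quantity = 0)

# Append the new zero-quantity rows; existing rows, including the NA, are untouched
result <- bind_rows(df, missing_rows)

result
```

Because the original rows are never modified, only the two added rows (IND yogurt and US butter) receive a quantity of 0.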

