Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
356 views
in Technique[技术] by (71.8m points)

r - Fill missing combinations in a dataframe

My example dataset:

df <- data.frame(
 REGION = c("REGION A", "REGION A", "REGION B"), 
 CATEGORY = c("A", "B", "B"), 
 VALUE1 = c(2,3,4),
 VALUE2 = c(1,2,3)
)

Result:

  REGION    CATEGORY VALUE1 VALUE2
1 REGION A   A             2     1
2 REGION A   B             3     2
3 REGION B   B             4     3

Now I want that every combination of REGION and CATEGORY that is not considered in the data set is filled with a VALUE1 and VALUE2 of 0. The result of this df should be:

      REGION   CATEGORY VALUE1 VALUE2
    1 REGION A  A          2      1
    2 REGION A  B          3      2
    3 REGION B  A          4      3
    4 REGION B  B          0      0

I already wrote a big function for it, that generates a dynamic string with for-loops, but I have the feeling that there is a much simpler way to do it with only a few lines of code. I guess I am thinking much too complicated. Any ideas? Thank you in advance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Using complete from tidyr:

library(tidyr)
as.data.frame(complete(df,REGION,CATEGORY,fill=list(VALUE1=0,VALUE2=0)))

Output:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        A      0      0
4 REGION B        B      4      3

If there are many variables, you could also just do as.data.frame(complete(df,REGION,CATEGORY)) and replace the NA's afterwards.

Hope this helps!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.9k users

...