Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Using ifelse() to replace NAs in one data frame by referencing another data frame of different length

I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:

1) Conditional replacement of values in a data.frame
2) Creating a function to replace NAs from one data.frame with values from another

With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.

I've modified the data, below, for simplicity and illustration, although the concept is the same in the actual data. FYI, in the real second data frame, there are also no duplicates in column "A".

Here's the first data frame (df1):

> df1
    B          C  A
1  NA 2012-10-01  0
2  NA 2012-10-01  5
3   4 2012-10-01 10
4  NA 2012-10-01 15
5  NA 2012-10-01 20
6  20 2012-10-01 25
7  NA 2012-10-01  0
8  NA 2012-10-01  5
9   5 2012-10-01 10
10  5 2012-10-01 15

> str(df1)
'data.frame':   10 obs. of  3 variables:
 $ B: num  NA NA 4 NA NA 20 NA NA 5 5
 $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
 $ A: num  0 5 10 15 20 25 0 5 10 15

And here's the second data frame (df2):

> df2
   A         B
1  0 1.7169811
2  5 0.3396226
3 10 0.1320755
4 15 0.1509434
5 20 0.0754717
6 25 2.0943396

> str(df2)
'data.frame':   6 obs. of  2 variables:
 $ A: int  0 5 10 15 20 25
 $ B: num  1.717 0.3396 0.1321 0.1509 0.0755 ...
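
For anyone who wants to reproduce the problem, the two data frames can be rebuilt roughly as follows. This is a sketch that copies the values from the printed output above, so the numbers in df2$B are the 7-digit rounded values shown by print(), not necessarily the full-precision originals.

```r
# Rebuild the sample data from the printed output above.
# Values in df2$B are the rounded values as displayed, an assumption
# about the underlying data rather than the exact originals.
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)

df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

str(df1)
str(df2)
```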

I think I'm pretty close with the following code:

> ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
 [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
 [9]  5.0000000  5.0000000
Warning message:
In df2$A == df1$A :
  longer object length is not a multiple of shorter object length

Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226, rather than NAs . . .

Thanks, in advance, for any help, and, once again, thanks for your patience!


1 Reply


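Your statement is close. The problem is the index df2$A == df1$A: it compares the two columns position by position, recycling the shorter one (hence the warning), and produces a logical vector of length 10. That vector is then used to subset the 6-element df2$B, so positions 7 to 10 come back as NA. Use match() instead, which returns, for each value of df1$A, the row of df2 that holds the same value of A:

> df1$B <- ifelse(is.na(df1$B), df2$B[match(df1$A, df2$A)], df1$B)
#   Replaced 'df2$A == df1$A' with 'match(df1$A, df2$A)' ---^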
> df1
            B          C  A
1   1.7169811 2012-10-01  0
2   0.3396226 2012-10-01  5
3   4.0000000 2012-10-01 10
4   0.1509434 2012-10-01 15
5   0.0754717 2012-10-01 20
6  20.0000000 2012-10-01 25
7   1.7169811 2012-10-01  0
8   0.3396226 2012-10-01  5
9   5.0000000 2012-10-01 10
10  5.0000000 2012-10-01 15
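
One thing worth checking is that the replacement does not depend on the rows of df1 happening to be in the same order as df2. The sketch below wraps a match()-based lookup in a small helper, fill_na_by_key (a hypothetical name, not from the original post), and runs it on the sample data both as given and with the rows reversed. For comparison it also computes a purely positional index, df2$B[df2$A %in% df1$A], which returns all six values of df2$B and lets ifelse() recycle them; that agrees with the lookup on the original ordering only because df1$A repeats in the same order as df2$A.

```r
# Sample data copied from the question (df2$B uses the rounded printed values).
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# Hypothetical helper: fill NAs in `target` by looking up `key` in `lookup_key`.
# match() returns one index per element of `key`, so the result always has
# the same length as `target`; keys with no match stay NA.
fill_na_by_key <- function(target, key, lookup_key, lookup_value) {
  idx <- match(key, lookup_key)
  ifelse(is.na(target), lookup_value[idx], target)
}

filled <- fill_na_by_key(df1$B, df1$A, df2$A, df2$B)
print(filled)

# Same data with the rows reversed: the lookup still follows column A.
rev_df1 <- df1[nrow(df1):1, ]
rev_filled <- fill_na_by_key(rev_df1$B, rev_df1$A, df2$A, df2$B)

# Positional indexing for comparison: every element of df2$A is in df1$A,
# so this selects all of df2$B, which ifelse() then recycles by position.
rev_positional <- ifelse(is.na(rev_df1$B),
                         df2$B[df2$A %in% rev_df1$A],
                         rev_df1$B)

# The key-based result is just the original result in reverse order;
# the positional result is not.
print(isTRUE(all.equal(rev_filled, rev(filled))))
print(isTRUE(all.equal(rev_positional, rev(filled))))
```

If some values of A in df1 had no counterpart in df2, match() would return NA for those rows and the corresponding entries of B would simply remain NA.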
