Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Create new column based on 2 reference string columns

Problem

I have two data frames: a reference data frame, ref_df, and a test data frame, test_df. The reference data frame has two string columns, reference_A and reference_B. I would like to add a new column to test_df that is "Pass" when both test_A and test_B match reference_A and reference_B, and "Fail" otherwise.


Example Data

reference dataframe
ref_df <- data.frame(
  reference_A = c("ABC","HIJ","NOP","TUV"),
  reference_B = c("DEF","KLM","QRS","WXY")
)

ref_df

  reference_A reference_B
1         ABC         DEF
2         HIJ         KLM
3         NOP         QRS
4         TUV         WXY

test_df dataframe
test_df <- data.frame(
  sample = c(1,2,3,4,5,6),
  test_A = c("ABC","HII","NOP","TUV","TUS","KJF"),
  test_B = c("DEF","KLM","QRR","WXY","WXZ", "KLM")
)

test_df

  sample test_A test_B
1      1    ABC    DEF
2      2    HII    KLM
3      3    NOP    QRR
4      4    TUV    WXY
5      5    TUS    WXZ
6      6    KJF    KLM

Desired Solution

test_qc

  sample test_A test_B status
1      1    ABC    DEF   Pass
2      2    HII    KLM   Fail
3      3    NOP    QRR   Fail
4      4    TUV    WXY   Pass
5      5    TUS    WXZ   Fail
6      6    KJF    KLM   Fail

Failed Attempt

test_qc <- test_df %>% 
  select(test_A, test_B) %>% 
  mutate(status = 
           ifelse(test_A == ref_df$reference_A & test_B == ref_df$reference_B, 
                  "Pass", "Fail"))

Warning messages:
1: Problem with `mutate()` input `status`.
x longer object length is not a multiple of shorter object length
i Input `status` is `ifelse(...)`.
2: In test_A == reference$reference_A :
  longer object length is not a multiple of shorter object length
3: Problem with `mutate()` input `status`.
x longer object length is not a multiple of shorter object length
i Input `status` is `ifelse(...)`.
4: In test_B == reference$reference_B :
  longer object length is not a multiple of shorter object length

Question from: https://stackoverflow.com/questions/65947124/create-new-column-based-on-2-reference-string-columns
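
A note on the warnings in the failed attempt: `==` compares two vectors position by position and recycles the shorter one. test_df has 6 rows and ref_df has 4, so each test value is compared only with the reference value that happens to line up with it, not with every reference row. The sketch below reproduces this with standalone copies of the test_A and reference_A columns; the name `warned` is only an illustrative helper.

```r
test_A      <- c("ABC", "HII", "NOP", "TUV", "TUS", "KJF")  # 6 values
reference_A <- c("ABC", "HIJ", "NOP", "TUV")                # 4 values

# `==` lines the vectors up by position and recycles the shorter one,
# so test_A[5] is compared with reference_A[1] and test_A[6] with reference_A[2].
# Because 6 is not a multiple of 4, R also signals a warning.
warned <- tryCatch(
  { test_A == reference_A; FALSE },
  warning = function(w) TRUE
)
warned

# The recycled vector that `==` effectively compares test_A against:
recycled <- rep(reference_A, length.out = length(test_A))
recycled
```

A row-wise lookup (a join or a membership test) avoids this, because it does not depend on the two data frames having the same length or row order.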


1 Reply


You can try this:

library(dplyr)

# Flag every reference row so matched rows can be told apart after the join
ref_df$temp <- 1

test_df %>%
  left_join(ref_df, by = c("test_A" = "reference_A", "test_B" = "reference_B")) %>%
  mutate(status = if_else(is.na(temp), "Fail", "Pass")) %>%
  select(-temp)

  sample test_A test_B status
1      1    ABC    DEF   Pass
2      2    HII    KLM   Fail
3      3    NOP    QRR   Fail
4      4    TUV    WXY   Pass
5      5    TUS    WXZ   Fail
6      6    KJF    KLM   Fail
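
If you would rather not add a helper column to ref_df, a possible alternative in base R is to build one key per row and test membership with `%in%`. This is only a sketch under assumptions: the names `ref_key` and `test_key` are made up for illustration, and the "|" separator is assumed never to occur in the data.

```r
ref_df <- data.frame(
  reference_A = c("ABC", "HIJ", "NOP", "TUV"),
  reference_B = c("DEF", "KLM", "QRS", "WXY")
)
test_df <- data.frame(
  sample = c(1, 2, 3, 4, 5, 6),
  test_A = c("ABC", "HII", "NOP", "TUV", "TUS", "KJF"),
  test_B = c("DEF", "KLM", "QRR", "WXY", "WXZ", "KLM")
)

# One key per row; "|" is an assumed separator that must not appear in the values
ref_key  <- paste(ref_df$reference_A, ref_df$reference_B, sep = "|")
test_key <- paste(test_df$test_A, test_df$test_B, sep = "|")

# "Pass" when the (test_A, test_B) pair occurs as a row of ref_df, else "Fail"
test_qc <- test_df
test_qc$status <- ifelse(test_key %in% ref_key, "Pass", "Fail")
test_qc
```

Unlike the join, this never changes the number of rows in test_df, which may matter if ref_df can contain duplicate pairs.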

