Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Ifelse to Compare the content of columns with dplyr

I have a df where I want to add a column depending on the contents of other columns. I have it working for one sample with:

df <- df %>% 
  mutate(SB013 = if_else(sample_id == "SB013", IN, OUT))

This adds a column called SB013: where sample_id == "SB013" the new column takes the value of column IN, otherwise the value of column OUT. I now need to extend this so that one such column is added for each sample_id. So:

   sample_names<- unique(df$sample_id)
   sample_names

[1] "SB013" "SB014" "SB015" "SB016"

I have tried a couple of for loops:

for (name in sample_names) {
   comb.variant <- comb.variant %>% 
   mutate(name = if_else(sample_id == name, Alt,Ref))
  
}

This adds only a single column, literally called "name". And:

for (i in 1:length(sample_names)) {
  #print(sample_names[i])
  comb.variant <- comb.variant %>%
    mutate(sample_names[i] = if_else(sample_id == sample_names[i], Alt,Ref))
}
Error: unexpected '=' in:
"  comb.variant <- comb.variant %>%
    mutate(sample_names[i] ="

How do I make this work?


1 Reply


One way to do this is with glue-style column names, which dplyr supports on the left-hand side of := (this relies on the glue package). It lets you build column names from strings:

library(tidyverse)
library(glue)

> df <- data.frame(
    sample_id = c('SB013', 'SB014', 'SB013', 'SB014', 'SB015', 'SB016', 'SB016'), 
    IN = c(1,2,3,4,5,6,7), 
    OUT = c(rep('out',7)))

> df
  sample_id IN OUT
1     SB013  1 out
2     SB014  2 out
3     SB013  3 out
4     SB014  4 out
5     SB015  5 out
6     SB016  6 out
7     SB016  7 out

> df %>% mutate(`SB013` = ifelse(sample_id == 'SB013', IN, OUT))

  sample_id IN OUT SB013
1     SB013  1 out     1
2     SB014  2 out   out
3     SB013  3 out     3
4     SB014  4 out   out
5     SB015  5 out   out
6     SB016  6 out   out
7     SB016  7 out   out
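
The call above uses base ifelse. dplyr's stricter if_else would reject a numeric IN paired with a character OUT, so a variant using it needs an explicit conversion. A minimal sketch, assuming it is acceptable for the new column to hold IN as text (the name df_strict is made up for this example):

```r
library(dplyr)

# Same toy data as above.
df <- data.frame(
  sample_id = c("SB013", "SB014", "SB013", "SB014", "SB015", "SB016", "SB016"),
  IN = c(1, 2, 3, 4, 5, 6, 7),
  OUT = rep("out", 7)
)

# as.character(IN) gives both branches the same type (character),
# which is what if_else requires.
df_strict <- df %>%
  mutate(SB013 = if_else(sample_id == "SB013", as.character(IN), OUT))
```

The resulting column is character, just as with ifelse, but the conversion is now explicit rather than a silent coercion.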

Note that dplyr's if_else cannot be applied to IN and OUT directly, because they have different types (numeric and character); base ifelse coerces the result to character instead. For a large dataset I would suggest iterating with a function (map or lapply) rather than a loop, but with only 4 unique sample_id values a for loop is fine:

> list_of_ids <- 
    df %>% 
    distinct(sample_id) %>%
    unlist() %>%
    unname()

> list_of_ids
[1] "SB013" "SB014" "SB015" "SB016"


> for (current_id in list_of_ids) {
    df <- 
      df %>% 
      mutate('{current_id}' := ifelse(sample_id == current_id, IN, OUT))
  }

> df
  sample_id IN OUT SB013 SB014 SB015 SB016
1     SB013  1 out     1   out   out   out
2     SB014  2 out   out     2   out   out
3     SB013  3 out     3   out   out   out
4     SB014  4 out   out     4   out   out
5     SB015  5 out   out   out     5   out
6     SB016  6 out   out   out   out     6
7     SB016  7 out   out   out   out     7
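
The loop-free alternative mentioned above (lapply) can be sketched as follows. This is a base R illustration on the same toy data, not a benchmarked recommendation; the names ids, new_cols and result are made up for the example.

```r
# Same toy data as above.
df <- data.frame(
  sample_id = c("SB013", "SB014", "SB013", "SB014", "SB015", "SB016", "SB016"),
  IN = c(1, 2, 3, 4, 5, 6, 7),
  OUT = rep("out", 7)
)

ids <- unique(df$sample_id)

# Build one vector per id: IN where the row belongs to that id, OUT otherwise.
new_cols <- lapply(ids, function(id) ifelse(df$sample_id == id, df$IN, df$OUT))

# Name the list elements after the ids so they become the column names.
names(new_cols) <- ids

# Attach the new columns to the original data frame in one step.
result <- cbind(df, as.data.frame(new_cols))
```

Because the column names come from names(new_cols), no := or name interpolation is needed here. Inside mutate, a plain `name = ...` always creates a column literally called name, which is why the loop version uses := with a quoted "{...}" string.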

In future, please add sample data and expected output to your question.

