r - match values in 2 columns with the corresponding position in another character column

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

An example dataframe:

example_df = data.frame(Gene.names = c("A", "B"),
                         Score = c("3.69,2.97,2.57,3.09,2.94",
                                   "3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83"),
                         ResidueAA = c("S", "Y"),
                         ResidueNo = c(3, 3),
                         Sequence = c("MSSYT", "MSSYTRAP") )

I want to check whether the character in the ResidueAA column matches the character found in the ‘Sequence’ column at the position given by the ResidueNo column. The output should be another column, say ‘Check’, containing Yes or No.

This is working code:

example_df$Check = sapply(1:nrow(example_df), FUN = function(i) {
  d = example_df[i, ]
  substr(d$Sequence, d$ResidueNo, d$ResidueNo) == d$ResidueAA
})

Is there an easier or more elegant way to do this? Ideally, I want something that works within a dplyr pipe. Related to this, how can I extract the corresponding value from the 'Score' column into a new column, say 'Score_1'?

Thanks

question from:https://stackoverflow.com/questions/65908899/match-values-in-2-columns-with-the-corresponding-position-in-another-character-c

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:12:31+0000

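We can use substr directly, because it is vectorized over the string and the start/stop positions, so no row-by-row loop is needed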

library(dplyr)
example_df  %>%
   mutate(Check = substr(Sequence, ResidueNo, ResidueNo) == ResidueAA)

-output

#  Gene.names                                   Score ResidueAA ResidueNo Sequence Check
#1          A                3.69,2.97,2.57,3.09,2.94         S         3    MSSYT  TRUE
#2          B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83         Y         3 MSSYTRAP FALSE
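The question asked for a Yes/No column rather than TRUE/FALSE. A minimal sketch of one way to get that in base R, assuming the same sample data as above (the `Check_label` column name is my own choice, not something from the original post):

```r
# Sample data copied from the question
example_df <- data.frame(
  Gene.names = c("A", "B"),
  Score = c("3.69,2.97,2.57,3.09,2.94",
            "3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83"),
  ResidueAA = c("S", "Y"),
  ResidueNo = c(3, 3),
  Sequence = c("MSSYT", "MSSYTRAP"),
  stringsAsFactors = FALSE
)

# substr() takes one start/stop per row, so the comparison is done
# for all rows at once without sapply()
example_df$Check <- with(
  example_df,
  substr(Sequence, ResidueNo, ResidueNo) == ResidueAA
)

# Convert the logical result to the Yes/No labels (hypothetical column name)
example_df$Check_label <- ifelse(example_df$Check, "Yes", "No")

print(example_df[, c("Gene.names", "Check", "Check_label")])
```

The same `ifelse()` call should also work inside `mutate()` if you prefer to stay in the pipe.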

To create a new column with the matching 'Score', use match instead of ==. Where == does an elementwise comparison, match returns the index of the first matching value, and that index can then be used to extract the 'Score' element

example_df  %>%
    mutate(Score2 = Score[match(ResidueAA,
         substr(Sequence, ResidueNo, ResidueNo))])

-output

#  Gene.names                                   Score ResidueAA ResidueNo Sequence
#1          A                3.69,2.97,2.57,3.09,2.94         S         3    MSSYT
#2          B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83         Y         3 MSSYTRAP
#                    Score2
#1 3.69,2.97,2.57,3.09,2.94
#2                     <NA>
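To make the difference between == and match concrete, here is a small sketch using the two values involved in this example (the variable names `residue` and `third_char` are just for illustration):

```r
# ResidueAA values, and the third character of each Sequence
residue    <- c("S", "Y")
third_char <- substr(c("MSSYT", "MSSYTRAP"), 3, 3)  # "S" "S"

# == compares position by position and returns a logical vector
elementwise <- residue == third_char

# match() returns, for each value of residue, the index of its first
# occurrence in third_char (NA when there is none)
idx <- match(residue, third_char)

print(elementwise)
print(idx)
```

Note that the index from match refers to a row of the whole column, which is why Score2 above holds the complete 'Score' string of row 1 rather than a single value picked out of it.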

Update

Based on the comments, we need to extract the element of 'Score' at position 'ResidueNo' whenever the substring of 'Sequence' at that position is the same as 'ResidueAA'. This can be done by splitting 'Score' with strsplit into a list, extracting the first list element ([[1]] - after a rowwise operation) and then using 'ResidueNo' to pick the split value at that position

example_df  %>%
  rowwise() %>% 
  mutate(Score2 = if(substr(Sequence, ResidueNo, ResidueNo) == 
    ResidueAA) strsplit(Score, ",")[[1]][ResidueNo] else NA_character_) %>%
  ungroup()

-output

# A tibble: 2 x 6
#  Gene.names Score                                   ResidueAA ResidueNo Sequence Score2
#  <chr>      <chr>                                   <chr>         <dbl> <chr>    <chr> 
#1 A          3.69,2.97,2.57,3.09,2.94                S                 3 MSSYT    2.57  
#2 B          3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83 Y                 3 MSSYTRAP <NA>
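If rowwise turns out to be slow on a large table, the same idea can probably be written without it. This is only a sketch in base R under the assumption that every 'Score' string is comma-separated as in the sample data; `scores`, `picked` and `is_match` are names I made up for the example:

```r
# Sample data copied from the question
example_df <- data.frame(
  Gene.names = c("A", "B"),
  Score = c("3.69,2.97,2.57,3.09,2.94",
            "3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83"),
  ResidueAA = c("S", "Y"),
  ResidueNo = c(3, 3),
  Sequence = c("MSSYT", "MSSYTRAP"),
  stringsAsFactors = FALSE
)

# Split every Score string once; the result is a list with one
# character vector per row
scores <- strsplit(example_df$Score, ",", fixed = TRUE)

# Pick the element at position ResidueNo from each row's vector
picked <- mapply(function(s, i) s[i], scores, example_df$ResidueNo)

# Keep the picked score only where the residue matches the sequence
is_match <- with(
  example_df,
  substr(Sequence, ResidueNo, ResidueNo) == ResidueAA
)
example_df$Score2 <- ifelse(is_match, picked, NA_character_)

print(example_df[, c("Gene.names", "Score2")])
```

Score2 stays a character column here, as in the rowwise version; wrap it in `as.numeric()` if you need numbers.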

Another option is to use separate_rows to split 'Score' and expand the data into one row per value, then group by 'Gene.names', summarise to get the corresponding 'Score2' element (similar to the previous solution) and join the result with the original dataset

library(tidyr)
example_df %>%
    # one row per individual score
    separate_rows(Score, sep= ",") %>% 
    group_by(Gene.names) %>% 
    summarise(Score2 = if(substr(first(Sequence), first(ResidueNo), first(ResidueNo)) ==
       first(ResidueAA)) Score[first(ResidueNo)] else
         NA_character_, .groups = 'drop') %>% 
    # bring back the original columns
    right_join(example_df, by = "Gene.names")
