We can use substr directly:
library(dplyr)
example_df %>%
  mutate(Check = substr(Sequence, ResidueNo, ResidueNo) == ResidueAA)
-output
# Gene.names Score ResidueAA ResidueNo Sequence Check
#1 A 3.69,2.97,2.57,3.09,2.94 S 3 MSSYT TRUE
#2 B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83 Y 3 MSSYTRAP FALSE
To create a new column with the matching 'Score', use match to get the corresponding index instead of == (which does an elementwise comparison), and use that index to extract the 'Score' element:
example_df %>%
  mutate(Score2 = Score[match(ResidueAA,
                              substr(Sequence, ResidueNo, ResidueNo))])
-output
#Gene.names Score ResidueAA ResidueNo Sequence
#1 A 3.69,2.97,2.57,3.09,2.94 S 3 MSSYT
#2 B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83 Y 3 MSSYTRAP
# Score2
#1 3.69,2.97,2.57,3.09,2.94
#2 <NA>
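To see why the second row comes back as NA, it can help to run the two comparisons outside the pipeline. This is only a minimal base R sketch: the vectors `residue_aa` and `extracted` are illustrative names, filled in by hand with the values from the two example rows above.

```r
# Values copied from the two example rows (illustrative variable names)
residue_aa <- c("S", "Y")
extracted  <- substr(c("MSSYT", "MSSYTRAP"), 3, 3)  # third character of each sequence

# == compares position by position and returns one logical per row
eq_result <- residue_aa == extracted
eq_result     # TRUE FALSE

# match returns, for each element of residue_aa, the index of its first
# occurrence in `extracted`, or NA when there is none
match_result <- match(residue_aa, extracted)
match_result  # 1 NA
```

Indexing 'Score' with `c(1, NA)` therefore returns the first row's 'Score' followed by NA, which is what the output above shows.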
Update
Based on the comments, we need to extract the element of 'Score' at position 'ResidueNo' if the substring of 'Sequence' at that position is the same as 'ResidueAA'. This can be done by splitting 'Score' with strsplit into a list, extracting the first element ([[1]], after a rowwise operation), and then using 'ResidueNo' to pick the split value at that position:
example_df %>%
  rowwise() %>%
  mutate(Score2 = if (substr(Sequence, ResidueNo, ResidueNo) == ResidueAA)
    strsplit(Score, ",")[[1]][ResidueNo] else NA_character_) %>%
  ungroup()
-output
# A tibble: 2 x 6
# Gene.names Score ResidueAA ResidueNo Sequence Score2
# <chr> <chr> <chr> <dbl> <chr> <chr>
#1 A 3.69,2.97,2.57,3.09,2.94 S 3 MSSYT 2.57
#2 B 3.99,2.27,2.89,2.89,2.00,2.52,2.09,2.83 Y 3 MSSYTRAP <NA>
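The [[1]] step is easier to follow on a single string. A small base R sketch, using the 'Score' value of the first row; `score` and `parts` are illustrative names, not part of the original data:

```r
# 'Score' string of the first example row (illustrative variable names)
score <- "3.69,2.97,2.57,3.09,2.94"

# strsplit always returns a list, with one element per input string
parts <- strsplit(score, ",")
length(parts)   # 1

# [[1]] pulls out the character vector; [3] picks the third value
parts[[1]][3]   # "2.57"
```

Because rowwise passes one 'Score' string at a time, the list always has length 1, so [[1]] is safe there. The extracted value is still a character string, so wrap it in as.numeric if a number is needed.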
Another option is to use separate_rows to split 'Score' and expand the data to one row per value, then group by 'Gene.names', summarise to get the corresponding 'Score2' element (similar to the previous solution), and join the result with the original dataset:
library(tidyr)
example_df %>%
  separate_rows(Score, sep = ",") %>%
  group_by(Gene.names) %>%
  summarise(Score2 = if (substr(first(Sequence), first(ResidueNo), first(ResidueNo)) ==
                         first(ResidueAA)) Score[first(ResidueNo)] else
              NA_character_, .groups = 'drop') %>%
  # join back on the gene name to recover the original columns
  right_join(example_df, by = "Gene.names")