Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
393 views
in Technique[技术] by (71.8m points)

bioinformatics - How to recover alignment from edit distance in R?

I've been trying to implement my own solution to recover the alignment of two strings with edit distance. For instance, given the two strings "hitter" and "hotel", I'd like to recover the "h-otel" = "hitter" alignment, where "-" indicates an insertion, that underlies the calculation of the distance. My code so far however only gets to the cost matrix part of it, as shown below:

leven <- function(str1, str2){
  str1_len <- nchar(str1)
  str2_len <- nchar(str2)
  d <- matrix(nrow = nchar(str1)+1,
              ncol = nchar(str2)+1)
  d[,1] <- seq(from = 0, to = nchar(str1), by = 1)
  d[1,] <- seq(from = 0, to = nchar(str2), by = 1)
  for (j in 1:str2_len){
    for (i in 1:str1_len){
      if(substr(str1,i,i) == substr(str2,j,j)){
        cost <- 0
      }
      else{
        cost <- 1
      }
      d[i+1,j+1] <- min(d[i,j+1]+1,
                        d[i+1,j]+1,
                        d[i,j]+cost)
    }
  }
  d
}

leven("hotel", "hitter")

From what I understand, the key is to somehow retrace one's steps back to the beginning starting from the bottom-right corner of the matrix, but I'm a bit lost as to how in specific this should be done. Any help would be greatly appreciated; thanks in advance!

Edit 1: I've come to know that this might be called the Needleman-Wunsch algorithm, which is used to align DNA sequences. For instance, given the strings "GCATGCU" and "GATTACA", we can expect the output of "GCATG-CU" and "G-ATTACA", again with "-" as insertions, where the two strings are now aligned.

question from:https://stackoverflow.com/questions/66059512/how-to-recover-alignment-from-edit-distance-in-r

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...