I have written a function that finds the indices of subsequences in a long DNA sequence. It works when the longer DNA sequence is shorter than about 4000 characters. However, when I try to apply the same function to a much longer sequence, the console shows a + prompt instead of a >, which leads me to believe that the length of the string is the problem.
For example, when the longer sequence is "GATATATGCATATACTT" and the subsequence is "ATAT", I get the indices 1, 3, 9 (0-based).
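library(stringr)

# Print the 0-based start indices of every occurrence of `sequence` in `dna`.
dnaMatch <- function(dna, sequence) {
  k <- str_length(sequence)
  n_windows <- str_length(dna) - k + 1
  ret <- character(max(n_windows, 0))
  for (i in seq_len(n_windows)) {
    # Extract the window of length k that starts at position i
    ret[i] <- str_sub(dna, i, i + k - 1)
  }
  # Compare every window with the target, then shift to 0-based indices
  TFret <- which(ret == sequence) - 1
  print(TFret)
}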
Basically, my question is: is there any way around the character limitation in the string class?
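For reference, a + prompt in the R console means R is still waiting for the rest of an expression. R character strings are not limited to about 4000 characters, but a single line typed or pasted at the console is limited to roughly 4095 bytes, so a very long string literal can be cut off there. Building or reading the sequence programmatically (for example with readLines() on a file) sidesteps that. The sketch below is one possible way to check this, not a definitive fix: dna_match_vec is a hypothetical name, it uses base R only, and the long test sequence is simply the example sequence repeated 1000 times.

```r
# Hypothetical helper (the name is illustrative): return the 0-based start
# indices of every occurrence of `sequence` in `dna`, overlaps included.
dna_match_vec <- function(dna, sequence) {
  n <- nchar(dna)
  k <- nchar(sequence)
  # No window fits if the pattern is empty or longer than the sequence
  if (k == 0 || k > n) return(integer(0))
  starts <- seq_len(n - k + 1)
  # substring() is vectorised over start/stop, so no explicit loop is needed
  windows <- substring(dna, starts, starts + k - 1)
  which(windows == sequence) - 1L
}

# The example from the question
short_hits <- dna_match_vec("GATATATGCATATACTT", "ATAT")
print(short_hits)

# A sequence far longer than 4000 characters, built in code rather than
# pasted into the console as one literal
long_dna <- paste(rep("GATATATGCATATACTT", 1000), collapse = "")
long_hits <- dna_match_vec(long_dna, "ATAT")
print(nchar(long_dna))
print(length(long_hits))
```

If this runs on the long sequence, the string length itself is not the obstacle, and the remaining suspect is how the long literal reaches the console.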