Let's call the two DNA strings longer and shorter. In order for shorter to attach somewhere on longer, a sequence of bases complementary to shorter must be found somewhere in longer. For example, if there is ACGT in shorter, then you need to find TGCA somewhere in longer.
So, first take shorter and flip all of its bases to their complements:
    char[] cs = shorter.toCharArray();
    for (int i = 0; i < cs.length; ++i) {
        // getComplement changes A->T, C->G, G->C, T->A,
        // and throws an exception in all other cases
        cs[i] = getComplement(cs[i]);
    }
    String shorterComplement = new String(cs);
For the examples given in your question, the complement of TTGCC is AACGG, and the complement of TGC is ACG.
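The snippet above relies on getComplement without defining it. A minimal sketch of what it could look like follows; the class name DnaComplement, the wrapper method complement, and the choice of IllegalArgumentException are assumptions made for illustration, not part of the question.

```java
public class DnaComplement {
    // Hypothetical helper: maps one base to its complement
    // (A->T, C->G, G->C, T->A) and rejects anything else.
    static char getComplement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:
                // Assumption: an unchecked exception is acceptable here.
                throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    // Complements every base of the strand, keeping the original order.
    static String complement(String strand) {
        char[] cs = strand.toCharArray();
        for (int i = 0; i < cs.length; ++i) {
            cs[i] = getComplement(cs[i]);
        }
        return new String(cs);
    }
}
```

With this sketch, DnaComplement.complement("TTGCC") gives AACGG and DnaComplement.complement("TGC") gives ACG, matching the examples above. Note that it is case-sensitive, so lowercase input would be rejected.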
Then all you have to do is find shorterComplement within longer. You can do this trivially using indexOf, which returns the index of the first match, or -1 if there is none:
    return longer.indexOf(shorterComplement);
Of course, if the point of the exercise is to learn how to do string matching, you can look at well-known algorithms for doing the equivalent of indexOf. For instance, Wikipedia has a category for String matching algorithms.
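If you do want to write the matching yourself, the simplest starting point is the naive (brute-force) search, which tries every start position in turn. The sketch below is only that baseline, not one of the faster algorithms; the names NaiveSearch and naiveIndexOf are made up for this example.

```java
public class NaiveSearch {
    // Brute-force equivalent of text.indexOf(pattern):
    // returns the first index where pattern occurs in text, or -1.
    static int naiveIndexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        // Only start positions that leave room for the whole pattern.
        for (int start = 0; start + m <= n; ++start) {
            int j = 0;
            // Advance while the characters keep matching.
            while (j < m && text.charAt(start + j) == pattern.charAt(j)) {
                ++j;
            }
            if (j == m) {
                return start; // every character of pattern matched
            }
        }
        return -1; // no start position produced a full match
    }
}
```

For example, NaiveSearch.naiveIndexOf("GGAACGGTT", "AACGG") returns 2. In the worst case this takes time proportional to the product of the two lengths, which is what the more elaborate algorithms improve on.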