We need to combine 3 columns in a database by concatenation. However, the 3 columns may contain overlapping parts and the parts should not be duplicated. For example,
"a" + "b" + "c" => "abc"
"abcde" + "defgh" + "ghlmn" => "abcdefghlmn"
"abcdede" + "dedefgh" + "" => "abcdedefgh"
"abcde" + "d" + "ghlmn" => "abcdedghlmn"
"abcdef" + "" + "defghl" => "abcdefghl"
Our current algorithm is pretty slow because it uses brute force to identify the overlapping part between 2 strings. Does anyone know an efficient algorithm to do this?
Say we have 2 strings A and B. The algorithm needs to find the longest string S such that A ends with S and B starts with S, i.e. the longest suffix of A that is also a prefix of B.
Our current brute-force implementation in Java is shown below for reference:
    public static String concat(String s1, String s2) {
        if (s1 == null)
            return s2;
        if (s2 == null)
            return s1;
        int len = Math.min(s1.length(), s2.length());
        // Find the index for the end of overlapping part
        int index = -1;
        for (int i = len; i > 0; i--) {
            String substring = s2.substring(0, i);
            if (s1.endsWith(substring)) {
                index = i;
                break;
            }
        }
        StringBuilder sb = new StringBuilder(s1);
        if (index < 0)
            sb.append(s2);
        else if (index <= s2.length())
            sb.append(s2.substring(index));
        return sb.toString();
    }
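For comparison, one possible way to avoid the quadratic worst case of the loop above is the prefix (failure) function from the Knuth-Morris-Pratt algorithm: build the table for the second string, then scan the tail of the first string against it. The number of matched characters at the end of the scan is the length of the longest suffix of A that is a prefix of B. The following is only a sketch under that assumption; the class name OverlapConcat and the helpers overlap and combine are made-up names for illustration, and the null handling simply mirrors the brute-force version.

```java
final class OverlapConcat {

    // Length of the longest suffix of a that is also a prefix of b.
    static int overlap(String a, String b) {
        // The overlap can never be longer than the shorter string.
        int m = Math.min(a.length(), b.length());
        if (m == 0)
            return 0;

        // fail[i] = length of the longest proper prefix of b[0..i]
        // that is also a suffix of b[0..i] (KMP failure table).
        int[] fail = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && b.charAt(i) != b.charAt(k))
                k = fail[k - 1];
            if (b.charAt(i) == b.charAt(k))
                k++;
            fail[i] = k;
        }

        // Scan only the last m characters of a. k grows by at most one
        // per character, so k < m holds at the top of every iteration.
        int k = 0;
        for (int i = a.length() - m; i < a.length(); i++) {
            while (k > 0 && a.charAt(i) != b.charAt(k))
                k = fail[k - 1];
            if (a.charAt(i) == b.charAt(k))
                k++;
        }
        return k;
    }

    // Same contract as the brute-force concat: null means "no value".
    static String concat(String s1, String s2) {
        if (s1 == null)
            return s2;
        if (s2 == null)
            return s1;
        return s1 + s2.substring(overlap(s1, s2));
    }

    // Combine three columns left to right (hypothetical helper).
    static String combine(String c1, String c2, String c3) {
        return concat(concat(c1, c2), c3);
    }
}
```

Both loops are linear in the length of the shorter string, so each concat does O(min(|A|, |B|)) character comparisons plus the cost of building the result. Combining left to right is an assumption on my part, chosen because it reproduces the five examples at the top, including "abcdef" + "" + "defghl" => "abcdefghl".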