About the magic numbers Paul's answer is pretty clear.
As far as the code is concerned, Rabin Karp's principal idea is to perform an hash comparison between a sliding portion of the string and the pattern.
The hash cannot be computed each time on the whole substrings, otherwise the computation complexity would be quadratic O(n^2)
instead of linear O(n)
.
Therefore, a rolling hash function is applied, such as at each iteration only one character is needed to update the hash value of the substring.
So, let's comment your code:
for (int i = 0; i < B.Length; i++)
{
siga = (siga * D + (ulong)A[i]) % Q;
sigb = (sigb * D + (ulong)B[i]) % Q;
}
if (siga == sigb)
{
Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
return;
}
^
This piece computes the hash of pattern B
(sigb
), and the hashcode of the initial substring of A
of the same length of B
.
Actually it's not completely correct because hash can collide1 and so, it is necessary to modify the if statement : if (siga == sigb && A.Substring(0, B.Length) == B)
.
ulong pow = 1;
for (int k = 1; k <= B.Length - 1; k++)
pow = (pow * D) % Q;
^
Here's computed pow
that is necessary to perform the rolling hash.
for (int j = 1; j <= A.Length - B.Length; j++)
{
siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
if (siga == sigb)
{
if (A.Substring(j, B.Length) == B)
{
Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
A.Substring(j, B.Length),
A.Substring(j + B.Length)));
return;
}
}
}
^
Finally, the remaining string (i.e. from the second character to end), is scanned updating the hash value of the A substring and compared with the hash of B (computed at the beginning).
If the two hashes are equal, the substring and the pattern are compared1 and if they're actually equal a message is returned.
1 Hash values can collide; hence, if two strings have different hash values they're definitely different, but if the two hashes are equal they can be equal or not.