Using simple brute force is sometimes good.
I would precalculate all 16 shifted values of the pattern, together with the matching masks, and store them in two arrays of 16 ints, like this (assuming int is twice as wide as short):
    unsigned short pattern = 1234;
    unsigned int preShifts[16];
    unsigned int masks[16];
    int i;
    for(i=0; i<16; i++)
    {
        preShifts[i] = (unsigned int)(pattern<<i); //gets promoted to int
        masks[i] = (unsigned int) (0xffff<<i);
    }
Then, for every unsigned short you read from the stream, combine it with the previous short into one unsigned int, and compare that value (under each of the 16 masks) against the 16 preshifted patterns. Every one that matches is a hit.
So basically like this:
    int numMatch(unsigned short curWord, unsigned short prevWord)
    {
        int numHits = 0;
        /* cast before shifting so the shift is done in unsigned arithmetic */
        unsigned int combinedWords = ((unsigned int)prevWord<<16) | curWord;
        int i;
        for(i=0; i<16; i++)
        {
            if((combinedWords & masks[i]) == preShifts[i]) numHits++;
        }
        return numHits;
    }
Do note that this can report multiple hits when the pattern is detected more than once in overlapping bits:
e.g. if the input is 32 bits of 0's and the pattern you want to detect is 16 0's, the pattern is detected 16 times!
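To make the overlap case concrete, here is a minimal self-contained sketch of the same idea applied to a whole buffer of words. The names num_match_window and count_pattern_hits are hypothetical (they are not from the code above), and the sketch assumes fixed-width types from stdint.h rather than relying on int being twice as wide as short. It computes the mask and shifted pattern on the fly instead of reading them from tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts at how many of the 16 bit offsets inside
   the 32-bit window prev:cur the pattern appears. */
static int num_match_window(uint16_t pattern, uint16_t prev, uint16_t cur)
{
    uint32_t combined = ((uint32_t)prev << 16) | cur;
    int hits = 0;
    for (int i = 0; i < 16; i++) {
        uint32_t mask = (uint32_t)0xffff << i;
        if ((combined & mask) == ((uint32_t)pattern << i))
            hits++;
    }
    return hits;
}

/* Hypothetical driver: walks a buffer of 16-bit words, pairing each
   word with the one before it. */
static int count_pattern_hits(const uint16_t *words, size_t n, uint16_t pattern)
{
    assert(words != NULL || n == 0);
    int total = 0;
    for (size_t k = 1; k < n; k++)
        total += num_match_window(pattern, words[k - 1], words[k]);
    return total;
}
```

With two all-zero words and a pattern of 0, count_pattern_hits returns 16, one hit per bit offset. A pattern that straddles the word boundary at a single offset is counted once.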
The time cost of this, assuming it compiles approximately as written, is 16 checks per input word. Per input bit, this does one & and one ==, and a branch or other conditional increment, plus a table lookup for the mask.
The table lookup is unnecessary; by instead right-shifting the combined word we get significantly more efficient asm, as shown in another answer which also shows how to vectorize this with SIMD on x86.
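A rough sketch of the right-shifting variant, for illustration only: the function name num_match_shift is hypothetical, and this is not the code from the other answer. Instead of masking with a table entry, it compares the low 16 bits of the combined word against the pattern and then shifts the combined word right by one bit.

```c
#include <stdint.h>

/* Hypothetical table-free variant: the low 16 bits of the window are
   compared directly, then the window is shifted right by one bit. */
static int num_match_shift(uint16_t pattern, uint16_t prev, uint16_t cur)
{
    uint32_t combined = ((uint32_t)prev << 16) | cur;
    int hits = 0;
    for (int i = 0; i < 16; i++) {
        hits += ((uint16_t)combined == pattern); /* truncation keeps bits 0..15 */
        combined >>= 1;
    }
    return hits;
}
```

This checks the same 16 bit offsets as the masked version, so it should give the same counts, but it needs no preShifts or masks arrays.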