There's no AND/OR going on, unless you need to unpack the 8-bit integer holding four 2-bit indices.
Make your own definition for _MM_SHUFFLE that expands to four args, instead of packing them.
It's something like:
// dst = _mm_shuffle_epi32(src, _MM_SHUFFLE(d,c,b,a))
void pshufd(int dst[4], const int src[4], int d, int c, int b, int a)
{   // note that the _MM_SHUFFLE args are high-element-first order
    dst[0] = src[a];
    dst[1] = src[b];
    dst[2] = src[c];
    dst[3] = src[d];
}
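If you do want to model the packed form, the unpacking is just a shift and a mask per element. Here is a minimal sketch in plain C; PACK_SHUFFLE and pshufd_imm8 are hypothetical names made up for this illustration (PACK_SHUFFLE packs its args the same way _MM_SHUFFLE does), and the self-check function is only there to show the expected result.

```c
#include <assert.h>

// Hypothetical helper: packs four 2-bit indices the way _MM_SHUFFLE does,
// highest element first.
#define PACK_SHUFFLE(d, c, b, a) (((d) << 6) | ((c) << 4) | ((b) << 2) | (a))

// Scalar model of pshufd that takes the packed 8-bit immediate.
// Bits [2*i+1 : 2*i] of imm8 select the source element for dst[i].
static void pshufd_imm8(int dst[4], const int src[4], unsigned imm8)
{
    int tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];   // this is the only shift/AND needed
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];                       // go through tmp so dst may alias src
}

static void pshufd_imm8_selfcheck(void)
{
    const int src[4] = { 10, 11, 12, 13 };
    int dst[4];

    // d=0, c=1, b=2, a=3: reverses the element order
    pshufd_imm8(dst, src, PACK_SHUFFLE(0, 1, 2, 3));
    assert(dst[0] == 13 && dst[1] == 12 && dst[2] == 11 && dst[3] == 10);
}
```

Copying through tmp is a design choice for this sketch: it lets you call it with the same array as source and destination, the way the real instruction can use the same register for both.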
Vectors are indexed from low element = 0. The low element is the one that stores into memory at the lowest address, but when values are in registers you should think about them as [ 3 2 1 0 ]. In this notation, vector right-shifts (like psrldq) actually shift to the right.
This is why _mm_set_epi32(3, 2, 1, 0) takes its args in reverse order from int foo[] = { 0, 1, 2, 3 };.
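To see the element ordering with the real intrinsics, you can store a vector back to memory and look at it in address order. This is a small sketch assuming an x86 target with SSE2; store4 and element_order_demo are hypothetical names for this example, and the asserts spell out what each step should leave in memory.

```c
#include <assert.h>
#include <emmintrin.h>  // SSE2: _mm_set_epi32, _mm_shuffle_epi32, _mm_srli_si128

// Store a vector to memory so its elements can be read in address order.
static void store4(int out[4], __m128i v)
{
    _mm_storeu_si128((__m128i *)out, v);
}

static void element_order_demo(void)
{
    int mem[4];

    // Args are written high element first, i.e. the register is [ 3 2 1 0 ].
    __m128i v = _mm_set_epi32(3, 2, 1, 0);
    store4(mem, v);
    assert(mem[0] == 0 && mem[1] == 1 && mem[2] == 2 && mem[3] == 3);

    // psrldq by one 4-byte element: [ 3 2 1 0 ] -> [ 0 3 2 1 ], zero shifted in at the top.
    store4(mem, _mm_srli_si128(v, 4));
    assert(mem[0] == 1 && mem[1] == 2 && mem[2] == 3 && mem[3] == 0);

    // _MM_SHUFFLE is also high-element-first: the result is [ 0 1 2 3 ] in register notation.
    store4(mem, _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
    assert(mem[0] == 3 && mem[1] == 2 && mem[2] == 1 && mem[3] == 0);
}
```

Calling element_order_demo() from main should run without tripping any assert.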