It depends on what you can assume about the minimum level of SSE support that you have.
Going all the way back to SSE2 you have _mm_extract_epi16
(PEXTRW
) which can be used to extract any 16 bit element from a 128 bit vector. You would need to call this twice to get the two halves of a 32 bit element.
In more recent versions of SSE (SSE4.1 and later) you have _mm_extract_epi32
(PEXTRD
) which can extract a 32 bit element in one instruction.
Alternatively if this is not inside a performance-critical loop you can just use a union, e.g.
typedef union
{
__m128i v;
int32_t a[4];
} U32;
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…