x86 - SSE: Difference between _mm_load/store vs. using direct pointer access

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Suppose I want to add two buffers and store the result. Both buffers are already allocated with 16-byte alignment. I found two examples of how to do that.

The first one uses _mm_load to read the data from the buffers into SSE registers, does the add operation, and stores the result back to the destination buffer. Until now I would have done it like that.

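#include <emmintrin.h>  // SSE2 intrinsics
#include <stddef.h>
#include <stdint.h>

// n must be a multiple of 8; dst and src must be 16-byte aligned.
void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    __m128i _s = _mm_load_si128( (__m128i const*) src );  // aligned load
    __m128i _d = _mm_load_si128( (__m128i const*) dst );

    _d = _mm_add_epi16( _d, _s );  // eight 16-bit adds at once

    _mm_store_si128( (__m128i*) dst, _d );  // aligned store
  }
}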

The second example does the add operation directly on the memory addresses, without explicit load/store intrinsics. Both seem to work fine.

void _add( uint16_t * dst, uint16_t const * src, size_t n )
{
  for( uint16_t const * end( dst + n ); dst != end; dst+=8, src+=8 )
  {
    // Dereferencing the casted pointers replaces the explicit load/store calls.
    *(__m128i*) dst = _mm_add_epi16( *(__m128i*) dst, *(__m128i const*) src );
  }
}

So the question is whether the second example is correct or may have side effects, and when using load/store is mandatory.

Thanks.

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:16:29+0000

Both versions are fine. If you look at the generated code, you will see that the second version still generates at least one load into a vector register, since PADDW (the instruction behind _mm_add_epi16) can take only its second (source) operand directly from memory; the first operand must already be in a register.
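To make the equivalence concrete, here is a minimal sketch that runs both styles on the same input and compares the results. The names add_explicit, add_deref and versions_agree are hypothetical and chosen only for this illustration, and the sketch assumes an x86 target with SSE2 and buffers that are 16-byte aligned with a length that is a multiple of 8.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Style 1: explicit aligned load/store intrinsics.
void add_explicit(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);  // eight 16-bit lanes per 128-bit vector
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), _mm_add_epi16(d, s));
    }
}

// Style 2: dereference __m128i pointers and let the compiler pick the moves.
void add_deref(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i* d = reinterpret_cast<__m128i*>(dst + i);
        const __m128i* s = reinterpret_cast<const __m128i*>(src + i);
        *d = _mm_add_epi16(*d, *s);
    }
}

// Hypothetical helper: runs both styles on identical input and compares.
bool versions_agree() {
    alignas(16) std::uint16_t a[16];
    alignas(16) std::uint16_t b[16];
    alignas(16) std::uint16_t src[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = b[i] = static_cast<std::uint16_t>(i * 1000);
        src[i] = static_cast<std::uint16_t>(i + 1);
    }
    add_explicit(a, src, 16);
    add_deref(b, src, 16);
    for (int i = 0; i < 16; ++i) {
        if (a[i] != i * 1001 + 1) return false;  // i*1000 + (i+1)
    }
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Both styles assume aligned data: dereferencing a __m128i pointer is treated as an aligned access, just like _mm_load_si128. For buffers that are not known to be 16-byte aligned, neither form is safe, and _mm_loadu_si128 / _mm_storeu_si128 are the intrinsics intended for that case.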

In practice, most non-trivial SIMD code performs many more operations between loading and storing data than a single add. In general you probably want to load the data into vector variables (registers) using _mm_load_XXX, perform all your SIMD operations on registers, and then store the results back to memory via _mm_store_XXX.
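The load, compute, store pattern described above might look like the following sketch. The function name halved_sum_capped and the particular chain of operations (saturating add, halve, clamp to 1000) are assumptions made up for this example, not something from the question; the point is only that each vector is loaded once, several operations run on registers, and the result is stored once.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: dst[i] = min((dst[i] + src[i], saturated) / 2, 1000).
// Assumes 16-byte aligned buffers and n a multiple of 8.
void halved_sum_capped(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
    assert(n % 8 == 0);
    const __m128i cap = _mm_set1_epi16(1000);  // constant kept in a register
    for (std::size_t i = 0; i < n; i += 8) {
        // Load each input once.
        __m128i s = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i d = _mm_load_si128(reinterpret_cast<const __m128i*>(dst + i));

        // Several operations on registers, no memory traffic in between.
        __m128i sum  = _mm_adds_epu16(d, s);     // unsigned saturating add
        __m128i half = _mm_srli_epi16(sum, 1);   // logical shift right: values <= 32767
        __m128i out  = _mm_min_epi16(half, cap); // signed min is safe after the shift

        // Store the final result once.
        _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), out);
    }
}
```

Writing the intermediate values as named __m128i variables keeps the data flow readable, and the compiler remains free to keep them in registers for the whole loop body.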
