c++ - Generate a bitmap/bitset with cuda(thrust), random memory access makes the code less efficient

Question

Welcome To Ask or Share your Answers For Others

c++ - Generate a bitmap/bitset with cuda(thrust), random memory access makes the code less efficient

posted Feb 19, 2021 in Technique[技术] by 深蓝 (71.8m points)

c++ - Generate a bitmap/bitset with cuda(thrust), random memory access makes the code less efficient

I'm new to cuda and thrust, what I'm doing is using cuda (specifically thrust) to implement a 1-dimensional bitmap or bitset mapping of a billion int numbers in a set( unduplicated numbers greater than 0), mapping each number to a binary bit, the bitmap is implemented with a unsigned int array, I get the index of the array by performing /32 calculation, and get the position of the binary bits by %32 operation.

My question is, the elements in my int-number set are disordered, so random global memory accesses make my code less efficient when using the for_each function in thrust, if the array is sorted, the execution time will be more than 10 times faster. Is there any way to optimize my method or cuda's random memory accesses? Thanks in advance!

The main code is as follows, the d_set's type is device_vector, bitmap's type is device_vector and initialized by fill with zeros

struct map_functor
{
    unsigned int* d_ptr;
    map_functor(unsigned int * a) : d_ptr(a) {}

    __device__
        void operator()(int x)
    {
        // note that using printf in a __device__ function requires
        // code compiled for a GPU with compute capability 2.0 or
        // higher (nvcc --arch=sm_20)
        int i = x / 32; //index
        int j = x % 32;  //position
        atomicOr(&d_ptr[i], (1 << j));
    }
};

thrust::for_each(d_set.begin(), d_set.end(), map_functor(thrust::raw_pointer_cast(bitmap.data())));//for_each 测试

Here is the comparison of execution time before sorting and after sorting.

Before:

After:

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

Categories

c++ - Generate a bitmap/bitset with cuda(thrust), random memory access makes the code less efficient

c++ - Generate a bitmap/bitset with cuda(thrust), random memory access makes the code less efficient

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags