About my CUDA kernel:
I have data residing in global memory that is read by all threads; it is passed in as a kernel function argument.
Nsight identified it as a source of memory bandwidth use.
Every thread reads every value, in the same sequential order as every other thread.
To avoid 32 reads from global memory (one per thread in a warp) for every value, I decided to fetch a copy of the data into shared memory.
It looks something like this:
extern __shared__ __half aab[];   // dynamically sized shared-memory cache
...
__global__ void mykernel
(
    int numbox,
    ...
    const value_t* glob_aab,
    ...
)
{
    // Each thread copies a strided portion of the data into shared memory.
    const int copysz = numbox * 6;
    for ( int i = threadIdx.x; i < copysz; i += blockDim.x )
        aab[i] = glob_aab[i];
    __syncthreads();
    // All threads are done caching the aab geometry.
    ...
}
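For reference, because aab[] is an extern __shared__ array, the launch has to supply its size in bytes as the third launch-configuration parameter. A sketch of what I mean (the block and grid sizes below are placeholders, not my real values):

const int    numBlocks       = 1024;                          // placeholder
const int    threadsPerBlock = 256;                           // placeholder
const size_t shmemBytes      = numbox * 6 * sizeof(__half);   // bytes for aab[]
mykernel<<< numBlocks, threadsPerBlock, shmemBytes >>>( numbox, /* ..., */ glob_aab /* , ... */ );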
So far, so good. It works.
However, something peculiar shows up when I profile this kernel with the NVIDIA Nsight Compute tool: it reports 18,508 non-issued instructions due to a math-pipe throttle.
I am guessing this comes from the i += blockDim.x update in the copy loop?
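If the integer arithmetic of that loop really is the bottleneck, one idea would be to copy two values per iteration, halving the number of index increments and comparisons. A sketch, assuming value_t is __half, glob_aab is at least 4-byte aligned, and copysz (= numbox*6) is even:

// Inside the same kernel, replacing the per-__half copy loop above.
const int      copysz2 = copysz / 2;                                  // number of __half2 elements
const __half2* src2    = reinterpret_cast<const __half2*>( glob_aab );
__half2*       dst2    = reinterpret_cast<__half2*>( aab );
for ( int i = threadIdx.x; i < copysz2; i += blockDim.x )
    dst2[i] = src2[i];
__syncthreads();

I have not measured whether that actually helps, though.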
Anyway, what is the correct way to copy read-only data from global memory into a shared-memory cache?
I stride by blockDim.x so that the reads from global memory (and the writes to shared memory) are coalesced. But if that loop gets throttled by the math pipeline, should I use another method to fill my cached copy?
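For example, would the block-wide asynchronous copy from cooperative groups (added in CUDA 11, and on Ampere it can use the hardware async-copy path) be the preferred way here? A minimal sketch of what I mean, assuming value_t is __half:

#include <cuda_fp16.h>
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
namespace cg = cooperative_groups;

__global__ void mykernel( int numbox, /* ..., */ const __half* glob_aab /* , ... */ )
{
    extern __shared__ __half aab[];
    cg::thread_block block = cg::this_thread_block();

    // Hand the whole block-wide copy to the runtime instead of a
    // hand-rolled strided loop; no per-thread index arithmetic needed.
    cg::memcpy_async( block, aab, glob_aab, numbox * 6 * sizeof(__half) );
    cg::wait( block );   // all threads wait here until the copy has landed

    // ... use aab ...
}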
OS: Ubuntu 20.10
GPU: RTX 3070
CUDA: 11.2
question from:
https://stackoverflow.com/questions/65650050/cuda-caching-global-memory-into-shared-memory