Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
782 views
in Technique[技术] by (71.8m points)

cuda - How is the 2D thread blocks padded for warp scheduling?

I understand that for a 1D thread block with 31 threads, it will be padded to 32 threads for warp execution. What about a 2D block with 31*31 threads? Will warp scheduler pad 1 additional thread for each dimension (ie a total of 31 will be padded), or this 2D block threads will be concatenated and only the last thread will be padded (31*31=961; 961%32=1)?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Only one warp (the last one) gets padded. Threads are grouped into warps in the order x, y, z. In this way, if you have an odd 2D array size, like 17x17, that is stored contiguously in memory, you can still create 32-thread warps out of a 17x17 thread block that will generate coalesced accesses. In this way, all of the warps will generate fully coalesced accesses except the last one. If individual warps were padded with dead threads along the way, it would be more wasteful in terms of memory accesses in this example.

For this example, at least, it works better from a machine utilization standpoint.

The documentational support for this rests on understanding that thread ID and thread index are not the same.

Thread index for a given thread is identified by the built-in variables threadIdx.x, threadIdx.y, and threadIdx.z. Thread ID is a unique (within the threadblock), scalar number assigned to each thread.

The relationship between thread ID and thread index is given by this statement:

"The index of a thread and its thread ID relate to each other in a straightforward way: For a one-dimensional block, they are the same; for a two-dimensional block of size (Dx, Dy),the thread ID of a thread of index (x, y) is (x + y Dx); for a three-dimensional block of size (Dx, Dy, Dz), the thread ID of a thread of index (x, y, z) is (x + y Dx + z Dx Dy). "

But the grouping of threads into warps is done explicitly by thread ID:

"The way a block is partitioned into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0."

So based on the first statement, we see that even for an odd block shape like 17x17, there are no threads defined other than those which are within the dimensionality of the threadblock. Then based on the second statement, the consecutive assembly of warps by thread ID creates warps all of which have defined threads in them (except perhaps the last one.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...