cuda - Branch predication on GPU

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a question about branch predication on GPUs. As far as I know, GPUs handle branches with predication.

For example, I have code like this:

if (C)
 A
else
 B

Suppose A takes 40 cycles and B takes 50 cycles to execute. If, for a given warp, both A and B are executed, does the branch take 90 cycles in total? Or are A and B overlapped, i.e., some instructions of A execute, the warp waits on a memory request, then some instructions of B execute, the warp waits on memory again, and so on? Thanks.

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:58:04+0000

All of the CUDA-capable architectures released so far operate like a SIMD machine. When there is branch divergence within a warp, both code paths are executed by all the threads in the warp, and the threads that are not on the active path execute the functional equivalent of a NOP (as far as I recall, each thread in a warp has a conditional execution flag that allows non-executing threads to be masked off).

So in your example, 90 cycles is probably a better approximation of what really happens than the overlapped alternative.
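
To make the arithmetic concrete, here is a rough sketch of that serialized model in plain C++ (host code, not CUDA device code). It assumes a warp of 32 threads and treats the 40 and 50 cycle figures from the question as fixed costs. The names `takenMask` and `warpBranchCycles` are made up for illustration, and the model deliberately ignores latency hiding across other warps, so take it as an approximation rather than a description of what the hardware scheduler does.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one warp; these constants are assumptions.
constexpr int kWarpSize = 32;
constexpr int kCyclesA = 40;  // cost of path A, taken from the question
constexpr int kCyclesB = 50;  // cost of path B, taken from the question

// Build a 32-bit mask with bit i set when thread i's condition C is true.
std::uint32_t takenMask(const std::array<bool, kWarpSize>& cond) {
    std::uint32_t mask = 0;
    for (int i = 0; i < kWarpSize; ++i) {
        if (cond[i]) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// Cycles the warp spends on `if (C) A else B` under the serialized model:
// a path is issued once for the whole warp if at least one thread needs it,
// and the threads not on that path are masked off while it runs.
int warpBranchCycles(std::uint32_t takesA) {
    const std::uint32_t takesB = ~takesA;  // threads that fall through to B
    int cycles = 0;
    if (takesA != 0) cycles += kCyclesA;   // someone needs A
    if (takesB != 0) cycles += kCyclesB;   // someone needs B
    return cycles;
}
```

In this model a warp whose threads all agree on C pays for only one path (40 or 50 cycles), while a warp with even a single disagreeing thread pays for both (90 cycles), which matches the estimate above.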
