performance - Efficiency of branching in shaders

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I understand that this question may seem somewhat ungrounded, but if anyone has theoretical knowledge or practical experience on this topic, it would be great if you could share it.

I am attempting to optimize one of my old shaders, which uses a lot of texture lookups.

I've got diffuse, normal, and specular maps for each of three possible mapping planes. For faces that are near the viewer I also have to apply mapping techniques such as parallax occlusion mapping, which add many more texture lookups.

Profiling showed that texture lookups are the bottleneck of the shader, and I would like to eliminate some of them. For some values of the input parameters I already know that part of the texture lookups is unnecessary, and the obvious solution is to do something like this (pseudocode):

if (part_actually_needed) {
   perform lookups;
   perform other steps specific for THIS PART;
}

// All other parts.

Now - here comes the question.

I do not remember exactly (that's why I stated the question might be ungrounded), but in some paper I recently read (unfortunately, can't remember the name) something similar to the following was stated:

The performance of the presented technique depends on how efficient the HARDWARE-BASED CONDITIONAL BRANCHING is implemented.

I remembered this kind of statement right before I was about to start refactoring a big number of shaders and implement that if-based optimization I was talking about.

So, right before I start doing that: does anyone know something about the efficiency of branching in shaders? Why could branching cause a severe performance penalty in shaders?

And is it even possible that the if-based branching would only make the actual performance worse?

You might say: try it and see. Yes, that's what I'm going to do if nobody here can help me :)

But still, an if that is efficient on new GPUs could be a nightmare on somewhat older ones. That kind of issue is very hard to predict unless you have a lot of different GPUs to test on (which I don't).

So, if anyone knows something about that or has benchmarking experience for these kinds of shaders, I would really appreciate your help.

The few remaining brain cells that are actually working keep telling me that branching on a GPU might be far less efficient than branching on a CPU (which usually has extremely effective branch prediction and ways of avoiding cache misses), simply because it's a GPU (or because that could be hard or impossible to implement on a GPU).

Unfortunately I am not sure if this statement has anything in common with the real situation...

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:45:54+0000

If the condition is uniform (i.e. constant for the entire pass), then the branch is essentially free, because the framework will in effect compile two versions of the shader (branch taken and branch not taken) and choose one of them for the entire pass based on your input variable. In this case, definitely go for the if statement, as it will make your shader faster.

If the condition varies per vertex/pixel, then it can indeed degrade performance, and older shader models don't even support dynamic branching.
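To give a rough feel for why a per-pixel condition can hurt, here is a toy cost model in Python (not shader code). It assumes what is generally true of GPUs: pixels are processed in groups that execute in lockstep, so if any pixel in a group takes the expensive path, the whole group pays for it while the other lanes sit masked off. The group size of 32, the cost numbers, and the function names are all made up for illustration; real hardware and drivers differ, so treat this as a sketch and not as a prediction for your shader.

```python
# Toy model of lockstep (SIMD) execution. All costs are hypothetical units.

def group_cost(needs_lookup, lookup_cost=10, common_cost=4, branch_overhead=1):
    """Cost of one group of lanes (pixels) that execute in lockstep.

    needs_lookup holds the branch condition for each lane. The expensive
    path is paid by the whole group if ANY lane takes it; lanes that do
    not need it are masked off and wait.
    """
    taken = lookup_cost if any(needs_lookup) else 0
    return branch_overhead + taken + common_cost


def pass_cost(conditions, group_size=32):
    """Total cost of a pass, splitting per-pixel conditions into groups."""
    return sum(
        group_cost(conditions[start:start + group_size])
        for start in range(0, len(conditions), group_size)
    )


def unbranched_cost(n_pixels, group_size=32, lookup_cost=10, common_cost=4):
    """Cost when the lookups are always performed and there is no if at all."""
    groups = (n_pixels + group_size - 1) // group_size
    return groups * (lookup_cost + common_cost)


pixels = 1024
uniform_off = [False] * pixels                       # same value for the whole pass
coherent = [i < pixels // 2 for i in range(pixels)]  # large contiguous regions
scattered = [i % 32 == 0 for i in range(pixels)]     # one lane per group needs it

print(unbranched_cost(pixels))  # 448
print(pass_cost(uniform_off))   # 160
print(pass_cost(coherent))      # 320
print(pass_cost(scattered))     # 480
```

In this model the if pays off when the condition is the same for the whole pass or changes only across large regions of the screen. When it flips from pixel to pixel, nearly every group still runs the lookups, and the branch overhead makes the result slightly worse than having no branch at all. So yes, under these assumptions the if can make things slower, and how much depends on how coherent the condition is across neighbouring pixels.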
