I have a function: f(param)
, which does a single calculation based on input (param
). This function supposed to be called something around 1 million (maximum) in a tight loop:
for (std::uint_fast64_t i = 0; i < 1'000'000; ++i) {
f (param);
}
First Question:
My (first) question is what is the most effective way to write the condition section (to what to do based on param
) of the f()
function. I tried some options that I know:
if-else if
:
static int f_3(int const param) {
if (A::_1 == param) {
return A::_1 * param;
} else if (A::_2 == param) {
return A::_2 * param;
} // ...
}
switch-case
:
static int f(int const param) {
switch (param) {
case A::_1:
return param * A::_1;
// ...
}
}
goto-label
:
static int f_2(int const param) {
void constexpr* const _table[] = {
&& L1, // ...
};
goto* _table[param];
L1:
return A::_1 * param;
// ...
}
And I benchmarked it:
- The complete code.
- Compiler: g++ (GCC) 10.2.0
- Compiler options: -O3 -std=c++17 -lboost_timer
- OS: ArchLinux 5.10.8-arch1-1
int main(int argc, char* argv[]) {
using ufast_t = std::uint_fast64_t;
ufast_t constexpr n = 1'000'000'000ULL;
ufast_t sum = 0;
{
std::cout << "switch-case :";
boost::timer::auto_cpu_timer _f;
for (ufast_t i = 0; i < n; ++i) {
sum += f(argc);
}
}
std::cout << "---------------------------" << std::endl;
{
std::cout << "goto :";
boost::timer::auto_cpu_timer _f2;
for (ufast_t i = 0; i < n; ++i) {
sum += f_2(argc);
}
}
std::cout << "---------------------------" << std::endl;
{
std::cout << "if-elseif :";
boost::timer::auto_cpu_timer _f3;
for (ufast_t i = 0; i < n; ++i) {
sum += f_3(argc);
}
}
std::cout << std::endl;
return sum;
}
But it has (i think) a strange output:
> ./a.out 1 2 3 4 5 6
switch-case : 1.056976s wall, 1.050000s user + 0.000000s system = 1.050000s CPU (99.3%)
---------------------------
goto : 0.000001s wall, 0.000000s user + 0.000000s system = 0.000000s CPU (n/a%)
---------------------------
if-elseif : 0.645751s wall, 0.640000s user + 0.000000s system = 0.640000s CPU (99.1%)
> echo $?
0
So is there any (effective) way to do that? or I just have those options in my hand and I have to select one of them based on some benchmarks?
Second Question:
Why goto
in the above benchmark doesn't have any overhead? It's Undefined behavior code? or somehow the compiler removes their loops in the optimization stage?
question from:
https://stackoverflow.com/questions/65844193/which-is-faster-in-a-tight-loop-swich-cast-if-else-goto-label 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…