Apparently MSVC++ 2017 (toolset v141, x64 Release configuration) doesn't emit the x87 FYL2X instruction through a C/C++ intrinsic. Instead, calls to C++ log() or log2() result in a real call to a long function that seems to implement an approximation of the logarithm without using FYL2X. The performance I measured is also strange: log() (natural logarithm) is 1.7667 times faster than log2() (base 2 logarithm), even though the base 2 logarithm should be easier for the processor, because it stores the exponent (and the mantissa too) in binary format. That also seems to be why the CPU instruction FYL2X calculates the base 2 logarithm (multiplied by a parameter).
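To make the reasoning above concrete, here is a minimal C++ sketch of the relationships involved. It does not execute FYL2X; `Fyl2xModel` only models the documented result of the instruction, y * log2(x), using `std::log2`. All function names here are hypothetical and chosen for illustration.

```cpp
#include <cmath>

// Models the result FYL2X produces: y * log2(x).
// This is an assumption-level model built on std::log2, not the instruction.
double Fyl2xModel(double x, double y) {
    return y * std::log2(x);
}

// Natural logarithm through a base 2 logarithm: ln(x) = ln(2) * log2(x).
// With y = ln(2), a y * log2(x) primitive yields ln(x) directly.
double LnViaLog2(double x) {
    const double ln2 = std::log(2.0);
    return Fyl2xModel(x, ln2);
}

// Why base 2 looks "natural" for binary floating point:
// x = m * 2^e with m in [0.5, 1), so log2(x) = e + log2(m).
double Log2ViaFrexp(double x) {
    int e = 0;
    const double m = std::frexp(x, &e);  // splits x into mantissa and exponent
    return e + std::log2(m);
}
```

For example, `LnViaLog2(10.0)` should agree with `std::log(10.0)` up to rounding, and `Log2ViaFrexp(8.0)` should give 3 up to rounding.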
Here is the code used for measurements:
    #include <chrono>
    #include <cmath>
    #include <cstdint>  // int64_t
    #include <cstdio>

    const int64_t cnLogs = 100 * 1000 * 1000;

    // Times cnLogs calls to std::log2 and prints the throughput.
    // The sum is printed so the compiler cannot discard the loop.
    void BenchmarkLog2() {
        double sum = 0;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 1; i <= cnLogs; i++) {
            sum += std::log2(double(i));
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Log2: %.3lf Ops/sec calculated %.3lf\n", cnLogs / nSec, sum);
    }

    // Same measurement for std::log (natural logarithm).
    void BenchmarkLn() {
        double sum = 0;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 1; i <= cnLogs; i++) {
            sum += std::log(double(i));
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Ln: %.3lf Ops/sec calculated %.3lf\n", cnLogs / nSec, sum);
    }

    int main() {
        BenchmarkLog2();
        BenchmarkLn();
        return 0;
    }
The output for Ryzen 1800X is:
Log2: 95152910.728 Ops/sec calculated 2513272986.435
Ln: 168109607.464 Ops/sec calculated 1742068084.525
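As a sanity check on these figures, the printed sums can be compared with closed forms: the sum of ln(i) for i = 1..n equals ln(n!), which `std::lgamma(n + 1)` returns, and the base 2 sum is that value divided by ln(2). The sketch below is only an illustration; the helper names are hypothetical and not part of the benchmark.

```cpp
#include <cmath>

// Sum of ln(i) for i = 1..n equals ln(n!) = lgamma(n + 1).
double ExpectedLnSum(double n) {
    return std::lgamma(n + 1.0);
}

// Sum of log2(i) for i = 1..n equals ln(n!) / ln(2).
double ExpectedLog2Sum(double n) {
    return std::lgamma(n + 1.0) / std::log(2.0);
}

// Ratio of two throughputs, e.g. Ln Ops/sec divided by Log2 Ops/sec.
double Speedup(double fasterOpsPerSec, double slowerOpsPerSec) {
    return fasterOpsPerSec / slowerOpsPerSec;
}
```

With n = 1e8, both expected sums should land very close to the "calculated" values printed above, which suggests the loops really did the work, and `Speedup(168109607.464, 95152910.728)` reproduces the factor of about 1.7667.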
So to elucidate these phenomena (no usage of FYL2X and the strange performance difference), I would also like to test the performance of FYL2X and, if it's faster, use it instead of the <cmath> functions. MSVC++ doesn't allow inline assembly on x64, so a function in a separate assembly file that uses FYL2X is needed.
Could you answer with the assembly code for such a function, using FYL2X or a better instruction for computing a logarithm (without the need for a specific base), if there is one on newer x86_64 processors?