Welcome to the world of denormalized floating-point!
They can wreak havoc on performance!
Denormal (or subnormal) numbers are kind of a hack to get some extra values very close to zero out of the floating-point representation.
Operations on denormalized floating-point values can be tens to hundreds of times slower than on normalized floating-point values.
This is because many processors can't handle them directly and must trap and resolve them using microcode.
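To make "very close to zero" concrete, here is a minimal sketch that tells denormal values apart from normal ones using the standard library. The helper names (`is_denormal`, `smallest_normal`, `smallest_denormal`) are made up for illustration, and the approximate magnitudes in the comments assume IEEE 754 single precision.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true when v is a nonzero value too small to be
// stored as a normal float, i.e. a denormal (subnormal) number.
bool is_denormal(float v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Smallest positive normal float (about 1.17549e-38 for IEEE 754 single).
float smallest_normal() {
    return std::numeric_limits<float>::min();
}

// Smallest positive denormal float (about 1.4013e-45 for IEEE 754 single).
float smallest_denormal() {
    return std::numeric_limits<float>::denorm_min();
}
```

Anything with a magnitude between `smallest_denormal()` and just under `smallest_normal()` falls into the denormal range, which is where the slow path described above can kick in.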
If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether 0 or 0.1 is used.
Here's the test code compiled on x64:
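#include <cstdlib>   // system
#include <iostream>
#include <omp.h>     // omp_get_wtime (build with OpenMP enabled)
using namespace std;

// Uncomment to add/subtract 0.1f instead of 0 in the inner loop.
//#define FLOATING

int main() {
    double start = omp_get_wtime();
    const float x[16]={1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6};
    const float z[16]={1.123,1.234,1.345,156.467,1.578,1.689,1.790,1.812,1.923,2.034,2.145,2.256,2.367,2.478,2.589,2.690};
    float y[16];
    for(int i=0;i<16;i++)
    {
        y[i]=x[i];
    }
    for(int j=0;j<9000000;j++)
    {
        for(int i=0;i<16;i++)
        {
            // Each pass scales y[i] by x[i]/z[i], which is below 1 for every i.
            y[i]*=x[i];
            y[i]/=z[i];
#ifdef FLOATING
            y[i]=y[i]+0.1f;
            y[i]=y[i]-0.1f;
#else
            y[i]=y[i]+0;
            y[i]=y[i]-0;
#endif
            // Print the values once they have had time to converge.
            if (j > 10000)
                cout << y[i] << "  ";
        }
        if (j > 10000)
            cout << endl;
    }
    double end = omp_get_wtime();
    cout << end - start << endl;
    system("pause");   // Windows-only: keeps the console window open
    return 0;
}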
Output:
#define FLOATING
1.78814e-007 1.3411e-007 1.04308e-007 0 7.45058e-008 6.70552e-008 6.70552e-008 5.58794e-007 3.05474e-007 2.16067e-007 1.71363e-007 1.49012e-007 1.2666e-007 1.11759e-007 1.04308e-007 1.04308e-007
1.78814e-007 1.3411e-007 1.04308e-007 0 7.45058e-008 6.70552e-008 6.70552e-008 5.58794e-007 3.05474e-007 2.16067e-007 1.71363e-007 1.49012e-007 1.2666e-007 1.11759e-007 1.04308e-007 1.04308e-007
//#define FLOATING
6.30584e-044 3.92364e-044 3.08286e-044 0 1.82169e-044 1.54143e-044 2.10195e-044 2.46842e-029 7.56701e-044 4.06377e-044 3.92364e-044 3.22299e-044 3.08286e-044 2.66247e-044 2.66247e-044 2.24208e-044
6.30584e-044 3.92364e-044 3.08286e-044 0 1.82169e-044 1.54143e-044 2.10195e-044 2.45208e-029 7.56701e-044 4.06377e-044 3.92364e-044 3.22299e-044 3.08286e-044 2.66247e-044 2.66247e-044 2.24208e-044
Note how in the second run the numbers are very close to zero.
Denormalized numbers are generally rare and thus most processors don't try to handle them efficiently.
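One way to read the gap between the two outputs, offered as a sketch rather than as the original explanation: adding 0.1f rounds away anything far smaller than the spacing between floats near 0.1 (about 7.45e-9), so the value never gets a chance to sink into the denormal range, while adding 0 leaves a tiny value untouched. The helper name `add_then_subtract` is hypothetical, and the behavior described assumes IEEE 754 single precision with denormals enabled.

```cpp
// Hypothetical helper mirroring the two statements inside the inner loop:
// add an offset, then subtract it again.
float add_then_subtract(float y, float offset) {
    // volatile forces the intermediate sum to be rounded and stored as a float
    volatile float sum = y + offset;
    return sum - offset;
}
```

Under those assumptions, `add_then_subtract(6.30584e-44f, 0.1f)` gives exactly 0 because the denormal is lost when it is added to 0.1f, whereas `add_then_subtract(6.30584e-44f, 0.0f)` hands the denormal back unchanged.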
To demonstrate that this has everything to do with denormalized numbers, we can flush denormals to zero by adding this to the start of the code:
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
Then the version with 0 is no longer 10x slower and actually becomes faster.
(This requires that the code be compiled with SSE enabled.)
This means that rather than using these weird lower-precision almost-zero values, we just round to zero instead.
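The effect of the flush-to-zero setting can be seen on a single multiplication. This is a minimal sketch, assuming an x86 target where float arithmetic is done with SSE instructions (the default on x64); the function name `half_of_smallest_normal` is made up for illustration.

```cpp
#include <limits>
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE, _MM_GET_FLUSH_ZERO_MODE

// Hypothetical helper: halves the smallest normal float, with flush-to-zero
// either on or off, and restores the previous mode afterwards.
float half_of_smallest_normal(bool flush_to_zero) {
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    // volatile keeps the compiler from folding the multiplication at compile time
    volatile float tiny = std::numeric_limits<float>::min();
    volatile float result = tiny * 0.5f;  // denormal, unless it gets flushed

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return result;
}
```

With the mode off, the call should return a small positive denormal; with it on, the same multiplication should return 0. Restoring the saved mode keeps the change from leaking into the rest of the program.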
Timings: Core i7 920 @ 3.5 GHz:
// Don't flush denormals to zero.
0.1f: 0.564067
0 : 26.7669
// Flush denormals to zero.
0.1f: 0.587117
0 : 0.341406
In the end, this really has nothing to do with whether it's an integer or floating-point.
The 0 or 0.1f is converted/stored into a register outside of both loops.
So that has no effect on performance.