Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Why does changing 0.1f to 0 slow down performance by 10x?

Why does this bit of code,

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0.1f; // <--
        y[i] = y[i] - 0.1f; // <--
    }
}

run more than 10 times faster than the following bit (identical except where noted)?

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0; // <--
        y[i] = y[i] - 0; // <--
    }
}

when compiling with Visual Studio 2010 SP1.

The optimization level was -O2 with SSE2 enabled.

I haven't tested with other compilers.

  asked by Dragarro, translated from Stack Overflow

1 Reply

Welcome to the world of denormalized floating-point!

They can wreak havoc on performance!!!

Denormal (or subnormal) numbers are kind of a hack to get some extra values very close to zero out of the floating-point representation.

Operations on denormalized floating-point can be tens to hundreds of times slower than on normalized floating-point.

This is because many processors can't handle them directly and must trap and resolve them using microcode.
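
As a rough illustration of where the denormal range sits, here is a small sketch that is not part of the original test. It assumes IEEE 754 floats with denormals enabled (the default on x64), and the helper names `category`, `halve_smallest_normal` and `check_categories` are made up for this example. `std::fpclassify` from `<cmath>` reports whether a value is normal or subnormal:

```cpp
#include <cassert>
#include <cmath>    // std::fpclassify, FP_SUBNORMAL, ...
#include <limits>   // std::numeric_limits
#include <string>

// Returns a short label for the floating-point category of v.
std::string category(float v) {
    switch (std::fpclassify(v)) {
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        case FP_INFINITE:  return "infinite";
        default:           return "nan";
    }
}

// Halves the smallest normal float at run time. The result is too small
// to be stored as a normal number, so it lands in the denormal range.
float halve_smallest_normal() {
    volatile float smallest = std::numeric_limits<float>::min();  // volatile blocks constant folding
    return smallest / 2.0f;
}

// Spot checks, assuming denormals are not being flushed to zero.
void check_categories() {
    assert(category(1.0f) == "normal");
    assert(category(std::numeric_limits<float>::min()) == "normal");
    assert(category(halve_smallest_normal()) == "subnormal");
    assert(category(std::numeric_limits<float>::denorm_min()) == "subnormal");
}
```

For `float`, `std::numeric_limits<float>::min()` is about 1.17549e-38, so any nonzero value with a smaller magnitude is a denormal.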

If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether 0 or 0.1f is used.

Here's the test code compiled on x64:

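#include <cstdlib>   // system
#include <iostream>  // cout, endl
#include <omp.h>     // omp_get_wtime (build with OpenMP enabled, e.g. /openmp)

using namespace std;

int main() {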

    double start = omp_get_wtime();

    const float x[16]={1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6};
    const float z[16]={1.123,1.234,1.345,156.467,1.578,1.689,1.790,1.812,1.923,2.034,2.145,2.256,2.367,2.478,2.589,2.690};
    float y[16];
    for(int i=0;i<16;i++)
    {
        y[i]=x[i];
    }
    for(int j=0;j<9000000;j++)
    {
        for(int i=0;i<16;i++)
        {
            y[i]*=x[i];
            y[i]/=z[i];
#ifdef FLOATING
            y[i]=y[i]+0.1f;
            y[i]=y[i]-0.1f;
#else
            y[i]=y[i]+0;
            y[i]=y[i]-0;
#endif

            if (j > 10000)
                cout << y[i] << "  ";
        }
        if (j > 10000)
            cout << endl;
    }

    double end = omp_get_wtime();
    cout << end - start << endl;

    system("pause");
    return 0;
}

Output:

#define FLOATING
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007

//#define FLOATING
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.46842e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.45208e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044

Note how in the second run the numbers are very close to zero.

Denormalized numbers are generally rare and thus most processors don't try to handle them efficiently.
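
To check which of the printed values are actually denormal, and to see one plausible reason the 0.1f version never produces one, here is a minimal sketch (the helper names `add_then_subtract`, `is_denormal` and `check_printed_values` are invented for this example, and it assumes denormals are not being flushed). The spacing between adjacent floats near 0.1 is 2^-27, about 7.45e-9, so adding 0.1f swallows anything much smaller than that and the subtraction then yields a multiple of that spacing, which is far above the denormal range:

```cpp
#include <cassert>
#include <cmath>   // std::fpclassify, FP_SUBNORMAL

// Applies "y = y + c; y = y - c;" with each step rounded to float,
// mirroring the two marked lines of the inner loop.
float add_then_subtract(float y, float c) {
    volatile float sum = y + c;    // volatile forces a real float store
    volatile float diff = sum - c;
    return diff;
}

// True when v lies in the denormal (subnormal) range of float.
bool is_denormal(float v) {
    return std::fpclassify(v) == FP_SUBNORMAL;
}

// Spot checks using two values copied from the output listings above.
void check_printed_values() {
    assert(!is_denormal(1.78814e-7f));   // from the 0.1f run
    assert(is_denormal(6.30584e-44f));   // from the 0 run
    // Adding and subtracting 0 leaves a denormal untouched...
    assert(is_denormal(add_then_subtract(6.30584e-44f, 0.0f)));
    // ...while 0.1f absorbs it: the sum rounds to exactly 0.1f.
    assert(add_then_subtract(6.30584e-44f, 0.1f) == 0.0f);
}
```

This matches the listings: the values in the 0.1f run, such as 7.45058e-008, are small multiples of 2^-27, while the values in the 0 run have decayed below the smallest normal float.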

To demonstrate that this has everything to do with denormalized numbers, if we flush denormals to zero by adding this to the start of the code:

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

Then the version with 0 is no longer 10x slower and actually becomes faster.

(This requires that the code be compiled with SSE enabled.)

This means that rather than using these weird lower-precision almost-zero values, we just round to zero instead.
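
The effect of the flush-to-zero mode on a single operation can be sketched as follows. This is only an illustration under assumptions: it needs an x86 target where float arithmetic is done with SSE instructions (the default on x64), and the function names `scaled_with_ftz` and `check_flush_to_zero` are hypothetical. It uses the `_MM_GET_FLUSH_ZERO_MODE` and `_MM_SET_FLUSH_ZERO_MODE` macros from `<xmmintrin.h>`:

```cpp
#include <cassert>
#include <cfloat>       // FLT_MIN
#include <cmath>        // std::fpclassify
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE and friends (x86 SSE only)

// Multiplies the smallest normal float by 0.5 with flush-to-zero switched
// on or off, then restores whatever mode the caller had before.
float scaled_with_ftz(bool flush_to_zero) {
    const unsigned int saved_mode = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);
    volatile float smallest = FLT_MIN;  // volatile blocks constant folding
    volatile float half = 0.5f;
    volatile float result = smallest * half;
    _MM_SET_FLUSH_ZERO_MODE(saved_mode);
    return result;
}

// With flushing on, the would-be denormal result becomes exactly zero.
void check_flush_to_zero() {
    assert(scaled_with_ftz(true) == 0.0f);
    assert(std::fpclassify(scaled_with_ftz(false)) == FP_SUBNORMAL);
}
```

The mode lives in the MXCSR control register and is per thread, so in a real program it would typically be set once at the start of each thread that does the heavy floating-point work.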

Timings in seconds (Core i7 920 @ 3.5 GHz):

//  Don't flush denormals to zero.
0.1f: 0.564067
0   : 26.7669

//  Flush denormals to zero.
0.1f: 0.587117
0   : 0.341406

In the end, this really has nothing to do with whether it's an integer or floating-point.

The 0 or 0.1f is converted/stored into a register outside of both loops.

So that has no effect on performance.
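
To illustrate the point about types, here is a small sketch (`add_int_zero`, `add_float_zero` and `check_same_result` are names invented for this example). In `y + 0` the `int` literal is converted to `float` by the usual arithmetic conversions, so both spellings of zero lead to the same float addition. Whether the constant ends up hoisted into a register is the compiler's decision and is not shown here:

```cpp
#include <cassert>
#include <type_traits>  // std::is_same

// float + int and float + float both have type float.
static_assert(std::is_same<decltype(1.0f + 0), float>::value,
              "int operand is converted to float");
static_assert(std::is_same<decltype(1.0f + 0.1f), float>::value,
              "float + float stays float");

// The int literal 0 is converted to 0.0f before the addition.
float add_int_zero(float y) { return y + 0; }

// Same operation with the zero written as a float literal.
float add_float_zero(float y) { return y + 0.0f; }

// Both functions return the same value for the same input.
void check_same_result() {
    assert(add_int_zero(1.5f) == add_float_zero(1.5f));
}
```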
