Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
441 views
in Technique[技术] by (71.8m points)

c++ - Why is accumulate faster than a simple for cycle?

I was testing algorithms and run into this weird behavior, when std::accumulate is faster than a simple for cycle.

Looking at the generated assembler I'm not much wiser :-) It seems that the for cycle is optimized into MMX instructions, while accumulate expands into a loop.

This is the code. The behavior manifests with -O3 optimization level, gcc 4.7.1

#include <vector>                                                                                                                                                                                                                                                              
#include <chrono>                                                                                                                                                                                                                                                              
#include <iostream>                                                                                                                                                                                                                                                            
#include <random>                                                                                                                                                                                                                                                              
#include <algorithm>                                                                                                                                                                                                                                                           
using namespace std;                                                                                                                                                                                                                                                           

int main()                                                                                                                                                                                                                                                                     
{                                                                                                                                                                                                                                                                              
    const size_t vsize = 100*1000*1000;                                                                                                                                                                                                                                        

    vector<int> x;
    x.reserve(vsize);

    mt19937 rng;
    rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now()));

    uniform_int_distribution<uint32_t> dist(0,10);

    for (size_t i = 0; i < vsize; i++)
    {
        x.push_back(dist(rng));
    }

    long long tmp = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        tmp += x[i];
    }

    cout << "dry run " << tmp << endl;

    auto start = chrono::high_resolution_clock::now();
    long long suma = accumulate(x.begin(),x.end(),0);
    auto end = chrono::high_resolution_clock::now();

    cout << "Accumulate runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl;

    start = chrono::high_resolution_clock::now();
    suma = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        suma += x[i];
    }
    end = chrono::high_resolution_clock::now();

    cout << "Manual sum runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma <<  endl;

    return 0;
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

When you pass the 0 to accumulate, you are making it accumulate using an int instead of a long long.

If you code your manual loop like this, it will be equivalent:

int sumb = 0;
for (size_t i = 0; i < vsize; i++)
{
    sumb += x[i];
}
suma = sumb;

or you can call accumulate like this:

long long suma = accumulate(x.begin(),x.end(),0LL);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...