I experimented with calculating the mean of a list using Parallel.For(). I decided against it, as it is about four times slower than a simple serial version. Yet I am intrigued by the fact that it does not yield exactly the same result as the serial one, and I thought it would be instructive to learn why.
My code is:
    public static double Mean(this IList<double> list)
    {
        double sum = 0.0;
        Parallel.For(0, list.Count, i =>
        {
            double initialSum;
            double incrementedSum;
            SpinWait spinWait = new SpinWait();
            // Retry until the shared sum is found unchanged, so that it can
            // safely be replaced with the incremented value.
            while (true)
            {
                initialSum = sum;
                incrementedSum = initialSum + list[i];
                if (initialSum == Interlocked.CompareExchange(ref sum, incrementedSum, initialSum)) break;
                spinWait.SpinOnce();
            }
        });
        return sum / list.Count;
    }
When I run the code on a random sequence of 2,000,000 points, I get results that differ from the serial mean in the last two digits.
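For comparison, a serial baseline of the kind I mean can be sketched as follows. This is a minimal sketch, not my exact code: the class name MeanBaseline, the method name SerialMean, the seed, and the use of Random.NextDouble() for the test data are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class MeanBaseline
{
    // Hypothetical serial baseline: adds the elements strictly in index order.
    public static double SerialMean(this IList<double> list)
    {
        double sum = 0.0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum / list.Count;
    }

    public static void Main()
    {
        // Assumed test data: 2,000,000 uniform random values with a fixed seed.
        var random = new Random(12345);
        var data = new double[2000000];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = random.NextDouble();
        }

        // "R" prints enough digits to round-trip the double, so the trailing
        // digits can be compared against the result of the parallel version.
        Console.WriteLine(data.SerialMean().ToString("R"));
    }
}
```

Calling Mean() from the code above on the same data array and printing it with the same format string makes the difference in the trailing digits visible.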
I searched Stack Overflow and found this: VB.NET running sum in nested loop inside Parallel.for Synclock loses information. My case, however, is different from the one described there. There, a thread-local variable temp is the cause of the inaccuracy, whereas I use a single sum that is updated (I hope) according to the textbook Interlocked.CompareExchange() pattern. The question is of course moot because of the poor performance (which surprises me, although I am aware of the overhead), yet I am curious whether there is something to be learnt from this case.
Your thoughts are appreciated.