I am trying to measure the speedup of a parallel section using one versus four threads. As my parallel section is relatively simple, I expect a nearly fourfold speedup. (This follows up on my earlier question: openMp: severe perfomance loss when calling shared references of dynamic arrays)
Since my parallel section runs only twice as fast on four cores as on one, I believe I have still not found the reason for the performance loss.
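The comparison above can be expressed as a speedup S = T1 / T4 and a parallel efficiency E = S / 4. A runtime that halves on four threads gives S = 2 and E = 0.5, i.e. half of the ideal linear speedup. A minimal sketch of that arithmetic (the function names `speedup` and `efficiency` are illustrative, not from the original code):

```cpp
#include <cassert>

// Speedup of a run on p threads relative to the single-thread run: S = T1 / Tp.
double speedup(double t1, double tp) {
    return t1 / tp;
}

// Parallel efficiency: fraction of the ideal linear speedup achieved, E = S / p.
double efficiency(double t1, double tp, int p) {
    return speedup(t1, tp) / p;
}
```

For example, `efficiency(2.0, 1.0, 4)` is 0.5, which is the situation described here.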
I want to parallelise my function iter as well as possible. The function uses entries of dynamic arrays and private quantities to change the entries of other dynamic arrays. Because every iteration only uses the array entries of its own loop index, different threads never access the same array entry. I have also considered false sharing, i.e. different threads accessing entries that lie in the same cache line. My guess is that this is a minor effect: my double arrays are 5*10^5 long, and with a reasonable chunk size for schedule(dynamic,chunk) I don't expect the very few entries in a given cache line to be accessed at the same time by different threads. In my simulation I have about 80 such arrays, so allocating them on the stack is not practical, and making private copies for every thread is out of the question too.
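The false-sharing argument can be made concrete by counting how many chunk boundaries fall in the middle of a cache line, since only those lines can be written by two different chunks. The sketch below assumes a 64-byte cache line (8 doubles, typical on x86) and an array that starts on a line boundary, which plain `new double[...]` does not guarantee; `midLineBoundaries` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Counts chunk boundaries that fall inside a cache line, i.e. cache lines of one
// array that are written by two different chunks (and possibly two threads).
// Assumption: the array starts on a cache-line boundary.
std::size_t midLineBoundaries(std::size_t n, std::size_t chunk,
                              std::size_t elemBytes = sizeof(double),
                              std::size_t lineBytes = 64) {
    std::size_t count = 0;
    for (std::size_t start = chunk; start < n; start += chunk) {
        if ((start * elemBytes) % lineBytes != 0) {
            ++count;  // this boundary splits a cache line between two chunks
        }
    }
    return count;
}
```

With `schedule(static)` and no chunk size, as in the code below, each thread receives at most one contiguous chunk, so four threads produce only three boundaries per array; choosing a chunk size that is a multiple of 8 doubles removes mid-line boundaries entirely under the alignment assumption.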
Does anybody have an idea how to improve this? I want to fully understand why this is so slow before starting with compiler optimisations.
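One way to check whether the loop is limited by memory bandwidth rather than by arithmetic is to divide the bytes it moves by the measured time and compare the result with the machine's memory bandwidth (which has to be looked up or measured; no value is assumed here). The loop does only a few multiplications per element, so if one or two threads already come close to that limit, more threads cannot help much. A rough sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>

// Bytes the loop in iter() moves per call: it reads A[i], B[i], C[i] and writes D[i],
// i.e. four doubles per iteration. Stores may additionally cause a read of the
// destination cache line (write-allocate), which this estimate ignores.
std::size_t bytesPerCall(std::size_t n) {
    return n * 4 * sizeof(double);
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a measured time in seconds.
double effectiveBandwidthGBs(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

With `length = 5000000` as set in the constructor below, `bytesPerCall` gives 160,000,000 bytes per call of `iter`.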
What also surprised me: calling iter(parallel) with parallel = false is slower than calling it with parallel = true and omp_set_num_threads(1).
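When the `if` clause of a `parallel` construct evaluates to false, OpenMP executes the region with a team of one thread, which is also what `num_threads(1)` requests. The sketch below (function names hypothetical) reports the team size in both cases; since both report one thread, the timing difference presumably comes from something other than the number of threads. It compiles with or without OpenMP enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Size of the team executing the current region (1 when compiled without OpenMP).
int currentTeamSize() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Team size observed inside a region whose if clause is false.
int teamSizeIfFalse() {
    int n = 0;
    #pragma omp parallel if(false)
    {
        n = currentTeamSize();
    }
    return n;
}

// Team size observed inside a region that explicitly requests one thread.
int teamSizeNumThreads1() {
    int n = 0;
    #pragma omp parallel num_threads(1)
    {
        n = currentTeamSize();
    }
    return n;
}
```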
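main.cpp:
```cpp
#include <cstdio>
#include "mathClass.h"

int main(){
    mathClass m;
    m.fillArrays();
    double timeCount = 0.0;
    for(int j = 0; j<1000; j++){
        timeCount += m.iter(true);  // iter() returns seconds
    }
    // seconds summed over 1000 calls equals the mean time per call in milliseconds
    printf("mean time difference = %fms\n", timeCount);
    return 0;
}
```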
mathClass.h:
```cpp
class mathClass{
private:
    double* A;
    double* B;
    double* C;
    int length;
public:
    double* D;
    mathClass();
    double iter(bool parallel);
    void fillArrays();
};
```
mathClass.cpp:
```cpp
#include <cstdlib>   // rand
#include <omp.h>     // omp_set_num_threads, omp_get_wtime
#include "mathClass.h"

mathClass::mathClass(){
    length = 5000000;
    A = new double[length];
    B = new double[length];
    C = new double[length];
    D = new double[length];
}

void mathClass::fillArrays(){
    int temp;
    for (int i = 0; i < length; i++){
        temp = rand() % 100;
        A[i] = double(temp);
        temp = rand() % 100;
        B[i] = double(temp);
        temp = rand() % 100;
        C[i] = double(temp);
    }
}

double mathClass::iter(bool parallel){
    double startTime;
    double endTime;
    omp_set_num_threads(4);
    startTime = omp_get_wtime();
    // if parallel == false, the region is executed by a single thread
    #pragma omp parallel if(parallel)
    {
        int alpha; // declared inside the region, so private to each thread
        #pragma omp for schedule(static)
        for (int i = 0; i < length; i++){
            alpha = 15*A[i];  // double result converted to int
            D[i] = C[i]*alpha + B[i]*alpha*alpha;
        }
    }
    endTime = omp_get_wtime();
    return endTime - startTime;  // elapsed wall-clock time in seconds
}
```
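To compare one and four threads on identical data within one program, the thread count can be passed in instead of being hard-coded with `omp_set_num_threads(4)`. The sketch below is a hypothetical free-function version of the loop in `iter` (names `runKernel` and `averageTime` are illustrative). It uses `std::vector`, `std::chrono` for timing, and a `double` alpha; for the integer-valued inputs produced by `fillArrays` this gives the same results as the `int` alpha above.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Same loop as mathClass::iter, with the thread count passed in explicitly.
// Returns the elapsed wall-clock time in seconds.
double runKernel(const std::vector<double>& A, const std::vector<double>& B,
                 const std::vector<double>& C, std::vector<double>& D, int nthreads) {
    const long n = static_cast<long>(D.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (long i = 0; i < n; i++) {
        const double alpha = 15.0 * A[i];  // declared in the loop body, hence private
        D[i] = C[i] * alpha + B[i] * alpha * alpha;
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Average time over `reps` calls after one untimed warm-up call, so that one-time
// effects such as cold caches are not included in the average.
double averageTime(const std::vector<double>& A, const std::vector<double>& B,
                   const std::vector<double>& C, std::vector<double>& D,
                   int nthreads, int reps) {
    runKernel(A, B, C, D, nthreads);  // warm-up, not timed
    double total = 0.0;
    for (int r = 0; r < reps; r++) {
        total += runKernel(A, B, C, D, nthreads);
    }
    return total / reps;
}
```

Calling `averageTime` with `nthreads = 1` and `nthreads = 4` on the same arrays and dividing the two results gives the measured speedup without changing any global OpenMP setting.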