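You can use the separate directives #pragma omp parallel and #pragma omp for. #pragma omp parallel creates the team of threads once, whereas #pragma omp for distributes the loop iterations among the threads that already exist. For the sequential parts of the outer loop you can use #pragma omp single, which lets exactly one thread execute the block while the others wait at the implicit barrier at its end.
Here is an example: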
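#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 3, m = 10;

    /* Create the team of threads once, outside the outer loop. */
    #pragma omp parallel
    {
        /* Every thread runs the outer loop; i is private to each thread. */
        for (int i = 0; i < n; i++) {
            /* Sequential part: executed by a single thread. */
            #pragma omp single
            {
                printf("Outer loop part 1, thread num = %d\n",
                       omp_get_thread_num());
            }
            /* Parallel part: iterations of j are shared among the threads. */
            #pragma omp for
            for (int j = 0; j < m; j++) {
                int thread_num = omp_get_thread_num();
                printf("j = %d, Thread num = %d\n", j, thread_num);
            }
            /* Sequential part: executed by a single thread. */
            #pragma omp single
            {
                printf("Outer loop part 2, thread num = %d\n",
                       omp_get_thread_num());
            }
        }
    }
    return 0;
}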
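If it helps, here is a minimal sketch of the same structure doing some actual work instead of printing. The function name run_steps and the computation (summing i*j over the inner loop) are made up purely for illustration and are not taken from your code. It assumes the inner loop can be expressed as a reduction. Without OpenMP enabled the pragmas are ignored and the function still gives the same result sequentially.

```c
/* Hypothetical helper: for each outer step i, stores the sum of i*j
   for j in [0, m) into totals[i]. totals must have room for n entries. */
void run_steps(int n, int m, long *totals)
{
    long sum = 0;  /* declared outside the region, so shared by all threads */

    #pragma omp parallel
    {
        for (int i = 0; i < n; i++) {
            /* Sequential part: one thread resets the accumulator. The
               implicit barrier at the end of single ensures the reset is
               done before any thread enters the worksharing loop. */
            #pragma omp single
            sum = 0;

            /* Parallel part: iterations are split among the existing
               threads and the partial sums are combined at the end. */
            #pragma omp for reduction(+:sum)
            for (int j = 0; j < m; j++) {
                sum += (long)i * j;
            }

            /* Sequential part: one thread records the result of this step. */
            #pragma omp single
            totals[i] = sum;
        }
    }
}
```

Calling run_steps(3, 10, totals) fills totals with 0, 45 and 90. The implicit barriers at the end of each single and for construct are what keep the steps in order, so do not add nowait here unless you are sure the steps are independent.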
However, I am not sure whether this will help in your case. To diagnose OpenMP performance issues, it is better to use a profiler such as Scalasca or VTune.