The flush directive tells the OpenMP compiler to generate code that makes the thread's private view of the shared memory consistent again. OpenMP usually handles this pretty well and does the right thing for typical programs. Hence, there's normally no need for an explicit flush.
However, there are cases where the OpenMP compiler needs some help. One of these cases is when you try to implement your own spin lock. There you need a combination of flushes to make things work, since otherwise updates to the spin variables may never become visible to the thread that is waiting on them. Getting the sequence of flushes correct is tough and very error-prone.
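To give an idea of what such a flush sequence looks like, here is a minimal sketch of a one-shot handoff between two threads through a spin variable. The function name handoff and the variables data, flag and received are made up for illustration, and I'm assuming OpenMP 3.1 or later for the atomic read/write forms. It also assumes the runtime either provides two threads or runs the sections in order on one thread. Treat it as a sketch, not as a lock you should ship.

```c
/* Hypothetical helper: passes one value from a producer section to a
 * consumer section via a spin variable. Compile with OpenMP enabled
 * (e.g. -fopenmp for GCC/Clang). */
int handoff(void)
{
    int data = 0;       /* payload written by the producer (shared) */
    int flag = 0;       /* spin variable the consumer waits on (shared) */
    int received = -1;  /* what the consumer finally read */

    #pragma omp parallel sections num_threads(2) shared(data, flag, received)
    {
        #pragma omp section
        {
            data = 42;
            /* Make the payload visible before the flag can be seen as set. */
            #pragma omp flush
            #pragma omp atomic write
            flag = 1;
            /* Push the flag update out to the shared memory. */
            #pragma omp flush
        }

        #pragma omp section
        {
            int seen = 0;
            while (!seen) {
                /* Refresh this thread's view before each look at the flag. */
                #pragma omp flush
                #pragma omp atomic read
                seen = flag;
            }
            /* Refresh the view again so the read of data is up to date. */
            #pragma omp flush
            received = data;
        }
    }
    return received;
}
```

Calling handoff() should give back the value the producer stored. Note that it takes four flushes plus atomic accesses just to pass a single int safely, which is a good illustration of why this is so easy to get wrong. I used flush without a list on purpose.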
The general recommendation is that flushes should not be used. If you use them at all, avoid flush with a list (flush(var,...)) by all means. Some folks are actually talking about deprecating it in a future version of OpenMP.
Performance-wise, the impact of flush should be more negative than positive. Since it causes the compiler to generate memory fences and additional load/store operations, I would expect it to slow things down.
EDIT: For your second question, the answer is no. OpenMP makes sure that each thread has a consistent view of the shared memory when it needs to. If threads do not synchronize, they do not need to update their view of the shared memory, because they do not see any "interesting" change there. That means that no read a thread makes touches data that has been changed by some other thread. If that were the case, you'd have a race condition and a potential bug in your program. To avoid the race, you need to synchronize (which then implies a flush to make each participating thread's view consistent again). A similar argument applies to barriers. You use barriers to start a new epoch in the computation of a parallel region. Since you're keeping the threads in lock-step, you will very likely also have some shared state between the threads that has been computed in the previous epoch. The barrier therefore implies a flush, so that every thread sees that state when the new epoch starts.
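The barrier case can be sketched like this. In the first epoch the threads fill a shared array, in the second epoch each iteration reads an element that was most likely written by a different thread. There is no explicit flush anywhere, since the barrier between the two loops already implies one. The names epoch_sum, a and N are invented for the example.

```c
enum { N = 8 };

/* Hypothetical helper: two epochs of a parallel region separated by a
 * barrier. Compile with OpenMP enabled (e.g. -fopenmp for GCC/Clang). */
long epoch_sum(void)
{
    int a[N];       /* shared state produced in the first epoch */
    long sum = 0;

    #pragma omp parallel shared(a, sum)
    {
        /* Epoch 1: each thread writes its share of the array.
         * nowait drops the loop's own implicit barrier so that the
         * explicit barrier below is the only synchronization point. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = i * i;

        /* The barrier implies a flush: after it, every thread sees
         * all elements written in epoch 1. */
        #pragma omp barrier

        /* Epoch 2: read the neighbouring element, which may have been
         * written by another thread. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[(i + 1) % N];
    }
    return sum;
}
```

With N = 8 this adds up the squares 0 through 49, so epoch_sum() returns 140 regardless of the number of threads. If you removed the barrier (and kept the nowait), the second loop could read elements that have not been written yet, which is exactly the kind of race described above.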
BTW, OpenMP may keep private copies of data for a thread, but it does not have to. So, it is likely that the OpenMP compiler will keep variables in registers for a while, which causes them to be out of sync with the shared memory. However, updates to array elements are typically reflected pretty soon in the shared memory, since the amount of private storage for a thread is usually small (register sets, caches, scratch memory, etc.). OpenMP only gives you some weak guarantees about what you can expect. An actual OpenMP implementation (or the hardware) may be as strict as it wishes to be (e.g., write back any change immediately and do flushes all the time).