For example, assume we have two threads, thread A that only reads from a shared variable and thread B that only writes to a shared variable. Proper synchronisation by e.g. pthreads mutexes is enforced.
IIUC, without the volatile keyword, the compiler may look at the code of thread A and say: “The variable doesn’t appear to be modified here, but we have lots of reads; let’s read it only once, cache the value and optimise away all subsequent reads.” Also it may look at the code of thread B and say: “We have lots of writes to this variable here, but no reads; so, the written values are not needed and thus let’s optimise away all writes.“
Like most thread synchronization primitives, pthreads mutex operations have explicitly defined memory visibility semantics.
Either the platform supports pthreads or it doesn't. If it supports pthreads, it supports pthreads mutexes. Either those optimizations are safe or they aren't. If they're safe, there's no problem. If they're unsafe, then any platform that makes them doesn't support pthreads mutexes.
For example, you say "The variable doesn’t appear to be modified here", but it does -- another thread could modify it there. Unless the compiler can prove its optimization can't break any conforming program, it can't make it. And a conforming program can modify the variable in another thread. Either the compiler supports POSIX threads or it doesn't.
As it happens, most of this happens automatically on most platforms. The compiler is just prevented from having any idea what the mutex operations do internally. Anything another thread could do, the mutex operations themselves could do. So the compiler has to "synchronize" memory before entering and exiting those functions anyway. It can't, for example, keep a value in a register across the call to pthread_mutex_lock
because for all it knows, pthread_mutex_lock
accesses that value in memory. Alternatively, if the compiler has special knowledge about the mutex functions, that would include knowing about the invalidity of caching values accessible to other threads across those calls.
A platform that requires volatile
would be pretty much unusable. You'd need versions of every function or class for the specific cases where an object might be made visible to, or was made visible from, another thread. In many cases, you'd pretty much just have to make everything volatile
and not caching values in registers is a performance non-starter.
As you've probably heard many times, volatile
's semantics as specified in the C language just do not mix usefully with threads. Not only is it not sufficient, it disables many perfectly safe and nearly essential optimizations.