atomic_thread_fence(memory_order_seq_cst)
always generates a full barrier:
- x86_64: MFENCE
- PowerPC: hwsync
- Itanium: mf
- ARMv7 / ARMv8: dmb ish
- MIPS64: sync
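As a minimal sketch (the function name is mine, only for inspecting the generated code), a function containing nothing but this fence should compile down to exactly the full-barrier instruction listed above for the target architecture:

#include <atomic>

void full_barrier() // illustrative name
{
    // Expected to emit MFENCE on x86_64, hwsync on PowerPC, mf on Itanium,
    // dmb ish on ARMv7/ARMv8, sync on MIPS64 (the exact output depends on the compiler).
    std::atomic_thread_fence(std::memory_order_seq_cst);
}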
The main point: an observing thread can still observe the operations in a different order, and it does not matter which fences are used in the observed thread, if the observing thread itself uses weaker ordering.
Is an optimizing compiler allowed to reorder instruction (3) to before (1)?
No, it isn't allowed. But in terms of what is globally visible to a multithreaded program, this ordering holds only if:
- the other threads also use memory_order_seq_cst for the atomic read/write operations on these variables
- or the other threads also use atomic_thread_fence(memory_order_seq_cst); between their load() and store() - but this approach does not guarantee sequential consistency in general, because sequential consistency is a stronger guarantee (a sketch follows the standard quote below)
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3 / 8
[ Note: memory_order_seq_cst ensures sequential consistency only for a
program that is free of data races and uses exclusively
memory_order_seq_cst operations. Any use of weaker ordering will
invalidate this guarantee unless extreme care is used. In particular,
memory_order_seq_cst fences ensure a total order only for the fences
themselves. Fences cannot, in general, be used to restore sequential
consistency for atomic operations with weaker ordering specifications.
— end note ]
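For example, here is a minimal sketch of the classic store-buffering test (the variable and function names are mine): both threads use relaxed operations separated by a seq_cst fence, which corresponds to the second condition above. Because the two fences are totally ordered, the outcome r1 == 0 && r2 == 0 is forbidden; drop one of the fences, or weaken one of the orderings in a more complex pattern, and, as the note above warns, the guarantee is gone.

#include <atomic>
#include <thread>
#include <cassert>

std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // fence F1
    r1 = y.load(std::memory_order_relaxed);
}

void thread2() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // fence F2
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    // F1 and F2 are in a single total order; whichever comes second forces the
    // load after it to see the store from the other thread, so r1 == 0 && r2 == 0
    // is impossible.
    assert(r1 == 1 || r2 == 1);
}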
How this can be mapped to assembly:
Case-1:
atomic<int> x, y;
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
x.load(memory_order_relaxed); //(3)
This code isn't always equivalent in meaning to Case-2, but it produces the same instructions between the STORE and the LOAD as when both the LOAD and the STORE use memory_order_seq_cst - i.e. Sequential Consistency, which prevents StoreLoad-reordering. Case-2:
atomic<int> x, y;
y.store(1, memory_order_seq_cst); //(1)
x.load(memory_order_seq_cst); //(3)
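As a rough way to check this claim (function names are mine; the exact instructions depend on the compiler and target), both cases can be wrapped in separate functions and the generated assembly compared:

#include <atomic>

std::atomic<int> x, y;

int case1() { // relaxed operations separated by a seq_cst fence
    y.store(1, std::memory_order_relaxed);               //(1)
    std::atomic_thread_fence(std::memory_order_seq_cst); //(2)
    return x.load(std::memory_order_relaxed);            //(3)
}

int case2() { // both operations seq_cst, no explicit fence
    y.store(1, std::memory_order_seq_cst);               //(1)
    return x.load(std::memory_order_seq_cst);            //(3)
}

On x86_64, per the mapping below, both functions are expected to contain a full barrier between the store to y and the load from x: an explicit MFENCE, or a LOCK-prefixed instruction such as XCHG used for the seq_cst store itself.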
With some notes:
- it may add duplicate instructions (as in the MIPS64 example below), or it may use similar operations in the form of other instructions:
Guide for ARMv8-A, Table 13.1. Barrier parameters:
ISH, Any - Any: This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
Reordering of two instructions can be prevented by placing additional instructions between them. And as we can see, a STORE(seq_cst) followed by a LOAD(seq_cst) generates the same instructions between them as FENCE(seq_cst) (atomic_thread_fence(memory_order_seq_cst)).
Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store() and atomic_thread_fence():
Note: atomic_thread_fence(memory_order_seq_cst); always generates a full barrier:
- x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
- x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
- x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
- x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
- PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
- Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
- ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
- ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
- ARMv8(AArch32): STORE: STL | LOAD: LDA | fence: DMB ISH (full barrier)
- ARMv8(AArch64): STORE: STLR | LOAD: LDAR | fence: DMB ISH (full barrier)
- MIPS64: STORE: sync; sw; sync | LOAD: sync; lw; sync | fence: sync
All mappings of C/C++11 semantics to different CPU architectures for load(), store() and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
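To make the claim concrete (a sketch with my own function names; the exact output depends on the compiler), here is how the two cases are expected to expand on ARMv7 and MIPS64, using the mapping above plus the fact that relaxed stores/loads need no barriers (plain str/ldr on ARMv7, plain sw/lw on MIPS64):

#include <atomic>

std::atomic<int> x, y;

// ARMv7:  str; dmb ish; ldr
// MIPS64: sw; sync; lw
int case1() {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return x.load(std::memory_order_relaxed);
}

// ARMv7:  dmb ish; str; dmb ish; ldr; dmb ish
//         (between str and ldr: the same dmb ish as in case1)
// MIPS64: sync; sw; sync; sync; lw; sync
//         (the adjacent "sync; sync" is the duplicated instruction noted above)
int case2() {
    y.store(1, std::memory_order_seq_cst);
    return x.load(std::memory_order_seq_cst);
}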
Because Sequential Consistency prevents StoreLoad-reordering, and because Sequential Consistency (store(memory_order_seq_cst) followed by load(memory_order_seq_cst)) generates the same instructions between them as atomic_thread_fence(memory_order_seq_cst), atomic_thread_fence(memory_order_seq_cst) also prevents StoreLoad-reordering.