Returning null is not the issue. The issue is that the new instance may be in a partially constructed state as perceived by another thread. Consider this declaration of Foo.
class Foo
{
    public int variable1;
    public int variable2;

    public Foo()
    {
        variable1 = 1;
        variable2 = 2;
    }
}
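For context, here is roughly what the lazy-initialization code under discussion looks like as written (a sketch of the usual double-checked locking pattern; the instance and padlock names are borrowed from the snippets below, and the containing Singleton class is my own framing):

class Singleton
{
    private static Foo instance;
    private static readonly object padlock = new object();

    public static Foo Instance
    {
        get
        {
            if (instance == null)
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        instance = new Foo(); // looks safe, but see below
                    }
                }
            }
            return instance;
        }
    }
}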
Here is how the code could get optimized by the C# compiler, JIT compiler, or hardware.¹
if (instance == null)
{
    lock (padlock)
    {
        if (instance == null)
        {
            instance = alloc Foo;
            instance.variable1 = 1; // inlined ctor
            instance.variable2 = 2; // inlined ctor
        }
    }
}
return instance;
First, notice that the constructor is inlined (because it was simple). Now, hopefully it is easy to see that instance gets assigned the reference before its constituent fields get initialized inside the constructor. This is a valid strategy because reads and writes are free to float up and down as long as they do not pass the boundaries of the lock or alter the logical flow, which they do not. So another thread could see instance != null and attempt to use it before it is fully initialized.
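To make the failure concrete, here is a hypothetical reader racing with that initialization (my own illustration, using the Singleton sketch above; not code from the original question):

static void ReaderThread()
{
    // Runs concurrently with another thread's first call to Singleton.Instance.
    Foo foo = Singleton.Instance; // may observe the published but half-built object
    int v = foo.variable1;        // without the fences below, this can legally read 0 instead of 1
}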
volatile fixes this issue because it treats reads as an acquire fence and writes as a release fence.
- acquire-fence: A memory barrier in which other reads & writes are not allowed to move before the fence.
- release-fence: A memory barrier in which other reads & writes are not allowed to move after the fence.
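As a side note, the newer System.Threading.Volatile class spells these two fences out explicitly; a minimal sketch, assuming a static instance field like the one in the pattern above:

using System.Threading;

static class FenceDemo
{
    private static Foo instance;

    public static Foo AcquireRead()
    {
        // Acquire semantics: later reads & writes cannot move before this read.
        return Volatile.Read(ref instance);
    }

    public static void ReleaseWrite(Foo value)
    {
        // Release semantics: earlier reads & writes cannot move after this write.
        Volatile.Write(ref instance, value);
    }
}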
So if we mark instance as volatile then the release-fence will prevent the above optimization. Here is how the code would look with the barrier annotations. I used an ↑ arrow to indicate a release-fence and a ↓ arrow to indicate an acquire-fence. Notice that nothing is allowed to float down past an ↑ arrow or up past a ↓ arrow. Think of the arrow head as pushing everything away.
var local = instance;
↓ // volatile read barrier
if (local == null)
{
    var lockread = padlock;
    ↑ // lock full barrier
    lock (lockread)
    ↓ // lock full barrier
    {
        local = instance;
        ↓ // volatile read barrier
        if (local == null)
        {
            var temp = alloc Foo;
            temp.variable1 = 1; // inlined ctor
            temp.variable2 = 2; // inlined ctor
            ↑ // volatile write barrier
            instance = temp;
        }
        ↑ // lock full barrier
    }
    ↓ // lock full barrier
}
local = instance;
↓ // volatile read barrier
return local;
The writes to the constituent variables of Foo could still be reordered among themselves, but notice that the memory barrier now prevents them from occurring after the assignment to instance. Using the arrows as a guide, imagine the various optimization strategies that are allowed and disallowed. Remember that no reads or writes are allowed to float down past an ↑ arrow or up past a ↓ arrow.
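Putting it all together, the fixed pattern is simply the earlier sketch with the field marked volatile (again, the Singleton framing is mine):

class Singleton
{
    private static volatile Foo instance;
    private static readonly object padlock = new object();

    public static Foo Instance
    {
        get
        {
            if (instance == null) // volatile read (acquire)
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        // volatile write (release): the constructor's writes
                        // cannot float down past this publish
                        instance = new Foo();
                    }
                }
            }
            return instance;
        }
    }
}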
Thread.VolatileWrite would have solved this problem as well and could be used in languages without a volatile keyword, like VB.NET. If you take a look at how VolatileWrite is implemented you will see this.
public static void VolatileWrite(ref object address, object value)
{
    Thread.MemoryBarrier();
    address = value;
}
Now this may seem counterintuitive at first. After all, the memory barrier is placed before the assignment. What about getting the assignment committed to main memory, you ask? Would it not be more correct to place the barrier after the assignment? If that is what your intuition is telling you then it is wrong. You see, memory barriers are not strictly about getting a "fresh read" or a "committed write"; they are all about instruction ordering. This is by far the biggest source of confusion I see.
It might also be important to mention that Thread.MemoryBarrier actually generates a full-fence barrier. So if I were to use my arrow notation from above then it would look like this.
public static void VolatileWrite(ref object address, object value)
{
    ↑ // full barrier
    ↓ // full barrier
    address = value;
}
So technically calling VolatileWrite does more than what a write to a volatile field would do. Remember that volatile is not allowed in VB.NET for example, but VolatileWrite is a part of the BCL, so it can be used from other languages.
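For example, the same pattern could be written against Thread.VolatileWrite and its Thread.VolatileRead counterpart; a sketch, with the caveat that the field must be typed object to match the ref object overloads shown above:

using System.Threading;

class Singleton
{
    private static object instance; // object-typed to fit VolatileRead/VolatileWrite(ref object, ...)
    private static readonly object padlock = new object();

    public static Foo Instance
    {
        get
        {
            object local = Thread.VolatileRead(ref instance);
            if (local == null)
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        Thread.VolatileWrite(ref instance, new Foo()); // full barrier, then publish
                    }
                    local = instance;
                }
            }
            return (Foo)local;
        }
    }
}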
¹This optimization is mostly theoretical. The ECMA specification does technically allow for it, but the Microsoft CLI implementation of the ECMA specification treats all writes as if they had release-fence semantics already. It is possible that another implementation of the CLI could still perform this optimization though.