GCC's behavior may be conforming, and even if it isn't, you should not rely on volatile to do what you want in cases like these. The C committee designed volatile for memory-mapped hardware registers and for variables modified during abnormal control flow (e.g. signal handlers and setjmp). Those are the only things it is reliable for. It is not safe to use as a general "don't optimize this out" annotation.
In particular, the standard is unclear on a key point. (I've converted your code to C; there shouldn't be any divergence between C and C++ here. I've also manually done the inlining that would happen before the questionable optimization, to show what the compiler "sees" at that point.)
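    #include <stddef.h>  /* size_t */

    extern void use_arr(void *, size_t);

    void foo(void)
    {
        char arr[8];
        use_arr(arr, sizeof arr);
        /* Inlined zeroing loop: volatile-qualified lvalue, non-volatile object. */
        for (volatile char *p = (volatile char *)arr;
             p < (volatile char *)(arr + 8);
             p++)
            *p = 0;
    }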
The memory-clearing loop accesses arr through a volatile-qualified lvalue, but arr itself is not declared volatile. It is, therefore, at least arguably allowed for the C compiler to infer that the stores made by the loop are "dead", and delete the loop altogether. There's text in the C Rationale that implies that the committee meant to require those stores to be preserved, but the standard itself does not actually make that requirement, as I read it.
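To make the distinction concrete, here is a minimal sketch of the two situations. The function names are hypothetical and exist only for illustration. In the first, the object itself is declared volatile, so every store is unambiguously an access to a volatile object. In the second, which mirrors the loop above, only the lvalue used for the access is volatile-qualified, and that is the case where the standard's wording is arguably open to the "dead store" reading.

```c
#include <stddef.h>

/* Case 1: the object itself is volatile-qualified.
 * Each store below is an access to a volatile object. */
static size_t clear_volatile_object(void)
{
    volatile char buf[8];
    size_t n = 0;
    for (size_t i = 0; i < sizeof buf; i++) {
        buf[i] = 0;
        n++;
    }
    return n;  /* number of stores performed in the abstract machine */
}

/* Case 2: an ordinary object accessed through a volatile-qualified
 * lvalue. This is the situation in foo() above: if the caller never
 * reads buf again, a compiler might arguably treat the stores as dead. */
static size_t clear_through_volatile_lvalue(char *buf, size_t size)
{
    volatile char *p = (volatile char *)buf;
    size_t n = 0;
    for (; n < size; n++)
        p[n] = 0;
    return n;
}
```

Note that case 1 does not solve the practical problem: an array declared volatile cannot be handed to ordinary functions such as use_arr without casting the qualifier away, and accessing a volatile object through a non-volatile lvalue is undefined behavior.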
For more discussion of what the standard does or does not require, see Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?, Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?, and GCC bug 71793.
For more on what the committee thought volatile was for, search the C99 Rationale for the word "volatile". John Regehr's paper "Volatiles are Miscompiled" illustrates in detail how programmer expectations for volatile may not be satisfied by production compilers. The LLVM team's series of essays "What Every C Programmer Should Know About Undefined Behavior" does not touch specifically on volatile, but will help you understand how and why modern C compilers are not "portable assemblers".
To the practical question of how to implement a function that does what you wanted volatileZeroMemory to do: Regardless of what the standard requires or was meant to require, it would be wisest to assume that you can't use volatile for this. There is an alternative that can be relied on to work, because it would break far too much other stuff if it didn't work:
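    #include <stddef.h>  /* size_t */
    #include <string.h>  /* memset */

    extern void memory_optimization_fence(void *ptr, size_t size);

    inline void
    explicit_bzero(void *ptr, size_t size)
    {
        memset(ptr, 0, size);
        /* Opaque external call: the compiler must assume it reads the buffer. */
        memory_optimization_fence(ptr, size);
    }

    /* in a separate source file */
    void memory_optimization_fence(void *unused1, size_t unused2) {}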
However, you must make absolutely sure that memory_optimization_fence is not inlined under any circumstances. It must be in its own source file and it must not be subjected to link-time optimization.
There are other options, relying on compiler extensions, that may be usable under some circumstances and can generate tighter code (one of them appeared in a previous edition of this answer), but none are universal.
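As an illustration of what such an extension-based option can look like, here is a minimal sketch using the GCC/Clang extended asm syntax. It is one widely used idiom and not necessarily the one from the earlier edition of this answer. The name zero_with_asm_barrier is hypothetical, and the code assumes a compiler that accepts GNU-style extended asm.

```c
#include <stddef.h>
#include <string.h>

#if !defined(__GNUC__)
#error "This sketch assumes GCC or Clang (GNU extended asm)."
#endif

/* Hypothetical name, for illustration only. */
static inline void zero_with_asm_barrier(void *ptr, size_t size)
{
    memset(ptr, 0, size);
    /* Empty asm statement: it emits no instructions, but it takes ptr
     * as an input and declares a "memory" clobber, so the compiler has
     * to assume it may read the bytes just written. That keeps the
     * memset from being treated as a dead store. */
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
}
```

Because the barrier is an empty asm statement rather than a function call, this version can be inlined safely and needs no separate source file, but it will not compile at all on compilers that lack this extension.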
(I recommend calling the function explicit_bzero, because it is available under that name in more than one C library. There are at least four other contenders for the name, but each has been adopted only by a single C library.)
You should also know that, even if you can get this to work, it may not be enough. In particular, consider
    struct aes_expanded_key { __uint128_t rndk[16]; };

    /* expand_key and encrypt_with_ek are assumed to take the key by pointer. */
    void encrypt(const char *key, const char *iv,
                 const char *in, char *out, size_t size)
    {
        struct aes_expanded_key ek;
        expand_key(key, &ek);
        encrypt_with_ek(&ek, iv, in, out, size);
        explicit_bzero(&ek, sizeof ek);
    }
Assuming hardware with AES acceleration instructions, if expand_key and encrypt_with_ek are inline, the compiler may be able to keep ek entirely in the vector register file, until the call to explicit_bzero, which forces it to copy the sensitive data onto the stack just to erase it, and, worse, doesn't do a darn thing about the keys that are still sitting in the vector registers!