The question was:
Is this program valid in all cases?
The answer is "no, it is not".
The only interesting part of the program is what happens within the block guarded by the if statement. It is somewhat difficult to guarantee that the controlling expression is true, so I've modified the program slightly by moving the variables to file scope. The same question remains: is this program always valid?
#include <stdio.h>
#include <string.h>
static int a[1] = { 2 };
static int b = 1;
static int *pa1 = &a[0] + 1;
static int *pb = &b;
int main(void) {
    if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
        int *p;
        printf ("pa1 == pb\n"); // interesting part
        memcpy (&p, &pa1, sizeof p); // make a copy of the representation
        memcpy (&pa1, &p, sizeof p); // pa1 now holds a copy of its own original bytes,
                                     // which happen to be the bytes of pb
        *pa1 = 2; // does pa1 legally point to b?
    }
}
Now the controlling expression is true with my compiler (of course, since these objects have static storage duration, a compiler cannot really prove that nothing else modifies them in the interim...)
The pointer pa1 points to just past the end of the array a. It is a valid pointer, but it must not be dereferenced, i.e. *pa1 has undefined behaviour given that value. The argument being made is that copying this value to p and back again would make the pointer safe to dereference.
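As a side illustration of what a one-past-the-end pointer is legitimately good for, here is a minimal sketch. The function name sum_ints is hypothetical and not part of the original program. The pointer end is computed and compared, which is allowed, but it is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical helper: sum the n elements starting at first.
   `end` points one past the last element. It may be computed and
   compared against, but it is never dereferenced. */
int sum_ints(const int *first, size_t n)
{
    const int *end = first + n;   /* valid one-past-the-end pointer */
    int total = 0;

    for (const int *p = first; p != end; ++p)
        total += *p;              /* p is strictly before end here */

    return total;
}
```

Writing through end, as the program in the question does with *pa1 = 2, is the step that has no defined behaviour.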
The answer is no, this is still not valid, but it is not spelt out very explicitly in the standard itself. The committee response to C standard defect report DR 260 says this:
If two objects have identical bit-pattern representations and their types are the same they may still compare as unequal (for example if one object has an indeterminate value) and if one is an indeterminate value attempting to read such an object invokes undefined behavior. Implementations are permitted to track the origins of a bit-pattern and treat those representing an indeterminate value as distinct from those representing a determined value. They may also treat pointers based on different origins as distinct even though they are bitwise identical.
I.e. you cannot even conclude that if pa1 and pb are pointers of the same type and memcmp (&pa1, &pb, sizeof pa1) == 0 is true, then pa1 == pb necessarily holds, let alone that copying the bit pattern of the undereferenceable pointer pa1 to another object and back again would make pa1 valid.
The response continues:
Note that using assignment or bitwise copying via memcpy or memmove of a determinate value makes the destination acquire the same determinate value.
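In other words, it confirms that memcpy (&p, &pa1, sizeof p); causes p to acquire the same value as pa1, a value that p did not have before. That value is still a pointer just past the end of a, so nothing is gained by the copy.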
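A minimal sketch of what the committee response does guarantee, assuming a hypothetical helper named copy_pointer_bytes that is not in the original program: a bytewise copy of a determinate pointer value yields the same value, no more and no less.

```c
#include <string.h>

/* Hypothetical helper: copy the object representation of a pointer
   into a fresh pointer object and return the result. */
int *copy_pointer_bytes(int *src)
{
    int *dst;

    /* dst acquires the same determinate value as src. If src points to an
       object, so does dst. If src is a one-past-the-end pointer, dst is a
       one-past-the-end pointer too and is just as undereferenceable. */
    memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

Called with a pointer to a live int, the result compares equal to the argument and can be dereferenced. Called with &a[0] + 1, the result is still only a one-past-the-end pointer.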
This is not just a theoretical problem: compilers are known to track pointer provenance. For example, the GCC manual states that
When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.
i.e. were the program written as:
int a[1] = { 0 }, *pa1 = &a[0] + 1, b = 1, *pb = &b;
if (memcmp (&pa1, &pb, sizeof pa1) == 0) {
    uintptr_t tmp = (uintptr_t)&a[0]; // integer value of the pointer to a[0] (uintptr_t is from <stdint.h>)
    tmp += sizeof (a[0]); // integer value of the address of a[1]
    pa1 = (int *)tmp;
    *pa1 = 2; // pa1 would still have the bit pattern of pb and
              // hold a valid pointer just past the end of array a,
              // but would not legally point to b
}
the GCC manual points out that this is explicitly not legal.
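For contrast, here is a hedged sketch of the uses that stay within the rule quoted above. The helper names roundtrip and next_element are hypothetical, and the code assumes the implementation provides the optional type uintptr_t. The round trip performs no integer arithmetic, so the result refers to the same object as the original pointer, and stepping to the next element is done with pointer arithmetic inside one array.

```c
#include <stdint.h>

/* Hypothetical helper: convert a pointer to an integer and back without
   doing any arithmetic on the integer. The result compares equal to the
   original pointer and refers to the same object. */
int *roundtrip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* Hypothetical helper: advance by one element using pointer arithmetic.
   The caller must ensure that p points into an array, so that the result
   is another element or the one-past-the-end position of that array. */
int *next_element(int *p)
{
    return p + 1;
}
```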