z = x-y;
does not guarantee z+y == x
, and there is not always a solution for the problem of finding a z
such z+y == x
. A proof follows.
We assume IEEE-754 binary floating-point arithmetic with rounding to nearest, ties to even. The basic 64-bit format is used, but the result holds for other formats. Note that the 64-bit format uses 53-bit significands, meaning that only numbers with 53 or fewer significant binary digits can be represented.
Consider a target x
equal to 1+2?52. Let y
be 2?53. Then, after z = x-y;
, z+y == x
evaluates to false. The arithmetic details are shown below, but:
z = x-y;
sets z
to 1, and then z+y
produces 1, which is less than x
.
- If we increase
z
to the next representable number, 1+2?52, then z+y
produces 1+2?51, which is greater than x
.
- So there is no value of
z
that makes z+y == x
true.
Details:
The mathematical result of x
?y
is 1+2?53. As this has 54 significant bits (from 20 to 2?53), it is not representable, and the computed result of x-y
must be rounded. The two nearest numbers are 1 and 1+2?52. The ties-to-even rule produces the former number, 1, as the low bit of its significand is 0, while the low bit for 1+2?52 is 1.
Thus z = x-y;
sets z
to 1.
Then the mathematical result of z
+y
is 1+2?53. As above, this is rounded to 1, so the computed result of z+y
is 1. So z+y == x
compares 1 to 1+2?52 and produces false.
Furthermore, no value of z
could make the comparison true. If we increment z
by the smallest available step, from 1 to 1+2?52, the mathematical sum of z
+y
is then 1+2?52+2?53. This is midway between the two representable numbers 1+2?52 and 1+2?51. The former has a low bit of 1, and the latter has a low bit of 0, so the computed result of this z+y
is 1+2?51, which is of course not equal to 1+2?52.
Floating-point addition is weakly monotonic, so there are no values of z
that would produce 1+2?52 for z+y
.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…