If newDisplayWidth is less than 1125899906842624 and the other integers are positive and do not exceed 53 bits, then newSelectionWidth
equals newDisplayWidth
. A proof follows.
Notation:
- I will use the term
double
to name the floating-point type being used, IEEE-754 64-bit binary.
- Text in
code
style represents computed values, while plain text represents mathematical values. Thus 1/3 is exactly one-third, while 1./3.
is the result of dividing 1 by 3 in floating-point arithmetic.
I assume:
- The widths are positive integers not wider than the
double
significand (53 bits).
- The divisions
oldDisplayImageWidth / realImageWidth
and newDisplayImageWidth / realImageWidth
are performed in double
arithmetic with the operands converted to double
.
The limits on the integers assures that conversion to double
is exact and that overflow and underflow are not encountered during the operations used in this problem.
Consider oldScale
, which is a double
set to oldDisplayImageWidth / realImageWidth
. The maximum error in a single floating-point operation in round-to-nearest mode is half an ULP (because every mathematical number is no farther than half an ULP from a representable number). Thus, oldScale
equals oldDisplayImageWidth / realImageWidth ? (1+e0), where e0 represents the relative error and is at most half a double
epsilon. (The double
epsilon is 2-52, so |e0| ≤ 2-53.)
Similarly, newScale
is newDisplayImageWidth / realImageWidth ? (1+e1), where e1 is some error that is at most 2-53.
Then oldSelectionWidth / oldScale
is oldSelectionWidth / oldScale
? (1+e2), again for some e2 ≤ 2-53, and oldSelectionWidth / oldScale * newScale
is oldSelectionWidth / oldScale
? (1+e2) ? newScale
? (1+oldSelectionWidth / oldScale
? (1+e3). Note that this is the argument passed to round
.
Now substitute the expressions we have for oldScale
and newScale
. This yields oldSelectionWidth / (oldDisplayImageWidth / realImageWidth ? (1+e0)) ? (1+e2) ? (newDisplayImageWidth / realImageWidth ? (1+e1)) ? (1+e3). The realImageWidth terms cancel, and we can rearrange the others to produce oldSelectionWidth ? newDisplayImageWidth / oldDisplayImageWidth ? (1+e1) ? (1+e2) ? (1+e3) / (1+e0).
We are given that oldSelectionWidth equals oldDisplayImageWidth, so those cancel, and the argument to round
is exactly: newDisplayImageWidth ? (1+e1) ? (1+e2) ? (1+e3) / (1+e0).
Consider the combined error terms minus one (this is the relative error in the final value): (1+e1) ? (1+e2) ? (1+e3) / (1+e0) – 1. This expression has greatest magnitude when e0 is –2-53 and the others are +2-53. Then it is slightly greater than 2 ULP (at most 324518553658426753804753784799233 / 730750818665451377972204001751459814038961127424). If newDisplayImageWidth is less than 1125899906842624, then newDisplayImageWidth times this relative error is less than ?. Therefore, newDisplayImageWidth ? (1+e1) ? (1+e2) ? (1+e3) / (1+e0) would be within ? of newDisplayImageWidth.
Since newDisplayImageWidth is an integer, if the argument to round
is within ? of newDisplayWidth, then the result is newDisplayWidth.
Therefore, if newDisplayWidth is less than 1125899906842624, then newSelectionWidth
equals newDisplayWidth
.
(The above proves that 1125899906842624 is a sufficient limit, but it may not be necessary. A more involved analysis may be able to prove that certain combinations of errors are impossible, so the maximum combined error is less than used above. This would relax the limit, allowing larger values of newDisplayWidth.)