floating point - double-double implementation resilient to FPU rounding mode

Question

Welcome To Ask or Share your Answers For Others

floating point - double-double implementation resilient to FPU rounding mode

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

floating point - double-double implementation resilient to FPU rounding mode

Context: double-double arithmetic

“Double-double” is a representation of numbers as the sum of two double-precision numbers without overlap in the significands. This representation takes advantage of existing double-precision hardware implementations for “near quadruple-precision” computations.

One typical low-level C function in a double-double implementation may take two double-precision numbers a and b with |a| ≥ |b| and compute the double-double number (s, e) that represents their sum:

s = a + b;
e = b - (s - a);

(Adapted from this article.)

These implementations typically assume round-to-nearest-even mode.

In the above computation, (s, e) is a normalized double-double only because of this assumption. Without it, with a == 0x1.0p60, b == 1, in round-upward mode, s is computed as 0x1.0000000000001p60 and e a bit above -0x0.0000000000001p60. Their sum is equal to the mathematical sum of a and b but their significands overlap.

Take a == 0x1.0p120 and the mathematical sums of a and b on the one hand and s and e on the other hand do not even coincide any more.

Question

Is there a way to build a double-double-like library with the same properties that a typical double-double library has in round-to-nearest-even (that is, relatively fast and relatively accurate), but that works whatever the rounding mode happens to be?

Does such a library already exist?

More general context: correctly rounded elementary functions

Implementations of the double-double sort are used for intermediate computations in the implementation of libraries of correctly rounded elementary functions. As a result, libraries implemented this way tend to fail spectacularly when a function is called while the FPU is not in round-to-nearest-even mode. Changing the rounding mode inside the function is not very palatable, for performance reasons and because a signal arriving while the function is executing would leave the FPU in round-to-nearest-even mode. The simplest way I see to have fast, correctly rounded elementary functions that work in any rounding mode would be if one could somehow rely on a double-double-kind of arithmetic that worked in any rounding mode.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:36+0000

The article referred to by njuffa offers the function below, with very similar notations to those of my question, except that what is denoted fl (a+b) there is simply denoted a+b in my question:

Two?Sum?toward?zero2 (a, b)

if (|a| < |b|)
  swap (a , b)
s = fl (a + b)
d = fl (s ? a)
e = fl (b ? d)
if(|2 ? b|<|d|)
  s = a, e = b
return (s, e)

This is a very neat fix for this particular elementary computation when in round-toward-zero mode. It makes one hope that something would be possible for implementing a correctly rounded elementary function, at the very least by testing the rounding mode early and picking separate algorithms, or perhaps by writing very careful code that works for all rounding modes.

Categories

floating point - double-double implementation resilient to FPU rounding mode

floating point - double-double implementation resilient to FPU rounding mode

Context: double-double arithmetic

Question

More general context: correctly rounded elementary functions

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags