From your code it looks like you're randomly generating planet atmospheres, presumably for some kind of game or something. At any rate, the randomness of it is convincing me it doesn't need to be too accurate.
So i'd suggest you don't use floats, just use int
s and go up to 100. Then you'll get your exact summing. For any maths you want to use them in just cast.
Is this not an option?
If you insist on using floats, then read on...
The problem you have using floats is as follows:
A floating point (in this case a double) is represented like this:
which corresponds to a double
of value:
So,
your number is (1+M) * 2**(E)
(where E
= e-offset
)
1+M
is always in the range 1-2.
So, we have equally spaced numbers inbetween each pair of power of two (positive and negative), and the spacing between the numbers doubles with each increase in the exponent, E
.
Think about this, it means that there is a constant spacing of representable numbers between each of these numbers [(1,2),(2,4),(4,8), etc]
. This also applies to the negative powers of two, so:
0.5 - 1
0.25 - 0.5
0.125 - 0.25
0.0625 - 0.125
etc.
And in each range, there are the same quantity of numbers. This means that if you take a number in the range (0.25,0.5)
and add it to a number in the range (0.5,1)
, then you have a 50% chance that the number cannot be exactly represented.
If you sum two floating point numbers for which the exponents differ by D
, then the chances of the sum being exactly representable are 2-D.
If you then want to represent the range 0-1
, then you'll have to be very careful about which floats you use (i.e. force the last N
bits of the fraction to be zero, where N
is a function of E
).
If you go down this route, then you'll end up with far more floats at the top of the range than the bottom.
The alternative is to decide how close to zero you want to be able to get. Lets say you want to get down to 0.0001.
0.0001 = (1+M) * 2E
log2(0.0001) = -13.28771...
So we'll use -14 as our minimum exponent.
And then to get up to 1, we just leave the exponent as -1.
So now we have 13 ranges, each with twice as many values as the lower one which we can sum without having to worry about precision.
This also means though, that the top range has 213 more values we can use. This obviously isn't okay.
So, after picking a float, then round it to the nearest allowable value - in this case, by round I just mean set the last 13 bits to zero, and just put it all in a function, and apply it to your numbers immediately after you get them out of rand
.
Something like this:
from ctypes import *
def roundf(x,bitsToRound):
i = cast(pointer(c_float(x)), POINTER(c_int32)).contents.value
bits = bin(i)
bits = bits[:-bitsToRound] + "0"*bitsToRound
i = int(bits,2)
y = cast(pointer(c_int32(i)), POINTER(c_float)).contents.value
return y
(images from wikipedia)