The issue you're having between -0. and +0. is part of the specification of how floats are supposed to behave (IEEE 754). In some circumstances, one needs this distinction. See, for example, the documentation linked from the docs for around.
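As a small illustration of where the sign of zero is actually observable (these particular checks are my own choice, not taken from the docs mentioned above), np.signbit reports the sign bit directly, and both division and np.arctan2 give different answers for the two zeros:

```python
import numpy as np

zeros = np.array([-0.0, 0.0])

# The sign bit is set only on the negative zero: [True, False].
signs = np.signbit(zeros)

# Dividing by a signed zero gives an infinity of the same sign.
# errstate silences the divide-by-zero RuntimeWarning NumPy would emit.
with np.errstate(divide="ignore"):
    inverses = 1.0 / zeros       # -inf for -0.0, +inf for +0.0

# arctan2 is also sensitive to the sign of zero in its second argument.
angles = np.arctan2(0.0, zeros)  # pi for -0.0, 0.0 for +0.0

print(signs, inverses, angles)
```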
It's also worth noting that the two zeros compare as equal, so
np.array(-0.)==np.array(+0.)
# True
That is, I think the problem more likely lies in your uniqueness comparison. For example, np.unique treats the two zeros as a single value:
a = np.array([-1., -0., 0., 1.])
np.unique(a)
# array([-1., -0., 1.])
If you want to keep the numbers as floating point but have all the zeros the same, you could use:
x = np.linspace(-2, 2, 6)
# array([-2. , -1.2, -0.4, 0.4, 1.2, 2. ])
y = x.round()
# array([-2., -1., -0., 0., 1., 2.])
y[y==0.] = 0.
# array([-2., -1., 0., 0., 1., 2.])
# or
y += 0.
# array([-2., -1., 0., 0., 1., 2.])
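Wrapped up as a function, the y += 0. trick might look like the sketch below; the name normalize_zeros is just an illustrative choice. It relies on IEEE 754 round-to-nearest (NumPy's default), under which -0.0 + 0.0 evaluates to +0.0, and it returns a new array rather than modifying the input in place:

```python
import numpy as np

def normalize_zeros(arr):
    """Return a float copy of arr with every -0.0 replaced by +0.0.

    Relies on IEEE 754 round-to-nearest, where -0.0 + 0.0 == +0.0.
    Every other value, including +0.0 and the infinities, comes through
    unchanged (NaN stays NaN).
    """
    return np.asarray(arr, dtype=float) + 0.0

y = np.linspace(-2, 2, 6).round()   # contains a -0. from rounding -0.4
z = normalize_zeros(y)

print(np.signbit(y))   # the sign bit is set on -2., -1. and -0.
print(np.signbit(z))   # now only -2. and -1. have it set
print(np.unique(z))
```

Compared with y[y==0.] = 0., this avoids building a boolean mask, which is also why it comes out faster in the timing test below.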
Note, though, that you do have to do this bit of extra work, since you are trying to work around the floating point specification.
Note also that this isn't due to a rounding error. For example,
np.fix(np.array(-.4)).tobytes().hex()
# '0000000000000080'  (little-endian byte order)
np.fix(np.array(-0.)).tobytes().hex()
# '0000000000000080'
That is, the resulting numbers are exactly the same, but
np.fix(np.array(0.)).tobytes().hex()
# '0000000000000000'
is different. This is why your method isn't working: it compares the binary representations of the numbers, and those differ for the two zeros. So I think the problem lies in the method of comparison, not in the general idea of comparing floating point numbers for uniqueness.
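To make the difference concrete, the sketch below views the 64-bit patterns as unsigned integers (which sidesteps the byte-order question behind the hex strings above) and then contrasts a value-based uniqueness check with a byte-based one. The byte-based version is only a guess at what a binary-representation comparison might look like; it is not the asker's actual code:

```python
import numpy as np

neg, pos = np.array(-0.0), np.array(0.0)

# Reinterpret the float64 bits as uint64: only the sign bit (bit 63) differs.
neg_bits = int(neg.view(np.uint64))
pos_bits = int(pos.view(np.uint64))
print(hex(neg_bits), hex(pos_bits))     # 0x8000000000000000 0x0

values = np.array([-1.0, -0.0, 0.0, 1.0])

# Value-based: the zeros compare equal, so they collapse into one entry.
by_value = np.unique(values)

# Byte-based: -0.0 and +0.0 have different bytes, so both survive.
by_bytes = {v.tobytes() for v in values}

print(len(by_value), len(by_bytes))     # 3 4
```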
A quick timeit test for the various approaches:
import timeit
import numpy as np
data0 = np.fix(4*np.random.rand(1000000,)-2)
# [ 1. -0. 1. -0. -0. 1. 1. 0. -0. -0. .... ]
N = 100
data = np.array(data0)
print(timeit.timeit("data += 0.", setup="from __main__ import np, data", number=N))
# 0.171831846237
data = np.array(data0)
print(timeit.timeit("data[data==0.] = 0.", setup="from __main__ import np, data", number=N))
# 0.83500289917
data = np.array(data0)
print(timeit.timeit("data.astype(int).astype(float)", setup="from __main__ import np, data", number=N))
# 0.843791007996
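Speed aside, it may be worth confirming that the three approaches really produce the same array. A rough check along these lines (with a smaller array than the timing test, and a seeded generator that is my own addition for reproducibility) compares values and sign bits:

```python
import numpy as np

rng = np.random.default_rng(0)
data0 = np.fix(4 * rng.random(10_000) - 2)   # integer-valued floats, many of them -0.

a = data0 + 0.0                      # add positive zero
b = data0.copy()
b[b == 0.0] = 0.0                    # overwrite every zero (the mask matches -0. too)
c = data0.astype(int).astype(float)  # round-trip through integers

for result in (a, b, c):
    # Same values as the original...
    assert np.array_equal(result, data0)
    # ...but no zero keeps its sign bit.
    assert not np.signbit(result[result == 0.0]).any()

# The three results are also bit-for-bit identical to each other.
print(a.tobytes() == b.tobytes() == c.tobytes())
```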
I agree with @senderle's point that if you want simple and exact comparisons and can get by with ints, ints will generally be easier. But if you want unique floats, you should be able to do this too, though you need to do it a bit more carefully. The main issue with floats is that calculations can introduce small differences that don't show up in a normal print, but this isn't a huge barrier, especially not after a round, fix, or rint for a reasonable range of floats.
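A sketch of the round-before-unique idea, using np.round with an arbitrary six decimals (pick a tolerance that suits your data) and then adding 0. so that tiny negative values that round to -0. merge with +0.; the helper name unique_rounded is hypothetical:

```python
import numpy as np

def unique_rounded(arr, decimals=6):
    """Hypothetical helper: unique values after rounding, with -0.0 folded into +0.0."""
    rounded = np.round(np.asarray(arr, dtype=float), decimals)
    return np.unique(rounded + 0.0)

# 0.1 + 0.2 is not exactly 0.3, and -1e-17 rounds to -0.0.
samples = np.array([0.1 + 0.2, 0.3, -1e-17, 0.0])
print(np.unique(samples).size)         # 4: the raw values all differ
print(unique_rounded(samples))         # two values: 0.0 and 0.3
```

Values that sit right on a rounding boundary can still land on different sides of it, so rounding reduces the problem rather than eliminating it.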