When I initialize a random matrix (here of shape (6,2)) and then reorder its rows, np.sum() gives me different results. I understand this is likely due to numerical (rounding) errors, since the differences are small, but how exactly is np.sum() summing the elements? How can I replicate the results for each row order?
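Floating-point addition is not associative: adding the same numbers in a different order can round to a different result. A minimal float32 sketch (the values 1e8, -1e8 and 1.0 are arbitrary illustration choices, not taken from the matrices below):

```python
import numpy as np

a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

# Same three numbers, two groupings; every intermediate result is rounded to float32.
left_first = (a + b) + c   # 1e8 - 1e8 is exactly 0, then 0 + 1 = 1
right_first = a + (b + c)  # -1e8 + 1 rounds back to -1e8 (float32 spacing near 1e8 is 8)

print(float(left_first), float(right_first))  # 1.0 0.0
```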
import numpy as np

x1 = np.array([[-0.31381854, -0.05944943],
               [ 0.3848904 , -0.36534384],
               [ 1.1122322 ,  1.2480698 ],
               [-1.4493011 ,  0.5094067 ],
               [ 0.00905334,  0.77591574],
               [ 0.25694364, -2.108599  ]], dtype=np.float32)
x2 = np.array([[-0.31381854, -0.05944943],
               [ 1.1122322 ,  1.2480698 ],
               [ 0.00905334,  0.77591574],
               [ 0.3848904 , -0.36534384],
               [-1.4493011 ,  0.5094067 ],
               [ 0.25694364, -2.108599  ]], dtype=np.float32)
print(np.sum(x1))
print(np.sum(x2))
0.0
-2.3841858e-07
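One way to get the same result for every row order, sketched here under the assumption that any fixed, reproducible order is acceptable, is to fix the order of additions yourself (for example by sorting the flattened values before calling np.sum), or to use math.fsum, which returns the correctly rounded sum of its inputs regardless of their order:

```python
import math
import numpy as np

x1 = np.array([[-0.31381854, -0.05944943],
               [ 0.3848904 , -0.36534384],
               [ 1.1122322 ,  1.2480698 ],
               [-1.4493011 ,  0.5094067 ],
               [ 0.00905334,  0.77591574],
               [ 0.25694364, -2.108599  ]], dtype=np.float32)
x2 = x1[[0, 2, 4, 1, 3, 5]]  # the same rows, in the order used for x2 above

# 1) Canonical order: axis=None sorts the flattened values, so np.sum sees identical input.
s1 = np.sum(np.sort(x1, axis=None))
s2 = np.sum(np.sort(x2, axis=None))

# 2) Correctly rounded sum; .tolist() converts each float32 to a Python float exactly.
f1 = math.fsum(x1.ravel().tolist())
f2 = math.fsum(x2.ravel().tolist())

print(s1 == s2, f1 == f2)  # True True
```

Sorting keeps the float32 accumulation of np.sum but makes it deterministic; math.fsum is slower but does not depend on order at all.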
Although both matrices contain exactly the same elements (the same rows, in a different order), the sums are different.
If I sum all elements using Python's built-in sum() function, the results agree:
print(sum(sum(x1)))
print(sum(sum(x2)))
-5.960464477539063e-08
-5.960464477539063e-08
When I sum the columns individually using Python's built-in sum() function, I also get matching results:
print(sum(x1[:,0]))
print(sum(x2[:,0]))
-6.705522537231445e-08
-6.705522537231445e-08
print(sum(x1[:,1]))
print(sum(x2[:,1]))
-2.9802322387695312e-08
-2.9802322387695312e-08
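Python's built-in sum() adds items strictly left to right, starting from 0, so for a NumPy column it behaves like the plain fold below. Reordering the rows therefore also changes the order of its additions, so matching results are not guaranteed in general. The type of 0 + np.float32 depends on the NumPy version's type-promotion rules, so this sketch only compares the two computations with each other:

```python
import functools
import operator

import numpy as np

# Column 0 of x1 from the question.
col = np.array([-0.31381854, 0.3848904, 1.1122322,
                -1.4493011, 0.00905334, 0.25694364], dtype=np.float32)

# sum(col) evaluates ((((0 + col[0]) + col[1]) + col[2]) + ...) in exactly this order.
builtin = sum(col)
fold = functools.reduce(operator.add, col, 0)

print(builtin == fold)  # True
```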
But again, if I sum the columns separately, this time using np.sum(axis=1), the results are different:
print(np.sum(x1),1)
print(np.sum(x2),1)
-2.3841858e-07 1
0.0 1
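In print(np.sum(x1),1), the 1 is passed as a second argument to print(), not as the axis of np.sum(); that is why each line shows the full sum followed by 1. Per-column sums need axis=0 (axis=1 sums each row). The np.sum documentation also notes that its more precise pairwise summation is only used along the fast axis in memory, so column sums and the total may round differently. A sketch:

```python
import numpy as np

x1 = np.array([[-0.31381854, -0.05944943],
               [ 0.3848904 , -0.36534384],
               [ 1.1122322 ,  1.2480698 ],
               [-1.4493011 ,  0.5094067 ],
               [ 0.00905334,  0.77591574],
               [ 0.25694364, -2.108599  ]], dtype=np.float32)

col_sums = np.sum(x1, axis=0)  # one value per column
row_sums = np.sum(x1, axis=1)  # one value per row
total = np.sum(x1)             # axis=None: all 12 elements

print(col_sums.shape, row_sums.shape)  # (2,) (6,)
```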
This becomes a problem for large matrices with thousands of elements, where the rounding errors accumulate into much larger differences.
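For large float32 arrays, passing dtype=np.float64 to np.sum makes it accumulate in double precision, which usually shrinks the order dependence far below float32 resolution (though it does not remove it in general), and math.fsum gives an order-independent reference. The array size, the seed, and the use of default_rng below are arbitrary illustration choices:

```python
import math

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000).astype(np.float32)
b = rng.permutation(a)  # same values, different order

exact = math.fsum(a.tolist())  # correctly rounded sum of the float32 values

for name, arr in (("original", a), ("shuffled", b)):
    s32 = np.sum(arr)                    # accumulates in float32
    s64 = np.sum(arr, dtype=np.float64)  # accumulates in float64
    print(name, float(s32) - exact, float(s64) - exact)
```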
I just don't understand what np.sum() is doing such that a simple sum gives different results, while Python's built-in sum() function does not!
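The np.sum documentation says that, instead of adding each element to a running total, NumPy often uses partial pairwise summation for floating-point input, and always does so when no axis is given. Its exact blocking is an implementation detail, so the sketch below is a simplified recursive pairwise sum, not NumPy's actual code. It only shows how the two strategies can round the same float32 input differently (the input values are illustrative):

```python
import math

import numpy as np


def sequential_sum_f32(values):
    """Add elements one at a time to a float32 running total."""
    acc = np.float32(0.0)
    for v in np.asarray(values, dtype=np.float32):
        acc = acc + v  # float32 + float32 -> float32, rounded at every step
    return acc


def pairwise_sum_f32(values):
    """Simplified pairwise summation: sum each half recursively, then add the halves."""
    a = np.asarray(values, dtype=np.float32)
    if a.size == 0:
        return np.float32(0.0)
    if a.size == 1:
        return a[0]
    mid = a.size // 2
    return pairwise_sum_f32(a[:mid]) + pairwise_sum_f32(a[mid:])


data = [1e8] + [1.0] * 8  # exact sum is 100000008, which float32 can represent

print(float(sequential_sum_f32(data)))  # 100000000.0 (each +1 is rounded away)
print(float(pairwise_sum_f32(data)))    # 100000008.0 (the 1s are grouped before meeting 1e8)
print(math.fsum(data))                  # 100000008.0
```

Because the grouping depends on each element's position, reordering rows changes which values are added together first, and therefore how the float32 results round.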
question from:
https://stackoverflow.com/questions/65906908/why-exactly-does-numpy-sum-gives-different-results-on-the-same-matrix-with-dif