I enjoy using a lot of functional programming features when working with Python lists. When I switch to NumPy for big datasets, I would expect it to be significantly more efficient than native Python list operations over ndarray.tolist(), since the data is stored differently.
So when I want to apply FP-style operations such as map, reduce, and filter to a NumPy array, I first search the NumPy docs for some "optimized thing", and what I find is numpy.ufunc.reduce, which seems to be the right tool. Out of curiosity, however, I ran a simple test of both approaches:
- Use NumPy's ufunc.reduce

import numpy as np

a = np.array(range(100000000))

# Wrap a Python addition callback as a ufunc, then reduce the array with it.
adf = lambda res, a: res + a
u_adf = np.frompyfunc(adf, 2, 1)
print(u_adf.reduce(a, initial=0))
- Use ndarray.tolist() and then apply Python's native reduce
import numpy as np
from functools import reduce

a = np.array(range(100000000))

# Convert the array back to a plain Python list and reduce it with the same lambda.
adf = lambda res, a: res + a
print(reduce(adf, a.tolist(), 0))
Here comes the most unexpected thing:
> python 1.py
4999999950000000
python 1.py 28.00s user 5.71s system 102% cpu 32.925 total
> python 2.py
4999999950000000
python 2.py 26.38s user 6.38s system 103% cpu 31.792 total
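For reference, the same reduction written against NumPy's built-in np.add ufunc (no np.frompyfunc wrapper, so no per-element Python call) would look roughly like the sketch below; I did not include it in the timings above:

import numpy as np

a = np.arange(100000000)    # same values, built directly as an ndarray

# Built-in ufunc reduction: the loop runs in compiled code, not in Python.
print(np.add.reduce(a))     # equivalent to a.sum()

But the comparison that surprised me is between the two Python-callback versions above.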
So the supposedly "stupid" approach is actually the more efficient one?
How can that be? Can anyone explain this, and hopefully give some advice on using functional programming features with NumPy arrays?
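For context, these are the array-level idioms I am aware of for map / filter / reduce (a toy sketch with made-up values); what I am really asking is whether the FP style can be kept for arbitrary functions without paying the per-element Python call cost seen above:

import numpy as np

a = np.arange(10)

mapped   = a * 2 + 1           # "map": a vectorized expression over the whole array
filtered = a[a % 2 == 0]       # "filter": boolean mask indexing
reduced  = np.add.reduce(a)    # "reduce": built-in ufunc reduction, same as a.sum()
print(mapped, filtered, reduced)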
Appreciate ^_^
question from: https://stackoverflow.com/questions/66046009/numpy-ufunc-reduce-slower-than-applying-native-python-reduce-after-ndarray-tolis