Edit to add: I don't think the numba benchmarks are fair, notes below
I'm trying to benchmark different approaches to numerically processing data for the following use case:
- Fairly big dataset (100,000+ records)
- 100+ lines of fairly simple code (z = x + y)
- Don't need to sort or index
In other words, the full generality of series and dataframes is not needed, although they are included here b/c they are still convenient ways to encapsulate the data and there is often pre- or post-processing that does require the generality of pandas over numpy arrays.
Question: Based on this use case, are the following benchmarks appropriate and if not, how can I improve them?
# importing pandas, numpy, Series, DataFrame in standard way
from numba import jit
nobs = 10000
nlines = 100
def proc_df():
df = DataFrame({ 'x': np.random.randn(nobs),
'y': np.random.randn(nobs) })
for i in range(nlines):
df['z'] = df.x + df.y
return df.z
def proc_ser():
x = Series(np.random.randn(nobs))
y = Series(np.random.randn(nobs))
for i in range(nlines):
z = x + y
return z
def proc_arr():
x = np.random.randn(nobs)
y = np.random.randn(nobs)
for i in range(nlines):
z = x + y
return z
@jit
def proc_numba():
xx = np.random.randn(nobs)
yy = np.random.randn(nobs)
zz = np.zeros(nobs)
for j in range(nobs):
x, y = xx[j], yy[j]
for i in range(nlines):
z = x + y
zz[j] = z
return zz
Results (Win 7, 3 year old Xeon workstation (quad-core). Standard and recent anaconda distribution or very close.)
In [1251]: %timeit proc_df()
10 loops, best of 3: 46.6 ms per loop
In [1252]: %timeit proc_ser()
100 loops, best of 3: 15.8 ms per loop
In [1253]: %timeit proc_arr()
100 loops, best of 3: 2.02 ms per loop
In [1254]: %timeit proc_numba()
1000 loops, best of 3: 1.04 ms per loop # may not be valid result (see note below)
Edit to add (response to jeff) alternate results from passing df/series/array into functions rather than creating them inside of functions (i.e. move the code lines containing 'randn' from inside function to outside function):
10 loops, best of 3: 45.1 ms per loop
100 loops, best of 3: 15.1 ms per loop
1000 loops, best of 3: 1.07 ms per loop
100000 loops, best of 3: 17.9 μs per loop # may not be valid result (see note below)
Note on numba results: I think the numba compiler must be optimizing on the for loop and reducing the for loop to a single iteration. I don't know that but it's the only explanation I can come up as it couldn't be 50x faster than numpy, right? Followup question here: Why is numba faster than numpy here?
See Question&Answers more detail:
os