performance - Python's sum vs. NumPy's numpy.sum

Question

Posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

What are the differences in performance and behavior between Python's built-in sum function and NumPy's numpy.sum? sum works on NumPy arrays, and numpy.sum works on Python lists. Both return the same numerical result (I haven't tested edge cases such as overflow), but the result types differ.

>>> import numpy as np
>>> np_a = np.array(range(5))
>>> np_a
array([0, 1, 2, 3, 4])
>>> type(np_a)
<class 'numpy.ndarray'>

>>> py_a = list(range(5))
>>> py_a
[0, 1, 2, 3, 4]
>>> type(py_a)
<class 'list'>

# The numerical answer (10) is the same for the following sums:
>>> type(np.sum(np_a))
<class 'numpy.int32'>
>>> type(sum(np_a))
<class 'numpy.int32'>
>>> type(np.sum(py_a))
<class 'numpy.int32'>
>>> type(sum(py_a))
<class 'int'>
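
The overflow edge case mentioned above can be probed with a small sketch. It assumes the accumulator type is forced with the dtype argument of numpy.sum, because the default accumulator width depends on the platform and NumPy version. Python's int is arbitrary precision, while a fixed-width NumPy integer wraps around without raising an error.

```python
import numpy as np

# Two values whose true sum (2**31) does not fit in a signed 32-bit integer.
values = [2**31 - 1, 1]

# Built-in sum on a list of Python ints: arbitrary precision, no overflow.
py_total = sum(values)

# numpy.sum with a forced 32-bit accumulator: the result wraps around silently.
arr32 = np.array(values, dtype=np.int32)
wrapped_total = np.sum(arr32, dtype=np.int32)

# numpy.sum with a 64-bit accumulator is exact for this input.
wide_total = np.sum(arr32, dtype=np.int64)

print(py_total, type(py_total))
print(wrapped_total, type(wrapped_total))
print(wide_total, type(wide_total))
```

If exact results matter more than speed, either keep the values as Python ints or request a wider dtype explicitly.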

Edit: I think my practical question here is: would using numpy.sum on a list of Python integers be any faster than using Python's own sum?

Additionally, what are the implications (including performance) of using a Python integer versus a scalar numpy.int32? For example, for a += 1, is there a behavior or performance difference depending on whether a is a Python integer or a numpy.int32? I am curious whether it is faster to use a NumPy scalar datatype such as numpy.int32 for a value that is added to or subtracted from a lot in Python code.
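
One way to check the a += 1 question empirically is a small timing sketch like the one below. The helper names increment_python_int and increment_numpy_int32 are made up for illustration, and the loop count is arbitrary. Scalar arithmetic on NumPy scalars usually carries more per-operation overhead than on Python ints, but that is worth measuring on your own setup rather than assuming.

```python
import timeit

import numpy as np


def increment_python_int(n):
    """Add 1 to a plain Python int n times."""
    a = 0
    for _ in range(n):
        a += 1
    return a


def increment_numpy_int32(n):
    """Add 1 to a value that starts as a numpy.int32 scalar, n times."""
    a = np.int32(0)
    for _ in range(n):
        a += 1
    return a


n = 100000  # arbitrary loop length, chosen only for illustration
t_py = timeit.timeit(lambda: increment_python_int(n), number=5)
t_np = timeit.timeit(lambda: increment_numpy_int32(n), number=5)
print('Python int:  %s' % t_py)
print('numpy.int32: %s' % t_np)
```

Note that the type of a after the loop in the NumPy variant depends on the NumPy version's promotion rules, so the sketch does not rely on it.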

For clarification, I am working on a bioinformatics simulation which partly consists of collapsing multidimensional numpy.ndarrays into single scalar sums which are then additionally processed. I am using Python 3.2 and NumPy 1.6.
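
For the multidimensional case there is also a behavioral difference, not just a speed difference. The sketch below uses a made-up 2x3x4 array named grid: numpy.sum (or the sum method) collapses every axis to a scalar by default, whereas the built-in sum iterates over the first axis only and returns an array.

```python
import numpy as np

# Hypothetical 3-D array standing in for the simulation data.
grid = np.arange(24).reshape(2, 3, 4)

# Collapses all axes into a single scalar.
total = grid.sum()

# Collapses only the first axis; the result has shape (3, 4).
per_axis0 = grid.sum(axis=0)

# Built-in sum adds the sub-arrays grid[0] + grid[1], so it is
# equivalent to summing over axis 0, not to a full reduction.
builtin_result = sum(grid)

print(total)
print(per_axis0.shape)
print(builtin_result.shape)
```

So for collapsing an ndarray to one scalar, numpy.sum is the call that does what is intended; the built-in would need to be applied once per dimension.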

Thanks in advance!

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:07:22+0000

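I got curious and timed it. numpy.sum is much faster on NumPy arrays but much slower on lists: in the runs below it was about 40 times faster than sum on an array and about 19 times slower on a list.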

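import timeit

import numpy as np

# Input to sum. On Python 2, range(1000) is already a list; list() keeps
# the input the same on Python 3.
x = list(range(1000))
# or
# x = np.random.standard_normal(1000)

def pure_sum():
    return sum(x)

def numpy_sum():
    return np.sum(x)

n = 10000  # number of calls timed per function

t1 = timeit.timeit(pure_sum, number=n)
print('Pure Python Sum: %s' % t1)
t2 = timeit.timeit(numpy_sum, number=n)
print('Numpy Sum: %s' % t2)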

Result when x = range(1000):

Pure Python Sum: 0.445913167735
Numpy Sum: 8.54926219673

Result when x = np.random.standard_normal(1000):

Pure Python Sum: 12.1442425643
Numpy Sum: 0.303303771848

I am using Python 2.7.2 and NumPy 1.6.1.
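
A likely explanation for the pattern: numpy.sum has to convert a list into an array before it can reduce it, and the built-in sum has to pull each array element out as a separate NumPy scalar object. The sketch below is a Python 3 variant that times all four combinations on integer data. The dictionary names cases, timings and results are only for illustration, and the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

py_list = list(range(1000))
np_array = np.arange(1000)

# Each case pairs a label with a zero-argument callable to time.
cases = {
    'sum(list)': lambda: sum(py_list),
    'np.sum(list)': lambda: np.sum(py_list),
    'sum(ndarray)': lambda: sum(np_array),
    'np.sum(ndarray)': lambda: np.sum(np_array),
}

timings = {}
results = {}
for label, func in cases.items():
    results[label] = func()
    timings[label] = timeit.timeit(func, number=1000)
    print('%-16s %s' % (label, timings[label]))
```

The practical takeaway is to match the function to the container: built-in sum for lists, numpy.sum for arrays.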
