Big-O complexity is not often used with Python and numpy. It's a measure of how the code scales with problem size. That's useful in a compiled language like C. But here the code is a mix of interpreted Python and compiled code. Both can have the same big-O, but the interpreted version will be orders of magnitude slower. That's why most of the SO questions about improving numpy speed talk about 'removing loops' and 'vectorizing'.
Also, few operations are purely O(n); most are a mix: a setup cost plus a per-element cost. If the per-element cost is small, the setup cost dominates.
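A rough way to see that (a sketch using the `timeit` module; the sizes are arbitrary):

```python
import timeit
import numpy as np

# for a tiny array the fixed setup/call overhead dominates; for a large
# array the per-element cost takes over
for n in [10, 100_000]:
    a = np.ones(n)
    t = timeit.timeit(lambda: a.sum(), number=10_000)
    print(f"n={n:>6}: {t / 10_000 * 1e6:.2f} μs per call")
```

The small array is 10,000x smaller, but the call will be nowhere near 10,000x faster.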
If you start with lists, it's often faster to iterate on the list, because converting a list to an array has a substantial overhead (itself O(n)).
If you already have arrays, then avoid (Python-level) iteration where possible. Iteration is part of most calculations, but numpy lets you do a lot of it in compiled code: the same O(n), with a much smaller constant factor.
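For example, the same O(n) sum-of-squares done both ways (a minimal sketch; the size is arbitrary):

```python
import numpy as np

arr = np.arange(10_000)

# Python-level iteration: the O(n) loop runs in the interpreter
total = 0
for x in arr:
    total += x * x

# same O(n) work, but the loop runs inside compiled numpy code
total2 = (arr * arr).sum()

assert total == total2
```

Both versions are O(n), but the second is typically orders of magnitude faster.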
At some point you have to understand how numpy stores its arrays. The distinction between view and copy is important. A view is in effect O(1), a copy O(n).
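A quick way to see the difference (names and sizes here are just for demonstration):

```python
import numpy as np

arr = np.arange(1_000_000)

v = arr[::2]          # basic slicing returns a view: O(1), no data copied
c = arr[::2].copy()   # an explicit copy: O(n), allocates and copies data

print(np.shares_memory(arr, v))   # True  - same underlying buffer
print(np.shares_memory(arr, c))   # False - independent buffer

v[0] = -1             # writing through a view changes the original
print(arr[0])         # -1
```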
Often you'll see SO answers do `timeit` speed comparisons. I often add the caution that results might vary with problem size. The better answers time a range of problem sizes and show the results on a nice plot. The results are often a mix of straight lines (O(n)) and curves (varying blends of O(1) and O(n) components).
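A bare-bones version of that kind of comparison (a sketch with the `timeit` module; the sizes are arbitrary and the plotting is left out):

```python
import timeit
import numpy as np

# time np.array(alist) over a range of sizes; once n is large enough to
# swamp the setup cost, the per-call time grows roughly linearly
for n in [10, 100, 1_000, 10_000, 100_000]:
    alist = list(range(n))
    t = timeit.timeit(lambda: np.array(alist), number=1_000)
    print(f"n={n:>7}: {t / 1_000 * 1e6:.1f} μs per call")
```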
You asked specifically about `np.array`. Here are some sample timings:
In [134]: %%timeit alist = list(range(1000))
...: np.array(alist)
67.9 μs ± 839 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [135]: %%timeit alist = list(range(10))
...: np.array(alist)
2.19 μs ± 9.88 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [136]: %%timeit alist = list(range(2000))
...: np.array(alist)
134 μs ± 1.98 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
copy an array:
In [137]: %%timeit alist = list(range(2000)); arr=np.array(alist)
...: np.array(arr)
1.77 μs ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
no copy (with `copy=False` the existing array is passed through, effectively O(1)):
In [138]: %%timeit alist = list(range(2000)); arr=np.array(alist)
...: np.array(arr, copy=False)
237 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
from a list of strings (each string has to be parsed, so it's slower):
In [139]: %%timeit alist = [str(i) for i in range(2000)]
...: np.array(alist, dtype=int)
286 μs ± 4.8 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Almost all calculations in `numpy` are O(n). If a calculation involves each element of an array, speed will depend on the size of the array. Some array manipulations are O(1), such as reshaping, because they don't actually do anything with the data; they change properties like shape and strides.
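For example (a small sketch; the size is arbitrary):

```python
import numpy as np

big = np.zeros(10_000_000)

# reshape only rewrites the shape/strides metadata; no data is moved,
# so the cost does not depend on the array's size
rb = big.reshape(1000, 10_000)

print(np.shares_memory(big, rb))   # True  - same data buffer
print(rb.base is big)              # True  - rb is a view of big
```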
Search problems often grow faster than O(n); usually `numpy` is not the best choice for that kind of problem. Smart use of Python lists and dictionaries can be faster.
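For example, repeated membership tests (a sketch; the numbers are arbitrary). Scanning an array is O(n) per lookup, while a Python set costs O(n) to build once and then averages O(1) per lookup:

```python
import numpy as np

keys = np.arange(1_000_000)

# numpy: each membership test scans the whole array, O(n)
print((keys == 123_456).any())    # True

# python: build a set once (O(n)), then each lookup is O(1) on average
key_set = set(keys.tolist())
print(123_456 in key_set)         # True
```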