I am currently rewriting some code that I originally wrote under the assumption that the inputs are numpy arrays, so that it accepts arbitrary lists as input instead. Unfortunately, the solutions I have produced so far are substantially slower than the original code. Can someone give advice on how I might get back to the speed of the original solution?
The code is supposed to produce a boolean index for the upper triangular matrix representation. Leaving out input checks and the like, this is the core of the code:
The import and example input:
import numpy as np
descriptor = list(range(100))
descriptor_arr = np.array(descriptor)
value = [0, 2, 13, 14, 11, 23, 45, 16]
This is my current list-based version:
def get_idx_slow(descriptor, value):
    # index pairs (x, y) with x < y of the strict upper triangle
    ix, iy = np.triu_indices(len(descriptor), 1)
    # one membership flag per descriptor entry
    pattern_in_value = [p in value for p in descriptor]
    return [(pattern_in_value[idx_x] & pattern_in_value[idx_y]) for idx_x, idx_y in zip(ix, iy)]
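To make the intended output concrete, here is a small sketch of what the boolean index looks like for a four-element input. The values in `descriptor` and `value` below are made up for illustration and are not the example input from the question.

```python
import numpy as np

# Hypothetical toy input, chosen only to keep the output readable.
descriptor = [10, 20, 30, 40]
value = [10, 30, 40]

# Index pairs (x, y) with x < y, i.e. the strict upper triangle.
ix, iy = np.triu_indices(len(descriptor), 1)
pairs = list(zip(ix.tolist(), iy.tolist()))

# One membership flag per descriptor entry.
member = [p in value for p in descriptor]

# A pair is selected only if both of its entries are in `value`.
mask = [member[i] and member[j] for i, j in pairs]

print(pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(mask)   # [False, True, True, False, False, True]
```

Every pair involving index 1 (the entry 20, which is not in `value`) is deselected; the remaining three pairs are kept.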
This is the previous array-based version:
def get_idx_fast(descriptor, value):
    # descriptor must be a numpy array here (fancy indexing with ix / iy)
    ix, iy = np.triu_indices(len(descriptor), 1)
    selection_x = np.any(np.array([descriptor[ix] == v for v in value]), axis=0)
    selection_y = np.any(np.array([descriptor[iy] == v for v in value]), axis=0)
    return selection_x & selection_y
My timing results:
%timeit get_idx_slow(descriptor, value)
1.2 ms ± 33.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit get_idx_fast(descriptor_arr, value)
217 μs ± 1.88 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
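One direction that might narrow the gap, sketched here under the assumption that converting the list to an array once at the start of the function is acceptable: compute membership once per descriptor entry with `np.isin` and then index that result with the triangle indices. The name `get_idx_isin` is hypothetical, and this variant has not been timed here, so whether it actually matches the array-based version would need to be measured.

```python
import numpy as np


def get_idx_isin(descriptor, value):
    """Sketch: accept a list or an array and return a boolean numpy array."""
    # Convert once; this is a no-op if descriptor is already an array.
    descriptor = np.asarray(descriptor)
    ix, iy = np.triu_indices(len(descriptor), 1)
    # Membership is computed once per descriptor entry, not once per pair.
    member = np.isin(descriptor, value)
    # Fancy indexing replaces the Python-level loop over index pairs.
    return member[ix] & member[iy]


descriptor = list(range(100))
value = [0, 2, 13, 14, 11, 23, 45, 16]
mask = get_idx_isin(descriptor, value)
```

Note that this returns a numpy boolean array rather than a Python list; `mask.tolist()` would convert it if a list is required, at some extra cost.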
question from:
https://stackoverflow.com/questions/65923287/is-there-a-way-to-make-list-processing-as-fast-as-np-array