Approach #1
Here's one with np.searchsorted -
import numpy as np

def find_indices(a, b, invalid_specifier=-1):
    # Search for matching indices for each b in the sorted version of a.
    # We use the sorter arg to account for the case when a might not be
    # sorted, using argsort on a
    sidx = a.argsort()
    idx = np.searchsorted(a, b, sorter=sidx)
    # Reset out-of-bounds indices to 0, as they won't be matches anyway
    idx[idx == len(a)] = 0
    # Get traced-back indices corresponding to the original version of a
    idx0 = sidx[idx]
    # Mask out invalid ones with invalid_specifier and return
    return np.where(a[idx0] == b, idx0, invalid_specifier)
Output for given sample -
In [41]: find_indices(a, b, invalid_specifier=np.nan)
Out[41]: array([ 9., 4., 6., nan])
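To make the intermediate steps concrete, here is a minimal trace of the same searchsorted idea on small arrays. The values of a and b below are hypothetical, chosen for illustration; they are not the sample from the question.

```python
import numpy as np

# Hypothetical sample data (not the arrays from the question).
a = np.array([30, 10, 50, 20])
b = np.array([20, 50, 40, 60])

sidx = a.argsort()                         # order that sorts a: [1, 3, 0, 2]
idx = np.searchsorted(a, b, sorter=sidx)   # insertion points in sorted a: [1, 3, 3, 4]
idx[idx == len(a)] = 0                     # 60 lands past the end; reset to a valid index
idx0 = sidx[idx]                           # candidate positions in the original a
result = np.where(a[idx0] == b, idx0, -1)  # keep only exact matches
print(result)                              # [ 3  2 -1 -1]
```

Here 40 gets a valid insertion point but a[idx0] differs from it, so the final equality check is what rejects it.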
Approach #2
Another one, based on a lookup array. It works only for non-negative integers, since the values of a and b are used directly as indices -
def find_indices_lookup(a, b, invalid_specifier=-1):
    # Set up the lookup array, pre-filled with invalid_specifier and large
    # enough to be indexed by any value in a or b
    N = max(a.max(), b.max()) + 1
    lookup = np.full(N, invalid_specifier)
    # Store each value's position in a at the slot given by that value
    lookup[a] = np.arange(len(a))
    # Index into lookup with b to trace back the positions. Non-matching
    # ones keep invalid_specifier, as those slots were never assigned
    indices = lookup[b]
    return indices
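As a quick illustration of the lookup idea, the sketch below runs the same steps inline on small hypothetical arrays (again, not the sample from the question). It also shows two side effects worth keeping in mind: the lookup array has as many slots as the largest value plus one, and the output dtype follows the fill value.

```python
import numpy as np

# Hypothetical sample data (not the arrays from the question).
a = np.array([30, 10, 50, 20])
b = np.array([20, 50, 40, 60])

N = max(a.max(), b.max()) + 1       # 61 slots, driven by the largest value
lookup = np.full(N, -1)             # integer fill value -> integer result
lookup[a] = np.arange(len(a))       # lookup[30]=0, lookup[10]=1, lookup[50]=2, lookup[20]=3
result = lookup[b]
print(result)                       # [ 3  2 -1 -1]

# With np.nan as the fill value, the positions come back as floats.
lookup_nan = np.full(N, np.nan)
lookup_nan[a] = np.arange(len(a))
result_nan = lookup_nan[b]
print(result_nan.dtype)             # float64
```

If the values are small and dense this is cheap, but a single very large value in a or b makes the lookup array correspondingly large.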
Benchmarking
Efficiency wasn't mentioned as a requirement in the question, but the no-loop requirement suggests it matters. Testing with a setup that tries to represent the given sample setup, scaled up by 1000x:
In [98]: a = np.random.permutation(np.unique(np.random.randint(0,20000,10000)))
In [99]: b = np.random.permutation(np.unique(np.random.randint(0,20000,4000)))
# Solutions from this post
In [100]: %timeit find_indices(a,b,invalid_specifier=np.nan)
...: %timeit find_indices_lookup(a,b,invalid_specifier=np.nan)
1.35 ms ± 127 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
220 μs ± 30.9 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @Quang Hoang-soln2
In [101]: %%timeit
...: commons, idx_a, idx_b = np.intersect1d(a,b, return_indices=True)
...: orders = np.argsort(idx_b)
...: output = np.full(b.shape, np.nan)
...: output[orders] = idx_a[orders]
1.63 ms ± 59.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# @Quang Hoang-soln1
In [102]: %%timeit
...: s = b == a[:,None]
...: np.where(s.any(0), np.argmax(s,0), np.nan)
137 ms ± 9.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
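For anyone wanting to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. It redefines the two functions from this post so that it is self-contained, uses the same data setup as above, and first checks that both approaches agree. The loop count of 100 is an arbitrary choice, and the timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def find_indices(a, b, invalid_specifier=-1):
    sidx = a.argsort()
    idx = np.searchsorted(a, b, sorter=sidx)
    idx[idx == len(a)] = 0
    idx0 = sidx[idx]
    return np.where(a[idx0] == b, idx0, invalid_specifier)

def find_indices_lookup(a, b, invalid_specifier=-1):
    N = max(a.max(), b.max()) + 1
    lookup = np.full(N, invalid_specifier)
    lookup[a] = np.arange(len(a))
    return lookup[b]

# Same setup as the benchmark above: unique, shuffled non-negative integers.
a = np.random.permutation(np.unique(np.random.randint(0, 20000, 10000)))
b = np.random.permutation(np.unique(np.random.randint(0, 20000, 4000)))

# Sanity check: both approaches should return identical positions.
res_search = find_indices(a, b)
res_lookup = find_indices_lookup(a, b)
print("results agree:", np.array_equal(res_search, res_lookup))

# Average time per call over an arbitrary 100 runs.
runs = 100
for func in (find_indices, find_indices_lookup):
    total = timeit.timeit(lambda: func(a, b), number=runs)
    print(f"{func.__name__}: {total / runs * 1e6:.1f} us per call")
```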