Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
325 views
in Technique[技术] by (71.8m points)

python - Find closest/similar value(vector) inside a matrix

let's say I have the following numpy matrix (simplified):

matrix = np.array([[1, 1],
               [2, 2],
               [5, 5],
               [6, 6]]
              )

And now I want to get the vector from the matrix closest to a "search" vector:

search_vec = np.array([3, 3])

What I have done is the following:

min_dist = None
result_vec = None
for ref_vec in matrix:
    distance = np.linalg.norm(search_vec-ref_vec)
    distance = abs(distance)
    print(ref_vec, distance)
    if min_dist == None or min_dist > distance:
        min_dist = distance
        result_vec = ref_vec

The result works, but is there a native numpy solution to do it more efficient? My problem is, that the bigger the matrix becomes, the slower the entire process will be. Are there other solutions that handle these problems in a more elegant and efficient way?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Approach #1

We can use Cython-powered kd-tree for quick nearest-neighbor lookup, which is very efficient both memory-wise and with performance -

In [276]: from scipy.spatial import cKDTree

In [277]: matrix[cKDTree(matrix).query(search_vec, k=1)[1]]
Out[277]: array([2, 2])

Approach #2

With SciPy's cdist -

In [286]: from scipy.spatial.distance import cdist

In [287]: matrix[cdist(matrix, np.atleast_2d(search_vec)).argmin()]
Out[287]: array([2, 2])

Approach #3

With Scikit-learn's Nearest Neighbors -

from sklearn.neighbors import NearestNeighbors

nbrs = NearestNeighbors(n_neighbors=1).fit(matrix)
closest_vec = matrix[nbrs.kneighbors(np.atleast_2d(search_vec))[1][0,0]]

Approach #4

With Scikit-learn's kdtree -

from sklearn.neighbors import KDTree
kdt = KDTree(matrix, metric='euclidean')
cv = matrix[kdt.query(np.atleast_2d(search_vec), k=1, return_distance=False)[0,0]]

Approach #5

From eucl_dist package (disclaimer: I am its author) and following the wiki contents, we could leverage matrix-multiplication -

M = matrix.dot(search_vec)
d = np.einsum('ij,ij->i',matrix,matrix) + np.inner(search_vec,search_vec) -2*M
closest_vec = matrix[d.argmin()]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...