A norm is a function that takes a vector as an input and returns a scalar value that can be interpreted as the "size", "length" or "magnitude" of that vector. More formally, norms are defined as having the following mathematical properties:
- They scale multiplicatively, i.e. Norm(a·v) = |a|·Norm(v) for any scalar a
- They satisfy the triangle inequality, i.e. Norm(u + v) ≤ Norm(u) + Norm(v)
- The norm of a vector is zero if and only if it is the zero vector, i.e. Norm(v) = 0 ⇔ v = 0
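As a quick sanity check, the three properties can be tested numerically. This is only a sketch: it uses np.linalg.norm with its default settings (the Euclidean norm for 1-D input), and the seed, vector length and scalar s are arbitrary values chosen for illustration. It also checks only one direction of the last property (the zero vector has norm zero).

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
u = rng.standard_normal(5)
v = rng.standard_normal(5)
s = -2.5                         # an arbitrary scalar

# Multiplicative scaling: Norm(s·v) == |s|·Norm(v)
scaled_ok = np.isclose(np.linalg.norm(s * v), abs(s) * np.linalg.norm(v))

# Triangle inequality (tiny tolerance to absorb floating-point rounding)
triangle_ok = np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v) + 1e-12

# The zero vector has norm zero
zero_ok = np.linalg.norm(np.zeros(5)) == 0.0

print(scaled_ok, triangle_ok, zero_ok)
```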
The Euclidean norm (also known as the L2 norm) is just one of many different norms - there is also the max norm, the Manhattan norm etc. The L2 norm of a single vector is equivalent to the Euclidean distance from that point to the origin, and the L2 norm of the difference between two vectors is equivalent to the Euclidean distance between the two points.
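To make the difference between these norms concrete, here is a small sketch using the ord argument of np.linalg.norm. The vectors v, p and q are made-up values picked so the results are easy to verify by hand.

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)                 # Euclidean (L2) norm: sqrt(3**2 + 4**2) = 5
l1 = np.linalg.norm(v, ord=1)          # Manhattan (L1) norm: |3| + |-4| = 7
lmax = np.linalg.norm(v, ord=np.inf)   # max norm: max(|3|, |-4|) = 4

# The L2 norm of a difference is the Euclidean distance between two points
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])
d = np.linalg.norm(p - q)              # sqrt(3**2 + 4**2) = 5
```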
As @nobar's answer says, np.linalg.norm(x - y, ord=2) (or just np.linalg.norm(x - y)) will give you the Euclidean distance between the vectors x and y.
Since you want to compute the Euclidean distance between a[1, :] and every other row in a, you could do this a lot faster by eliminating the for loop and broadcasting over the rows of a:
dist = np.linalg.norm(a[1:2] - a, axis=1)
It's also easy to compute the Euclidean distance yourself using broadcasting:
dist = np.sqrt(((a[1:2] - a) ** 2).sum(1))
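To see what the broadcasting is doing, it may help to walk through the shapes on a tiny array. The array pts below is a made-up example, not the a from the question. Slicing with 1:2 keeps the reference row two-dimensional, so it broadcasts against every row of the full array.

```python
import numpy as np

# Hypothetical 3 x 2 array of points, for illustration only
pts = np.array([[0.0, 0.0],
                [3.0, 4.0],
                [6.0, 8.0]])

ref = pts[1:2]        # shape (1, 2): the slice keeps the row dimension
diff = ref - pts      # shape (3, 2): ref is broadcast against each row of pts

# Sum the squared differences along each row, then take the square root
dist = np.sqrt((diff ** 2).sum(axis=1))   # distances from row 1: 5, 0, 5
```

The middle entry is zero because it is the distance from row 1 to itself.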
The fastest method is probably scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist = cdist(a[1:2], a)[0]
Some timings for a (1000, 1000) array:
a = np.random.randn(1000, 1000)
%timeit np.linalg.norm(a[1:2] - a, axis=1)
# 100 loops, best of 3: 5.43 ms per loop
%timeit np.sqrt(((a[1:2] - a) ** 2).sum(1))
# 100 loops, best of 3: 5.5 ms per loop
%timeit cdist(a[1:2], a)[0]
# 1000 loops, best of 3: 1.38 ms per loop
# check that all 3 methods return the same result
d1 = np.linalg.norm(a[1:2] - a, axis=1)
d2 = np.sqrt(((a[1:2] - a) ** 2).sum(1))
d3 = cdist(a[1:2], a)[0]
assert np.allclose(d1, d2) and np.allclose(d1, d3)