algorithm - How to efficiently find k-nearest neighbours in high-dimensional data?

Question

Welcome To Ask or Share your Answers For Others

algorithm - How to efficiently find k-nearest neighbours in high-dimensional data?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

algorithm - How to efficiently find k-nearest neighbours in high-dimensional data?

So I have about 16,000 75-dimensional data points, and for each point I want to find its k nearest neighbours (using euclidean distance, currently k=2 if this makes it easiser)

My first thought was to use a kd-tree for this, but as it turns out they become rather inefficient as the number of dimension grows. In my sample implementation, its only slightly faster than exhaustive search.

My next idea would be using PCA (Principal Component Analysis) to reduce the number of dimensions, but I was wondering: Is there some clever algorithm or data structure to solve this exactly in reasonable time?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:47:33+0000

The Wikipedia article for kd-trees has a link to the ANN library:

ANN is a library written in C++, which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions.

Based on our own experience, ANN performs quite efficiently for point sets ranging in size from thousands to hundreds of thousands, and in dimensions as high as 20. (For applications in significantly higher dimensions, the results are rather spotty, but you might try it anyway.)

As far as algorithm/data structures are concerned:

The library implements a number of different data structures, based on kd-trees and box-decomposition trees, and employs a couple of different search strategies.

I'd try it first directly and if that doesn't produce satisfactory results I'd use it with the data set after applying PCA/ICA (since it's quite unlikely your going to end up with few enough dimensions for a kd-tree to handle).

Categories

algorithm - How to efficiently find k-nearest neighbours in high-dimensional data?

algorithm - How to efficiently find k-nearest neighbours in high-dimensional data?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags