Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
251 views
in Technique[技术] by (71.8m points)

java - How to lookup the most similar bit vector in an prefix tree for NN-search?

The problem I'm trying to solve is explained in this question: Finding the single nearest neighbor using a Prefix tree in O(1)?

My question is regarding the Proposed solution section in that question page. In that section it's mentioned that we find the nearest neighbor from each prefix tree by traversing the tree starting from the node. Well looking up if a key exists in a prefix tree is straightforward but getting the most similar one I don't understand at all. How to accomplish this?

I wish if someone can explain this to me, if not graphically (which is preferred), then at least with some details.

Edit:

here's the code of the paper. It's written in python and unfortunately I never worked with python before. If anybody is familiar with python and can lookup the code to see how they find the nearest neighbors using prefix trees. https://github.com/kykamath/streaming_lsh/blob/master/streaming_lsh/nearest_neighbor_lsh.py

https://github.com/kykamath/streaming_lsh/blob/master/streaming_lsh/classes.py

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First know they iterate over the entire tree. By iterating over the entire tree they are guaranteed to find the most similar neighbor.

To be more efficient in the average case use a DFS graph traversal for the tree. Notice that since it is a tree, there will be no need for a coloring scheme for visited nodes.

Start with the closest object as null and at the root of the tree.

For each node you should traverse the children in the order of how much they add to the edit distance and only if the added edit distance would not be greater than the distance to the closest object. For example, with hamming distance, first traverse the child that would add "O" to the overall distance, then afterwards traverse the child that would add "1" to the overall distance. But do not traverse into the "1" child if that would make the edit distance greater than the current closest distance.

When you encounter a "word" within the prefix tree check to see if it has a distance to the query object that is less than the closest object and assign it to closest.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...