Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

sorting - Optimal algorithm for returning top k values from an array of length N

I have an array of n floats, and I wish to return the top k (in my case n ~ 100, k ~ 10)

Is there a known optimal solution path for this problem?

Could someone provide a C algorithm?

EDIT: actually there are two problems here: sorted and unsorted. I am interested in unsorted, which should be faster!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Method 1

Since k is small, you can use the tournament method to find the kth largest. This method is described in Knuth's Art of Programming, Volume 3, Page 212.

First create a tournament on n-k+2 elements. Something like a knockout tennis tournament. First you split into pairs and compare the members of the pairs (as if those two played a match and one lost). Then the winners, you split in to pairs again and so on, till you have a winner. You can view it as a tree, with the winner at the top.

This takes n-k+1 compares exactly.

Now the winner of these n-k+2 cannot be your kth largest element. Consider its path P up the tournament.

Of the remaining k-2 now pick one, and follow that up the path P which will give you a new largest. Basically you sort of redo the tournament with the previous winner being replaced by one of the k-2 elements. Let P be the path of the new winner. Now pick another from the k-3 and follow the new path up and so on.

At the end after you exhaust the k-2, replace the largest with -infinity and the largest of the tournament will be the kth largest. The elements you have thrown away are the top k-1 elements.

This takes at most n - k + (k-1) [log (n-k+2)] comparisons to find the top k. It uses O(n) memory though.

In terms of number of compares this should likely beat any selection algorithms.

Method 2

As an alternative, you can maintain a min-heap of k elements.

First insert k elements. Then for each element of array, if it is less than the min element of the heap, throw it away. Otherwise, delete-min of the heap and insert the element from the array.

At the end, the heap will contain the top k elements. This will take O(n log k) compares.

Of course, if n is small, just sorting the array should be good enough. The code will be simpler too.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...