Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
870 views
in Technique[技术] by (71.8m points)

algorithm - Retrieving the top 100 numbers from one hundred million of numbers

One of my friend has been asked with a question

Retrieving the max top 100 numbers from one hundred million of numbers

in a recent job interview. Do you have any idea to come up with an efficient way to solve it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Run them all through a min-heap of size 100: for each input number k, replace the current min m with max(k, m). Afterwards the heap holds the 100 largest inputs.

A search engine like Lucene can use this method, with refinements, to choose the most-relevant search answers.

Edit: I fail the interview -- I got the details wrong twice (after having done this before, in production). Here's code to check it; it's almost the same as Python's standard heapq.nlargest():

import heapq

def funnel(n, numbers):
    if n == 0: return []
    heap = numbers[:n]
    heapq.heapify(heap)
    for k in numbers[n:]:
        if heap[0] < k:
            heapq.heapreplace(heap, k)
    return heap

>>> funnel(4, [3,1,4,1,5,9,2,6,5,3,5,8])
[5, 8, 6, 9]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...