Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

algorithm - Random selection

Given two integer numbers N and n (N >= n > 0), how do I generate random selection (without repetition!) of [0, N) with length = n? E.g. Given N = 5, n = 3 possible solutions are (3,0,2) or (2,4,1), etc.

There is a restriction that prevents of using naive approach: memory usage must be O(n), not O(N).

/* Under naive approach I mean usage of temporary array of size=N that's initially filled in with numbers 0..N-1 in order. Required n items are selected randomly from this array. */

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Go through all the numbers m from 0 to N, deciding whether to include m in the set as encountered. You need to update the probability of including the next number based on the numbers already treated.

Let's apply this idea to the example given, with n=3 and N=5. First consider m=0. There are 3 numbers remaining, and 5 possibilities, so 0 is in the set with probability 3/5. Use a random number generator to decide to include the number or not. Now consider m=1. If you included 0 in the set, then you have 2 numbers remaining and 4 possibilities, so it should be included with probability 2/4, but if 0 is not included, you have 3 numbers remaining and 4 possibilities and thus 1 should be included with probability 3/4. This continues until the required 3 numbers are included in the set.

Here's an implementation in Python:

from __future__ import division
import random

def rand_set(n, N):
    nums_included=set()
    for m in range(N):
        prob = (n-len(nums_included)) / (N-m)
        if random.random() < prob:
            nums_included.add(m)
    return nums_included

You could (and probably should) add in a test to see when you've got enough numbers in your set, and break out of the loop early.

The numbers are stored in a set, which varies in size from 0 to n, so the storage used is O(n). Everything else uses constant space, so it's overall O(n).

EDIT Actually, you can go a little further with this approach, so that it takes constant space. In Python, just make a generator based on the above:

def rand_set_iter(n, N):
    num_remaining = n
    m = 0
    while num_remaining > 0:
        prob = num_remaining / (N-m)
        if random.random() < prob:
            num_remaining -= 1
            yield m
        m += 1

Here, I've gone ahead and used a while loop instead of the for loop. To store the results, you'll of course need to use O(n) space. But if all you need to do is iterate through the numbers, the generator version does it in O(1).

For a language without generators, you can roll your own generator, calling a function repeatedly and updating a static or global variable.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...