Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
351 views
in Technique[技术] by (71.8m points)

python - Numpy drawing from urn

I want to run a relatively simple random draw in numpy, but I can't find a good way to express it. I think the best way is to describe it as drawing from an urn without replacement. I have an urn with k colors, and n_k balls of every color. I want to draw m balls, and know how many balls of every color I have.

My current attempt it

np.bincount(np.random.permutation(np.repeat(np.arange(k), n_k))[:m], minlength=k)

here, n_k is an array of length k with the counts of the balls.

It seems that's equivalent to np.bincount(np.random.choice(k, m, n_k / n_k.sum(), minlength=k)

which is a bit better, but still not great.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

What you want is an implementation of the multivariate hypergeometric distribution. I don't know of one in numpy or scipy, but it might already exist out there somewhere.

I contributed an implementation of the multivariate hypergeometric distribution to numpy 1.18.0; see numpy.random.Generator.multivariate_hypergeometric.

For example, to draw 15 samples from an urn containing 12 red, 4 green and 18 blue marbles, and repeat the process 10 times:

In [4]: import numpy as np

In [5]: rng = np.random.default_rng()

In [6]: colors = [12, 4, 18]

In [7]: rng.multivariate_hypergeometric(colors, 15, size=10)                    
Out[7]: 
array([[ 5,  4,  6],
       [ 3,  3,  9],
       [ 6,  2,  7],
       [ 7,  2,  6],
       [ 3,  0, 12],
       [ 5,  2,  8],
       [ 6,  2,  7],
       [ 7,  1,  7],
       [ 8,  1,  6],
       [ 6,  1,  8]])

The rest of this answer is now obsolete, but I'll leave for posterity (whatever that means...).


You can implement it using repeated calls to numpy.random.hypergeometric. Whether that will be more efficient than your implementation depends on how many colors there are and how many balls of each color.

For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):

from __future__ import print_function

import numpy as np


nred = 12
ngreen = 4
nblue = 18

m = 15

red = np.random.hypergeometric(nred, ngreen + nblue, m)
green = np.random.hypergeometric(ngreen, nblue, m - red)
blue = m - (red + green)

print("red:   %2i" % red)
print("green: %2i" % green)
print("blue:  %2i" % blue)

Sample output:

red:    6
green:  1
blue:   8

The following function generalizes that to choosing m balls given an array colors holding the number of each color:

def sample(m, colors):
    """
    Parameters
    ----------
    m : number balls to draw from the urn
    colors : one-dimensional array of number balls of each color in the urn

    Returns
    -------
    One-dimensional array with the same length as `colors` containing the
    number of balls of each color in a random sample.
    """

    remaining = np.cumsum(colors[::-1])[::-1]
    result = np.zeros(len(colors), dtype=np.int)
    for i in range(len(colors)-1):
        if m < 1:
            break
        result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
        m -= result[i]
    result[-1] = m
    return result

For example,

>>> sample(10, [2, 4, 8, 16])
array([2, 3, 1, 4])

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...