The common approach to eliminating bias is to throw away numbers that are outside of the desired range. As noted, this is wasteful. It's possible to minimize the waste by starting with a larger number of bits and generating multiple random numbers at the same time; you can achieve a better match between the range of inputs to outputs.
For example take a roll of a die. The output has 6 possibilities. The naive approach would take 3 random bits for each random number produced. The first example demonstrates the pigeon-hole problem.
def pigeon_die(total_bit_count):
for i in xrange(total_bit_count // 3):
bits = random.getrandbits(3)
yield 1 + bits * 6 // 8
1 : 832855
2 : 417835
3 : 416012
4 : 833888
5 : 416189
6 : 416554
total 3333333
max/min 2.00448063998
The second example is the wasteful approach commonly used. You can see that it generates fewer random number from the same number of random bits, but the bias is eliminated.
def wasteful_die(total_bit_count):
for i in xrange(total_bit_count // 3):
bits = random.getrandbits(3)
if bits < 6:
yield 1 + bits
1 : 417043
2 : 415812
3 : 417835
4 : 416012
5 : 416645
6 : 417243
total 2500590
max/min 1.00486517946
The final example takes 13 bits at a time and generates 5 random numbers from it. This generates even more numbers than the naive approach!
def optimized_die(total_bit_count):
for i in xrange(total_bit_count // 13):
bits = random.getrandbits(13)
if bits < 6**5:
for j in range(5):
yield 1 + bits % 6
bits //= 6
1 : 608776
2 : 608849
3 : 608387
4 : 608119
5 : 607855
6 : 608559
total 3650545
max/min 1.00163525841
The choice of 13 bits was made by taking the logarithm base 6 of powers of 2 and choosing the one that was closest to an integer.
def waste_list(n):
for bit in range(1, 31):
potential = math.log(2**bit, n)
count = int(potential)
if count > 0:
waste = potential - count
yield waste, bit, count
for waste, bit, count in sorted(waste_list(6)):
print bit, count, waste
if bit == 3:
break
13 5 0.029086494049
26 10 0.0581729880981
8 3 0.0948224578763
21 8 0.123908951925
3 1 0.160558421704
As you can see, there are 4 choices better than the simple 3 bits.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…