Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
598 views
in Technique[技术] by (71.8m points)

objective c - Arc4random modulo biased

According to this documentation,

arc4random_uniform() is recommended over constructions like arc4random() % upper_bound as it avoids "modulo bias" when the upper bound is not a power of two.

How bad is the bias? For example if I generate random numbers with an upper bound of 6, what's the difference between using arc4random with % and arc4random_uniform()?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

arc4random() returns an unsigned 32-bit integer, meaning the values are between 0 and 2^32-1 = 4 294 967 295.

Now, the bias results from the fact that the multiple subintervals created with modulo are not fitting exactly into the random output range. Lets imagine for clarity a random generator that creates numbers from 0 to 198 inclusive. You want numbers from 0 to 99, therefore you calculate random() % 100, yielding 0 to 99:

0 % 100 = 0
99 % 100 = 99
100 % 100 = 0
198 % 100 = 98

You see that 99 is the only number which can occur only once while all others can occur twice in a run. That means that the probability for 99 is exactly halved which is also the worst case in a bias where at least 2 subintervals are involved.
As all powers of two smaller than the range interval fits nicely into the 2^32 interval, the bias disappears in this case.

The implications are that the smaller the result set with modulo and the higher the random output range, the smaller the bias. In your example, 6 is your upper bound (I assume 0 is the lower bound), so you use % 7, resulting that 0-3 occurs 613 566 757 times while 4-6 occurs 613 566 756 times.
So 0-3 is 613 566 757 / 613 566 756 = 1.0000000016298 times more probable than 4-6.

While it seems easy to dismiss, some experiments (especially Monte-Carlo experiments) were flawed exactly because these seemingly incredible small differences were pretty important.

Even worse is the bias if the desired output range is bigger than the random target range. Please read the Fisher-Yates shuffle entry because many poker sites learned the hard way that normal linear congruential random generators and bad shuffling algorithms resulted in impossible or very probable decks or worse, predictable decks.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...