c# - Bug in .Net's `Random` class?

I was looking at a question about a bad implementation of the Fisher-Yates shuffling algorithm, and I was struck by just how much bias the incorrect version introduces.

The two algorithms are these:

private Random _random = new Random();

public int[] FisherYates(int[] source)
{
    int[] output = source.ToArray();
    for (var i = 0; i < output.Length; i++)
    {
        var j = _random.Next(i, output.Length);
        (output[i], output[j]) = (output[j], output[i]);
    }
    return output;
}

public int[] FisherYatesBad(int[] source)
{
    int[] output = source.ToArray();
    for (var i = 0; i < output.Length; i++)
    {
        var j = _random.Next(0, output.Length);
        (output[i], output[j]) = (output[j], output[i]);
    }
    return output;
}

A really subtle difference, but enough to cause a massive bias.
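
To see why the bad version can't be uniform (my own sketch, not part of the original question): each pass of the bad loop picks the swap index from all n positions, so there are n^n equally likely swap sequences, and n^n isn't a multiple of n! once n > 2, so some permutations must come up more often than others. For three elements you can simply enumerate all 27 sequences:

// My sketch: enumerate every swap sequence the bad shuffle can take for three
// elements and count how often each permutation comes out.
var counts = new Dictionary<string, int>();

void Walk(int[] a, int i)
{
    if (i == a.Length)
    {
        var key = string.Join(",", a);
        counts[key] = counts.TryGetValue(key, out var c) ? c + 1 : 1;
        return;
    }
    // Bad version: j always ranges over the whole array.
    for (var j = 0; j < a.Length; j++)
    {
        var copy = a.ToArray();
        (copy[i], copy[j]) = (copy[j], copy[i]);
        Walk(copy, i + 1);
    }
}

Walk(new[] { 0, 1, 2 }, 0);
counts.Dump();
// 27 outcomes over 6 permutations: 0,1,2 / 2,0,1 / 2,1,0 each occur 4 times,
// while 0,2,1 / 1,0,2 / 1,2,0 each occur 5 times - the bad shuffle can't be uniform.

The good version draws j from i..n-1 instead, giving exactly n! equally likely sequences, one per permutation.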

Good implementation:

[chart: Good Fisher-Yates]

Bad implementation:

[chart: Bad Fisher-Yates]

To be clear about these plots: I start with the numbers 0 to 99, create 10_000_000 shuffles using whichever algorithm, and then average the value that ends up at each position across all of the shuffles to get a single set of 100 figures. If the shuffle is truly random then all 100 figures would belong to the same normal distribution.
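
In case that's hard to picture, the measurement boils down to roughly this (a simplified sketch; the full LINQPad code is further down):

// Simplified sketch: shuffle 0..n-1 many times and average the value that
// lands at each position.
double[] PositionalAverages(Func<int[], int[]> shuffle, int n, int samples)
{
    var source = Enumerable.Range(0, n).ToArray();
    var sums = new double[n];
    for (var s = 0; s < samples; s++)
    {
        var shuffled = shuffle(source);
        for (var i = 0; i < n; i++)
            sums[i] += shuffled[i];
    }
    return sums.Select(sum => sum / samples).ToArray();
}
// For a fair shuffle every position's average should hover around (n - 1) / 2.0,
// i.e. 49.5 when n = 100; a biased shuffle pulls some positions away from that.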

Now, that's all fine, but I thought I'd check to see if these methods produce valid results:

public int[] OrderByRandomNext(int[] source) => source.OrderBy(x => _random.Next()).ToArray();

public int[] OrderByRandomNextDouble(int[] source) => source.OrderBy(x => _random.NextDouble()).ToArray();

Both nice and neat, but are they fair shuffles?

OrderByRandomNext:

[chart: OrderByRandomNext]

OrderByRandomNextDouble:

[chart: OrderByRandomNextDouble]

Notice that the 1st and the 100th figures are significantly lower in each?

Well, I thought it might be an artefact of how OrderBy works, so I tested it with another random number generator - one brought to us by Eric Lippert in his series on improving Random.

public int[] OrderByBetterRandomNextDouble(int[] source) => source.OrderBy(x => BetterRandom.NextDouble()).ToArray();

public static class BetterRandom
{
    private static readonly ThreadLocal<RandomNumberGenerator> crng =
        new ThreadLocal<RandomNumberGenerator>(RandomNumberGenerator.Create);

    private static readonly ThreadLocal<byte[]> bytes =
        new ThreadLocal<byte[]>(() => new byte[sizeof(int)]);

    public static int NextInt()
    {
        // Fill four bytes from the CSPRNG and clear the sign bit, giving a value in [0, int.MaxValue].
        crng.Value.GetBytes(bytes.Value);
        return BitConverter.ToInt32(bytes.Value, 0) & int.MaxValue;
    }

    public static double NextDouble()
    {
        while (true)
        {
            // Build a 52-bit random value (21 high bits + 31 low bits) and divide by 2^52
            // to get a uniformly distributed double in [0, 1).
            long x = NextInt() & 0x001FFFFF;
            x <<= 31;
            x |= (long)NextInt();
            double n = x;
            const double d = 1L << 52;
            double q = n / d;
            if (q != 1.0)
                return q;
        }
    }
}

Well, here's the chart:

[chart: BetterRandom]

There's no bias!

Here's my code to generate the data (run in LINQPad):

void Main()
{
    var n = 100;
    var s = 1000000;

    var numbers = Enumerable.Range(0, n).ToArray();

    var algorithms = new Func<int[], int[]>[]
    {
        FisherYates,
        OrderByRandomNext,
        OrderByRandomNextDouble,
        OrderByBetterRandomNextDouble,
    };

    var averages =
        algorithms
            .Select(algorithm =>
                Enumerable
                    .Range(0, numbers.Length)
                    .Select(x =>
                        Enumerable
                            .Range(0, s)
                            .Select(y => algorithm(numbers))
                            .Aggregate(0.0, (a, v) => a + (double)v[x] / s))
                    .ToArray())
            .Select(x => new
            {
                averages = x,
                distribution = Accord.Statistics.Distributions.Univariate.NormalDistribution.Estimate(x.Skip(1).SkipLast(1).ToArray()),
                first = x.First(),
                last = x.Last(),
            })
            .Select(x => new
            {
                x.averages,
                x.distribution,
                x.first,
                x.last,
                first_prob = x.distribution.DistributionFunction(x.first),
                last_prob = x.distribution.DistributionFunction(x.last),
            })
            .ToArray();

    averages.Dump();
}

private Random _random = new Random();

public int[] FisherYates(int[] source)
{
    int[] output = source.ToArray();
    for (var i = 0; i < output.Length; i++)
    {
        var j = _random.Next(i, output.Length);
        (output[i], output[j]) = (output[j], output[i]);
    }
    return output;
}

public int[] OrderByRandomNext(int[] source) => source.OrderBy(x => _random.Next()).ToArray();

public int[] OrderByRandomNextDouble(int[] source) => source.OrderBy(x => _random.NextDouble()).ToArray();

public int[] OrderByBetterRandomNextDouble(int[] source) => source.OrderBy(x => BetterRandom.NextDouble()).ToArray();

public static class BetterRandom
{
    private static readonly ThreadLocal<RandomNumberGenerator> crng =
        new ThreadLocal<RandomNumberGenerator>(RandomNumberGenerator.Create);

    private static readonly ThreadLocal<byte[]> bytes =
        new ThreadLocal<byte[]>(() => new byte[sizeof(int)]);

    public static int NextInt()
    {
        crng.Value.GetBytes(bytes.Value);
        return BitConverter.ToInt32(bytes.Value, 0) & int.MaxValue;
    }

    public static double NextDouble()
    {
        while (true)
        {
            long x = NextInt() & 0x001FFFFF;
            x <<= 31;
            x |= (long)NextInt();
            double n = x;
            const double d = 1L << 52;
            double q = n / d;
            if (q != 1.0)
                return q;
        }
    }
}

Here's the data that I generated (one row per algorithm, in the order listed in the algorithms array):

algorithm                     | distribution                                             | first              | last               | first_prob             | last_prob
----------------------------- | -------------------------------------------------------- | ------------------ | ------------------ | ---------------------- | ---------------------
FisherYates                   | N(x; μ = 49.50267467345823, σ² = 0.0008896228453062147)  | 49.505465999987585 | 49.49833699998965  | 0.5372807100387846     | 0.44218570467529394
OrderByRandomNext             | N(x; μ = 49.50503062243786, σ² = 0.0009954477334487531)  | 49.36330799998817  | 49.37124399998651  | 3.529550818615057E-06  | 1.115772521409486E-05
OrderByRandomNextDouble       | N(x; μ = 49.505720877539765, σ² = 0.0008257970106087029) | 49.37231699998847  | 49.386660999990106 | 1.7228855271333998E-06 | 1.712972513601141E-05
OrderByBetterRandomNextDouble | N(x; μ = 49.49994663264188, σ² = 0.0007518765247716318)  | 49.50191999998847  | 49.474235999989205 | 0.5286859991636343     | 0.17421285127499514

Here's my question. What's up with System.Random and the bias it introduces?


1 Reply


The default RNG in .NET, up to and including .NET 5, has known bias and performance issues, most of which are documented in https://github.com/dotnet/runtime/issues/23198:

  • A typo in the implementation of Donald E. Knuth's subtractive random number generator with unknown practical effects.
  • A different modulo (2^32-1 instead of a power of two) with unknown practical effects.
  • Next(0, int.MaxValue) has heavy bias.
  • NextDouble() only produces 2^31 possible values, where it could pick from approx. 2^62 distinct values (a quick check is sketched below).
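
That last point is easy to check empirically. Here's a small sketch (mine, not from the linked issue) that leans on an implementation detail of the legacy generator - NextDouble() is an integer sample divided by int.MaxValue, so every result sits on a grid of roughly 2^31 values:

// Seeding the instance forces the legacy algorithm even on .NET 6+.
var rng = new Random(42);
var offGrid = 0;
for (var i = 0; i < 1_000_000; i++)
{
    var scaled = rng.NextDouble() * int.MaxValue;      // should land (almost) exactly on an integer
    if (Math.Abs(scaled - Math.Round(scaled)) > 1e-3)  // generous tolerance for double rounding
        offGrid++;
}
Console.WriteLine($"Samples off the 2^31 grid: {offGrid}");  // expect 0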

This is why .NET 6 implemented a better algorithm (xoshiro256**). You will get this better RNG when you instantiate a new Random() instance without a seed. This is described in https://github.com/dotnet/runtime/pull/47085. Unfortunately, it's not easy to replace the old RNG when a seed is provided since people might rely on the behavior of the current, biased RNG.
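
In other words (my illustration, assuming .NET 6 or later):

var modern = new Random();      // no seed: backed by the new xoshiro256** algorithm
var legacy = new Random(1234);  // seeded: keeps the legacy algorithm for compatibility
var shared = Random.Shared;     // the .NET 6+ thread-safe shared instance also uses the new algorithm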

Even though xoshiro256** has some documented flaws (and a rebuttal) as well, I found it to work pretty well for my purposes. I have copied the improved implementation from .NET 6 and use that.

Side note: LINQ queries are lazily evaluated (a.k.a. "deferred execution"). If you use an RNG in the .OrderBy lambda, you might get confusing results if you iterate multiple times, because the order can change on every enumeration. Some sorting algorithms also rely on elements not suddenly changing their relative order in order to work correctly, so returning inconsistent ordering keys can break them. Sure, today's OrderBy implementation in LINQ-to-Objects works fine, but there's no documented guarantee that it has to cope with "randomly" changing values. A reasonable alternative is .OrderBy(e => HashCode.Combine(0x1337, e)).
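
Two ways to sidestep that, sketched out (my code, not from the linked sources):

// Sketch 1: generate each random key exactly once, up front, so neither the sort
// nor a re-enumeration can ever see a different key for the same element.
// (_random is the field from the question.)
public int[] ShuffleByPrecomputedKeys(int[] source) =>
    source
        .Select(x => (key: _random.Next(), value: x))
        .ToArray()                    // keys are now fixed
        .OrderBy(t => t.key)
        .Select(t => t.value)
        .ToArray();

// Sketch 2: the salted-hash ordering suggested above - consistent across
// enumerations, but it produces the same order for the same input every time.
public int[] OrderBySaltedHash(int[] source) =>
    source.OrderBy(e => HashCode.Combine(0x1337, e)).ToArray();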

