Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
187 views
in Technique[技术] by (71.8m points)

c# - System.OutOfMemoryException when generating permutations

I'm getting System.OutOfMemoryException when trying to generate 6 letter permutations. 5 letter permutations still work.

Here is the code I'm using to generate ALL permutations:

private static List<string> getPermutations(int n,string source)
        {
            IEnumerable<string> q = source.Select(x => x.ToString());
            for (int i = 0; i < n - 1; i++)
            {
                q = q.SelectMany(x => source, (x, y) => x + y);
            }
            return q.ToList(); // THIS IS WHERE THE ERROR HAPPENS
        }

after which I am using this piece of code to filter them based on regex:

private static List<string> filterListByRegex(List<string> list, string regex)
        {
            List<string> newList = list.ToList();
            for (int i = newList.Count - 1; i >= 0; i--)
            {
                Match match = Regex.Match(""+newList[i], regex, RegexOptions.IgnoreCase);
                if (!match.Success)
                {
                    newList.RemoveAt(i);
                }
            }
            return newList;
        }

so since I don't really need ALL those permutations, is there a way to regex filter while generating permutations, or should I use a more efficient piece of code to generate permutations?

Here is a picture to better demonstrate what I'm trying to achieve: enter image description here

The vertical alphabet string is the one I'm telling the code to use.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First, I would like to mention that what we are discussing here is not real permutations, but so called n-tuples or permutations with repetition - Wikipedia.

Second, regarding the System.OutOfMemoryException when generating permutations, I think we all agree that the result should not be stored in a list, but provided as enumerable which will allow applying filtering and consuming (eventually storing) only the ones in interest.

In that regard the LINQ solution provided by @juharr is performing very well. So my goals are to minimize the intermediary memory allocations, including string concatenations and also to end up with a more general and faster solution.

In order to do that, I need to take some hard design decision. The signature of the general function I'm talking about will look like this

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)

and the question is what should be the array yielded. If we follow the recomendations, they should be a separate array instances. However, remember I want to minimize allocations, I've decided to break that rules and yield one and the same array instance, moving the responsibility of not modifying it and cloning it if necessary to the caller. For instance, this allows the caller to perform no cost filtering. Or implement the OP function on top on it like this

public static IEnumerable<string> RepeatingPermutations(this string set, int N)
{
    return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p));
}

A few words about the algorithm. Instead of looking at the problem recursively as some other answerers, I want to effectively implement the equivalent of something like this

from e1 in set
from e2 in set
...
from eN in set
select new [] { e1, e2, .., eN }

Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.

With all that being said, here is the function:

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
{
    var result = new T[N];
    var indices = new int[N];
    for (int pos = 0, index = 0; ;)
    {
        for (; pos < N; pos++, index = 0)
        {
            indices[pos] = index;
            result[pos] = set[index];
        }
        yield return result;
        do
        {
            if (pos == 0) yield break;
            index = indices[--pos] + 1;
        }
        while (index >= set.Length);
    }
}

I've did some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:

A : N=2 Count=         676 Time=00:00:00.0000467 Memory=     29K
B1: N=2 Count=         676 Time=00:00:00.0000263 Memory=     16K
B2: N=2 Count=         676 Time=00:00:00.0000189 Memory=      8K

A : N=3 Count=      17,576 Time=00:00:00.0010107 Memory=    657K
B1: N=3 Count=      17,576 Time=00:00:00.0003673 Memory=    344K
B2: N=3 Count=      17,576 Time=00:00:00.0001415 Memory=      8K

A : N=4 Count=     456,976 Time=00:00:00.0184445 Memory=  2,472K
B1: N=4 Count=     456,976 Time=00:00:00.0096189 Memory=  2,520K
B2: N=4 Count=     456,976 Time=00:00:00.0033624 Memory=      8K

A : N=5 Count=  11,881,376 Time=00:00:00.4281349 Memory=    397K
B1: N=5 Count=  11,881,376 Time=00:00:00.2482835 Memory=  4,042K
B2: N=5 Count=  11,881,376 Time=00:00:00.0887759 Memory=      8K

A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory=  1,688K
B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory=  1,024K
B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory=      8K

where

A - LINQ function from @juharr
B1 - my function with string
B2 - my function with char[]

As we can see, memory wise both string functions are comparable. Performance wise the LINQ function is only ~2 times slower, which is pretty good result.

As expected in such scenario, the non allocating function significantly outperforms them both.

UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):

var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray();
var charPermutations = charSet.RepeatingPermutations(3);
var stringSet = new string(charset);
var stringPermutations = stringSet.RepeatingPermutations(3);

However, remember the design choice I've made, so if you expand the charPermutations inside the debugger, you'll see one and the same values (the last permutation). Consuming the whole result of the above call for char[] should be like this

var charPermutationList = charSet.RepeatingPermutations(3)
    .Select(p => (char[])p.Clone()).ToList();

Actually a good addition to the two methods presented would be the following extension method:

public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source)
{
    return source.Select(item => (T[])item.Clone());
}

so the consuming call would be simple

var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...