Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
779 views
in Technique[技术] by (71.8m points)

python - all permutations from 2 lists but with a condition on the amount of elements

In Python I have :

a = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

I'm trying to build all possible permutations of 10 of these characters selected from list a or b, that would include any permutation containing: at least 3 elements from a and at least 3 elements from b. The elements can be repeated, as they have been selected from the lists.

e.g, of valid permutations: wa85ff3d56, 333aaaaaaa, a33333aa33

What's the most pythonic, efficient way of doing this?

PS: Yes, I know, The answer will be extremely large. It is expected. It will not be stored in memory, it will be streamed and used in live calculations.

For now, I generate all the possible combinaisons and then permutate them. However, the calculation will most definitely take months. I have this piece of code, which does the trick, but it's clearly not the most efficient way. Example for 3 from list a and 7 from list b, it generates 2.864429568e+14 permutations. So considering i'd have to do this 5 times (3,7), (4,6), (5,5), (6,4), (7,3), I'd get a total of 1.432214784e+15 permutations. In comparison, without restrictions, there would be 36^10 = 3.65615844e+15. So this method would remove almost half of the possible permutations.

import string
import itertools


a = list(string.digits)
b = list(string.ascii_lowercase)

a_combinaisons_3 = list(itertools.combinations(a, 3))
b_combinaisons_7 = list(itertools.combinations(b, 7))

a3b7_combinaisons = []

for a_element in a_combinaisons_3:
    for b_element in b_combinaisons_7:
        a3b7_combinaisons.append(a_element + b_element)

a3b7_permutations = list(itertools.permutations(a3b7_combinaisons))
print(len(a3b7_combinaisons))  # 78936000
print(len(a3b7_permutations))  # 1.432214784e+15
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I want to try to write a general-purpose answer here in the hope of having a good canonical target for duplicate questions in the future.

Combinatoric Fundamentals in Python with itertools

Reordering and Replacement (Repetition)

It's important to understand how the various combinatoric iterators provided by itertools work and how they differ.

Let's consider a simple candidate set A = [1, 2, 3, 4], from which we want to draw "combinations" (as question-askers will usually put it) of three elements.

>>> A = [1,2,3,4]

itertools.combinations selects with no reordering (i.e., each output value will appear in the same order as in the input) and no repetition of the result values. This therefore produces only 4 results:

>>> list(itertools.combinations(A, 3))
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]

itertools.permutations means that the results can appear in any order, i.e. a given subset of the input will appear multiple times in the output in different orders.

>>> list(itertools.permutations(A, 3)) # 24 results omitted for brevity

The four combinations appear in six orders each, for 24 total results.

itertools.combinations_with_replacement selects without reordering, but allows elements from the input to be chosen more than once:

>>> list(itertools.combinations_with_replacement(A, 3)) # 20 results omitted for brevity

There are four results where each element is chosen three times, six where a double is followed by a single (which must be higher), six where a single is followed by a double, plus the four combinations of all singles. Counting the results for this in the general case is not easy.

If you want to allow repetitions and reordering in your output results, you can use itertools.product:

>>> list(itertools.product(A, repeat=3)) # 64 results omitted for brevity

Simply, each of three times we freely choose from the four elements, for 4 * 4 * 4 = 64 results.

itertools.product implements what is called the Cartesian product. In general, it pulls one element each from multiple specified sequences. But generating "permutations with replacement" is the same thing as pulling one element each from the same sequence multiple times, so the repeat keyword is offered as a convenient shortcut for this operation - so you don't have to specify the sequence multiple times. I mean, you could write itertools.product(*[A]*3), but that's ugly.

What about repetition in the candidate set?

This isn't related to OP's question as asked; but for completeness, it's important to understand that none of these functions care about elements of the candidate set being equal, or even identical:

>>> x = object()
>>> candidates = [x, x, x, x]
>>> results = list(itertools.combinations(candidates, 3))
>>> len(results)
4
>>> results[0] == results[1] == results[2] == results[3]
True

How can we implement constraints?

The simplest way is to generate an inclusive result (in OP's case, by joining the a and b candidates together, and generating a product of 10 of those) and filter out things that don't meet the requirements. However, this is inefficient and can be ugly (we need to analyze an output tuple to figure out whether its elements meet the conditions - in OP's case, whether they were drawn from a or from b. If you do want to take an approach like this, I generally recommend writing a generator function:

def OP_problem():
    for result in itertools.product(a+b, repeat=10):
        a_count = len(x for x in result if x in a)
        # the trick is that every element was either from a or b;
        # so "at least 3 a's and at least 3 b's" is equivalent to
        # "at least 3 a's and at most 7 a's".
        if 3 <= a_count <= 7:
            yield result

or in simple enough cases, a generator expression:

(
    # maybe you don't think this is simple enough :)
    result for result in itertools.product(a+b, repeat=10)
    if 3 <= len(x for x in result if x in a) <= 7
)

Usually it's better to try to break the problem down into smaller pieces and put the results together. For example, the documentation has a recipe for computing power sets that simply chains the results for combinations of 0 elements, 1 element etc. up to all the elements. (Another way is to find the cartesian product of N booleans, each representing whether or not to include a given element, and then translate them into subsets.)

In our case, we can separately find the results for each count of a elements. Let's consider the case of 5 elements from each list; it should be clear how to generalize that and combine the results (hint: use itertools.chain.from_iterable, as shown in the powerset recipe in the documentation).

Difficult cases: partitions and position selection

There's one advanced technique I want to showcase here, in order to solve the problem of selecting 5 elements from a and 5 elements from b and intermingling them. The idea is simply that we select positions where the a's will go, out of all the possible positions (i.e., indices into a sequence of 10 elements), and for each, generate the corresponding output results. Those positions are combinations without replacement (do you understand why?) of the possible index values.

Thus, something like:

def make_combination(letter_positions, chosen_letters, chosen_digits):
    result = [None] * 10
    for letter, position in zip(chosen_letters, letter_positions):
        result[position] = letter
    # Figure out where the digits go, using set arithmetic to find the
    # remaining positions, then putting them back in ascending order.
    digit_positions = sorted(set(range(10)) - set(chosen_letters))
    for digit, position in zip(chosen_digits, digit_positions):
        result[position] = digit
    assert None not in result
    return tuple(result)


def five_letters_and_five_digits():
    letters = 'abcdefghijklmnopqrstuvwxyz'
    digits = '0123456789'
    # It's not *easy*, but it's fairly elegant.
    # We separately generate the letter positions, letter selection
    # and digit selection, using `product` to get the cartesian product
    # of *those* possibilities; then for each of those, we translate
    # into a desired output - using `starmap` to iterate.
    return itertools.starmap(
        make_combination, 
        itertools.product(
            itertools.combinations(range(10), 5),
            itertools.product(letters, repeat=5),
            itertools.product(digits, repeat=5)
        )
    )
                

The technique of choosing positions is also useful for solving partitioning problems. The idea is simply that you choose positions where the partitions go (for N elements there will generally be N-1 places to put them), as combinations - either with (if zero-size partitions are allowed) or without (otherwise) replacement.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...