I want to try to write a general-purpose answer here in the hope of having a good canonical target for duplicate questions in the future.
Combinatoric Fundamentals in Python with itertools
Reordering and Replacement (Repetition)
It's important to understand how the various combinatoric iterators provided by itertools
work and how they differ.
Let's consider a simple candidate set A = [1, 2, 3, 4]
, from which we want to draw "combinations" (as question-askers will usually put it) of three elements.
>>> A = [1,2,3,4]
itertools.combinations
selects with no reordering (i.e., each output value will appear in the same order as in the input) and no repetition of the result values. This therefore produces only 4 results:
>>> list(itertools.combinations(A, 3))
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
itertools.permutations
means that the results can appear in any order, i.e. a given subset of the input will appear multiple times in the output in different orders.
>>> list(itertools.permutations(A, 3)) # 24 results omitted for brevity
The four combinations
appear in six orders each, for 24 total results.
itertools.combinations_with_replacement
selects without reordering, but allows elements from the input to be chosen more than once:
>>> list(itertools.combinations_with_replacement(A, 3)) # 20 results omitted for brevity
There are four results where each element is chosen three times, six where a double is followed by a single (which must be higher), six where a single is followed by a double, plus the four combinations
of all singles. Counting the results for this in the general case is not easy.
If you want to allow repetitions and reordering in your output results, you can use itertools.product
:
>>> list(itertools.product(A, repeat=3)) # 64 results omitted for brevity
Simply, each of three times we freely choose from the four elements, for 4 * 4 * 4 = 64 results.
itertools.product
implements what is called the Cartesian product. In general, it pulls one element each from multiple specified sequences. But generating "permutations with replacement" is the same thing as pulling one element each from the same sequence multiple times, so the repeat
keyword is offered as a convenient shortcut for this operation - so you don't have to specify the sequence multiple times. I mean, you could write itertools.product(*[A]*3)
, but that's ugly.
What about repetition in the candidate set?
This isn't related to OP's question as asked; but for completeness, it's important to understand that none of these functions care about elements of the candidate set being equal, or even identical:
>>> x = object()
>>> candidates = [x, x, x, x]
>>> results = list(itertools.combinations(candidates, 3))
>>> len(results)
4
>>> results[0] == results[1] == results[2] == results[3]
True
How can we implement constraints?
The simplest way is to generate an inclusive result (in OP's case, by joining the a
and b
candidates together, and generating a product
of 10 of those) and filter out things that don't meet the requirements. However, this is inefficient and can be ugly (we need to analyze an output tuple to figure out whether its elements meet the conditions - in OP's case, whether they were drawn from a
or from b
. If you do want to take an approach like this, I generally recommend writing a generator function:
def OP_problem():
for result in itertools.product(a+b, repeat=10):
a_count = len(x for x in result if x in a)
# the trick is that every element was either from a or b;
# so "at least 3 a's and at least 3 b's" is equivalent to
# "at least 3 a's and at most 7 a's".
if 3 <= a_count <= 7:
yield result
or in simple enough cases, a generator expression:
(
# maybe you don't think this is simple enough :)
result for result in itertools.product(a+b, repeat=10)
if 3 <= len(x for x in result if x in a) <= 7
)
Usually it's better to try to break the problem down into smaller pieces and put the results together. For example, the documentation has a recipe for computing power sets that simply chains the results for combinations of 0 elements, 1 element etc. up to all the elements. (Another way is to find the cartesian product of N booleans, each representing whether or not to include a given element, and then translate them into subsets.)
In our case, we can separately find the results for each count of a
elements. Let's consider the case of 5 elements from each list; it should be clear how to generalize that and combine the results (hint: use itertools.chain.from_iterable
, as shown in the powerset
recipe in the documentation).
Difficult cases: partitions and position selection
There's one advanced technique I want to showcase here, in order to solve the problem of selecting 5 elements from a
and 5 elements from b
and intermingling them. The idea is simply that we select positions where the a's will go, out of all the possible positions (i.e., indices into a sequence of 10 elements), and for each, generate the corresponding output results. Those positions are combinations without replacement (do you understand why?) of the possible index values.
Thus, something like:
def make_combination(letter_positions, chosen_letters, chosen_digits):
result = [None] * 10
for letter, position in zip(chosen_letters, letter_positions):
result[position] = letter
# Figure out where the digits go, using set arithmetic to find the
# remaining positions, then putting them back in ascending order.
digit_positions = sorted(set(range(10)) - set(chosen_letters))
for digit, position in zip(chosen_digits, digit_positions):
result[position] = digit
assert None not in result
return tuple(result)
def five_letters_and_five_digits():
letters = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'
# It's not *easy*, but it's fairly elegant.
# We separately generate the letter positions, letter selection
# and digit selection, using `product` to get the cartesian product
# of *those* possibilities; then for each of those, we translate
# into a desired output - using `starmap` to iterate.
return itertools.starmap(
make_combination,
itertools.product(
itertools.combinations(range(10), 5),
itertools.product(letters, repeat=5),
itertools.product(digits, repeat=5)
)
)
The technique of choosing positions is also useful for solving partitioning problems. The idea is simply that you choose positions where the partitions go (for N elements there will generally be N-1 places to put them), as combinations - either with (if zero-size partitions are allowed) or without (otherwise) replacement.