All the above solutions waste a lot of resources for filtering repeated results when it comes to the end of the iteration. That's why I have thought of a method that has (almost) linear speed from start until the very end.
The idea is: Give (only in your head) each result of the standard order cartesian product an index. That would be for example for A
xB
xC
with 2000
x1
x2
= 4000
elements:
0: (A[0], B[0], C[0])
1: (A[1], B[0], C[0])
...
1999: (A[1999], B[0], C[0])
2000: (A[0], B[0], C[1])
...
3999: (A[1999], B[0], C[1])
done.
So there are still some questions open:
- How do I get a list of possible indices? Answer: Just multiply
2000*1*2=4000
and every number below that will be a valid index.
- How do I generate random indices sequentially without repetition? There are two answers: If you want samples with a known sample size
n
, just use random.sample(xrange(numer_of_indices), n)
. But if you don't know the sample size yet (more general case), you have to generate indices on the fly to not waste memory. In that case, you can just generate index = random.randint(0, k - 1)
with k = numer_of_indices
to get the first index and k = number_of_indices - n
for the n
th result. Just check my code below (be aware, that I use a one sided linked list there to store the done indices. It makes insert operations O(1) operations and we need a lot of insertions here).
- How do I generate the output from the index? Answer: Well, say our index is
i
. Then i % 2000
will be the index of A
for the result. Now i // 2000
can be treated recursively as the index for the cartesian product of the remaining factors.
So this is the code I came up with:
def random_order_cartesian_product(*factors):
amount = functools.reduce(lambda prod, factor: prod * len(factor), factors, 1)
index_linked_list = [None, None]
for max_index in reversed(range(amount)):
index = random.randint(0, max_index)
index_link = index_linked_list
while index_link[1] is not None and index_link[1][0] <= index:
index += 1
index_link = index_link[1]
index_link[1] = [index, index_link[1]]
items = []
for factor in factors:
items.append(factor[index % len(factor)])
index //= len(factor)
yield items
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…