I already have a working solution, but I am not happy with its speed.
My question is: is there a better, faster way of doing this?
I have a list of items looking like this:
[(1,2), (1,2), (4,3), (7,8)]
And I need to get all the unique combinations. For example, the unique combinations of 2 items would be:
[(1,2), (1,2)], [(1,2), (4,3)], [(1,2), (7,8)], [(4,3), (7,8)]
After using itertools.combinations I get a lot more than that because of duplicates. For example, I get every list containing (1,2) twice. If I create a set of these combinations I get the unique ones.
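To make the duplication concrete, here is a minimal sketch using the four-item example list from above. It only illustrates the behaviour described and is not meant as a fix.

```python
from itertools import combinations

ls = [(1, 2), (1, 2), (4, 3), (7, 8)]

# combinations() works on positions, so the two equal (1, 2) entries
# are treated as different items and produce repeated results
all_pairs = list(combinations(ls, 2))

# tuples are hashable, so building a set collapses the repeats
unique_pairs = set(all_pairs)

print(len(all_pairs), "combinations,", len(unique_pairs), "of them unique")
```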
The problem comes when the original list has 80 tuples and I want combinations with 6 items in them. Getting that set takes more than 30 seconds. If I can get that number down I would be very happy.
I am aware that the number of combinations is huge and that's why creating the set is time-consuming. But I am still hoping that there is a library that has optimized the process in some way, speeding it up a bit.
It might be important to note that, of all the combinations I find, I only test the first 10000 or so. In some cases the full set of combinations is far too large to process, and I don't want to spend too much time on it because there are other tests to run as well.
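Since only the first 10000 or so combinations are ever tested, one possible direction is to generate the distinct combinations lazily instead of building every combination and deduplicating afterwards. The sketch below is one way to do that with the standard library only: it counts how often each tuple occurs and then chooses how many copies of each distinct tuple to use. The name `unique_combinations` is a hypothetical helper, not a library function, and it assumes that the order of items inside a combination does not matter.

```python
from collections import Counter
from itertools import islice


def unique_combinations(items, r):
    """Lazily yield each distinct r-item combination of a list with repeats.

    Hypothetical helper for illustration; combinations are treated as unordered.
    """
    counts = Counter(items)
    values = list(counts)                 # distinct items, in first-seen order
    limits = [counts[v] for v in values]  # copies available of each distinct item
    # left_after[i] = total copies available from values[i] onward (for pruning)
    left_after = [0] * (len(values) + 1)
    for i in range(len(values) - 1, -1, -1):
        left_after[i] = left_after[i + 1] + limits[i]

    def rec(idx, remaining, prefix):
        if remaining == 0:
            yield tuple(prefix)
            return
        if left_after[idx] < remaining:   # not enough items left to finish
            return
        # use the current value `take` times, most copies first
        for take in range(min(limits[idx], remaining), -1, -1):
            yield from rec(idx + 1, remaining - take,
                           prefix + [values[idx]] * take)

    yield from rec(0, r, [])


ls = [(1, 2), (1, 2), (4, 3), (7, 8)]
# islice stops after the requested number of results, so nothing beyond
# the first 10000 distinct combinations is ever generated
first_combos = list(islice(unique_combinations(ls, 2), 10000))
print(first_combos)
```

With the real data, the same call with `r=6` would stop after the first 10000 distinct combinations without ever holding the full set in memory. Whether this is fast enough in practice would need to be measured.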
This is a sample of what I have now:
from itertools import combinations

ls = [...]  # placeholder: list of random NON-unique tuples (x, y)
# ls = [(1,2), (1,2), (4,3), (7,8)]  # example
# the second code snippet shows how I generate ls for testing

all_combos = combinations(ls, 6)
all_combos_set = set(all_combos)  # deduplicate; this is the slow step
for combo in all_combos_set:
    do_some_test_on(combo)
In case you want to try it out, here is what I use for timing the different methods:
import time
from itertools import combinations
from random import gauss, randint

def main3():
    tries = 4
    elements_in_combo = 6
    rng = 90
    data = [0]*rng  # data[n] accumulates the duration for lists of length n
    for tr in range(tries):
        for n in range(1, rng):
            quantity = 0
            name = (0,0)
            ls = []
            # build a list of n tuples in which each tuple repeats a random number of times
            for i in range(n):
                if quantity == 0:
                    quantity = int(abs(gauss(0, 4)))
                    if quantity != 0:
                        quantity -= 1
                    name = (randint(1000,7000), randint(1000,7000))
                    ls.append(name)
                else:
                    quantity -= 1
                    ls.append(name)
            # time the generation and deduplication of the combinations
            start_time = time.time()
            all_combos = combinations(ls, elements_in_combo)
            all_combos = set(all_combos)
            duration = time.time() - start_time
            data[n] += duration
            print(n, "random files take", duration, "seconds.")
            if duration > 30:
                break
    for i in range(rng):
        print("average duration for", i, "is", (data[i]/tries), "seconds.")