
python - Getting unique combinations from a non-unique list of items, FASTER?

First off, I am able to do it but I am not happy with the speed.

My question is: is there a better, faster way of doing this?

I have a list of items looking like this:

[(1,2), (1,2), (4,3), (7,8)]

And I need to get all the unique combinations. For example, the unique combinations of 2 items would be:

[(1,2), (1,2)], [(1,2), (4,3)], [(1,2), (7,8)], [(4,3), (7,8)]

After using itertools.combinations I get a lot more than that because of duplicates. For example, I get every list containing (1,2) twice. If I create a set of these combinations I get the unique ones. The problem comes when the original list has 80 tuples and I want combinations with 6 items in them. Getting that set takes more than 30 seconds. If I can get that number down I would be very happy.

I am aware that the number of combinations is huge and that's why creating the set is time-consuming. But I am still hoping that there is a library that has optimized the process in some way, speeding it up a bit.

It might be important to note that, of all the combinations I find, I test only the first 10000 or so. In some cases all the combos can be way too many to process, so I don't want to spend too much time on them, as there are other tests to be done too.

This is a sample of what I have now:

from itertools import combinations

ls = [...]  # list of random NON-unique tuples (x, y)
# ls = [(1,2), (1,2), (4,3), (7,8)]  # example
# the second code snippet below shows how I generate ls for testing

all_combos = combinations(ls, 6)
all_combos_set = set(all_combos)

for combo in all_combos_set:
    do_some_test_on(combo)

In case you want to test it out, here is what I use for testing the speed of different methods:

import time
from itertools import combinations
from random import gauss, randint

def main3():
    tries = 4
    elements_in_combo = 6
    rng = 90
    data = [0]*rng
    for tr in range(tries):
        for n in range(1, rng):
            quantity = 0
            name = (0,0)
            ls = []
            # build a list of n tuples in which each new random tuple is
            # repeated a random (Gaussian-distributed) number of times
            for i in range(n):
                if quantity == 0:
                    quantity = int(abs(gauss(0, 4)))
                    if quantity != 0:
                        quantity -= 1
                    name = (randint(1000,7000), randint(1000,7000))
                    ls.append(name)
                else:
                    quantity -= 1
                    ls.append(name)

            start_time = time.time()
            all_combos = combinations(ls, elements_in_combo)
            all_combos = set(all_combos)

            duration = time.time() - start_time
            data[n] += duration
            print(n, "random files take", duration, "seconds.")

            if duration > 30:
                break

    for i in range(rng):
        print("average duration for", i, "is", (data[i]/tries), "seconds.")


1 Reply


The originally asked question, "is there a better, faster way of doing this?", actually contains two questions:

  • Is there a faster way?
  • Is there a better way?

I would like to narrow the answer to the question "Is there a faster way?" to:

Is there a FASTER way of removing duplicates from a list than doing it as follows:

lstWithUniqueElements = list(set(lstWithDuplicates))

?

To my knowledge, there is no faster way ...
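If you want to sanity-check that claim on data shaped like yours, a quick micro-benchmark along these lines works (a rough sketch with made-up sample data; dict.fromkeys is just one alternative duplicate-removal approach to compare against):

import timeit
from random import randint

# made-up sample: a list of non-unique (x, y) tuples, as in the question
data = [(randint(1, 50), randint(1, 50)) for _ in range(100000)]

# time 100 runs of each duplicate-removal approach
via_set = timeit.timeit(lambda: list(set(data)), number=100)
via_dict = timeit.timeit(lambda: list(dict.fromkeys(data)), number=100)

print("list(set(...)):          ", via_set, "seconds")
print("list(dict.fromkeys(...)):", via_dict, "seconds")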

Now let's concentrate on the second part of the question ("Is there a better way?"). Answering that kind of question is usually very hard and needs much discussion, but that won't be the case here, because the author of the question has already clearly stated what a better way would be (quote):

I'd love to use a generator function. The itertools combinations() itself is an iterable and not a list or set, so if I figure out how to yield the unique combinations that'd be great.

So HERE it is:

def uniqueCombinations(lstList, comboSize):
    from itertools import combinations
    # sorting puts equal values next to each other, so identical
    # combinations always appear in the same order and can be detected
    lstList.sort()
    allCombos = combinations(lstList, comboSize)
    setUniqueCombos = set()
    for comboCandidate in allCombos:
        if comboCandidate in setUniqueCombos:
            continue
        yield comboCandidate
        setUniqueCombos.add(comboCandidate)

That's it ...
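A minimal usage sketch (my own example data, mirroring the sample from the question): since uniqueCombinations() is a generator, it combines naturally with itertools.islice to process at most the first 10000 unique combinations, which is all the question says gets tested anyway.

from itertools import islice

ls = [(1, 2), (1, 2), (4, 3), (7, 8)]

# lazily take at most the first 10000 unique 2-item combinations;
# nothing beyond what is actually consumed gets generated
for combo in islice(uniqueCombinations(ls, 2), 10000):
    print(combo)
# ((1, 2), (1, 2))
# ((1, 2), (4, 3))
# ((1, 2), (7, 8))
# ((4, 3), (7, 8))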


One more thing may be worth mentioning here. The method the author of the question chose for getting unique combinations, when the list they are generated from contains not only unique elements but also several elements with the same value, doesn't work correctly in some special cases, like this one:

set(combinations(['a','a','b','a'], 2)) gives: {('a', 'b'), ('b', 'a'), ('a', 'a')}
uniqueCombinations(['a','a','b','a'], 2) gives: {('a', 'b'), ('a', 'a')}

The difference comes from the sort() in uniqueCombinations: with the input sorted, order variants such as ('b', 'a') can no longer show up alongside ('a', 'b').

In the meantime there is a pure Python function available here on Stack Overflow which is both faster and slower than the one provided above. How can it be faster AND slower? See HERE for details.

