Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
197 views
in Technique[技术] by (71.8m points)

python - efficiently generating unique positive and negative triplets from two lists (source list and label list)

I am quite new to python and I would like to generate all unique triplets from two lists. So, I have two lists like so:

source_list=[1,2,3,4,5,6,7,8..] # size N
labels_list=[1,1,0,2,1,3,6..] # size N

and I would like triplets of form:

<anchor> <postive> <negative>

where, <anchor> can be any element in the source_list, postive means that this element should come from the same label as the anchor(so, 1,2 and 1,5 could be seen as postive pairs in the triplet) and negative's mean that they should be from a different label (so, 1,3 and 1,4 would be considered a negative pair of the triplet). So, some examples of the correct triplet would be:

1, 2, 3
1, 2, 4
1, 2, 6

From the outset it seems like a N choose k problem, but I am not sure how to approach this with minimal computational cost other than doing loops with each element in python.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There might be a faster way to do this, but we can work it out pretty well with numpy and itertools. To get started, for each unique label k we need to partition source_list into the entries that correspond to label k (the "positive" part) and those that don't (the "negative" part).

>>> source_list = [1,2,3,4,5,6,7,8] #... size N
>>> labels_list = [1,1,0,2,1,3,6,2] #... size N

>>> for key in set(labels_list): 
...     positive = [s for i,s in enumerate(source_list) if labels_list[i] == key] 
...     negative = [s for i,s in enumerate(source_list) if labels_list[i] != key] 
...     print(key, positive, negative) 
...                                                                                 
0 [3] [1, 2, 4, 5, 6, 7, 8]
1 [1, 2, 5] [3, 4, 6, 7, 8]
2 [4, 8] [1, 2, 3, 5, 6, 7]
3 [6] [1, 2, 3, 4, 5, 7, 8]
6 [7] [1, 2, 3, 4, 5, 6, 8]

We can speed this up a little with NumPy and masking, which essentially lets us avoid doing the list comprehension twice.

>>> import numpy as np

>>> source_array = np.array([1,2,3,4,5,6,7,8])
>>> labels_array = np.array([1,1,0,2,1,3,6,2])  

>>> for key in np.unique(labels_array): 
...     mask = (labels_array == key)
...     positive = source_array[mask]
...     negative = source_array[~mask]   # "not" mask
...     print(key, positive, negative) 
...
0 [3] [1 2 4 5 6 7 8]
1 [1 2 5] [3 4 6 7 8]
2 [4 8] [1 2 3 5 6 7]
3 [6] [1 2 3 4 5 7 8]
6 [7] [1 2 3 4 5 6 8]

Side note: If <anchor> and <positive> aren't allowed to represent the same entry in sources_list, then we need each positive group to have at least two members in it. That probably happens in your actual data, but we'll just skip cases with a single positive for this demo.

Now comes the itertools. We want all unique permutations of 2 entries from the positive list, with another entry from the negative list. Since the results must be unique, we can cut down on complexity slightly be removing duplicates from each positive and negative list.

>>> import itertools
>>> import numpy as np

>>> source_array = np.array([1,2,3,4,5,6,7,8])
>>> labels_array = np.array([1,1,0,2,1,3,6,2])  

>>> for key in np.unique(labels_array): 
...     mask = (labels_array == key)
...     positive = np.unique(source_array[mask])  # No duplicates
...     if len(positive) < 2:                     # Skip singleton positives
...         continue
...     negative = np.unique(source_array[~mask]) # No duplicates
...     print(key, positive, negative) 
...     for ((a,b),c) in itertools.product(itertools.permutations(positive, 2),
...                                        negative):
...         print(a,b,c)
...     print()

Output:

1 [1 2 5] [3 4 6 7 8]
1 2 3
1 2 4
1 2 6
1 2 7
1 2 8
1 5 3
1 5 4
1 5 6
1 5 7
1 5 8
2 1 3
2 1 4
2 1 6
2 1 7
2 1 8
2 5 3
2 5 4
2 5 6
2 5 7
2 5 8
5 1 3
5 1 4
5 1 6
5 1 7
5 1 8
5 2 3
5 2 4
5 2 6
5 2 7
5 2 8

2 [4 8] [1 2 3 5 6 7]
4 8 1
4 8 2
4 8 3
4 8 5
4 8 6
4 8 7
8 4 1
8 4 2
8 4 3
8 4 5
8 4 6
8 4 7

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...