Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


performance - Fastest approach to finding the most common first and second value of tuples in an N-dimensional array of tuples in Python

I have M N-dimensional arrays of tuples, and I'd like to find the most frequent value among the first elements of the tuples and the most frequent value among the second elements. Here's demo data for a single N-dimensional array:

data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
        [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
        [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
        [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

Here's my current implementation:

from collections import Counter


def find_most_common_values(data):
    # Flatten the n-dimensional array
    flattened = []
    for sublist in data:
        for item in sublist:
            flattened.append(item)

    # Separate the elements
    x = [item[0] for item in flattened]
    y = [item[1] for item in flattened]

    c = Counter(x)
    most_common_x = c.most_common(1)[0][0]
    c = Counter(y)
    most_common_y = c.most_common(1)[0][0]

    return most_common_x, most_common_y

# Demo function
def main():
    data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
            [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
            [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
            [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

    most_common_x, most_common_y = find_most_common_values(data)
    print("Most common X: " + str(most_common_x))
    print("Most common Y: " + str(most_common_y))


# Main entry point
if __name__ == "__main__":
    main()

Which correctly outputs the following:

Most common X: 2
Most common Y: 3

Since I'm going to use this in a for loop over a lot of data, I'm looking for the fastest approach. I'm new to Python, so I suspect there are better ways I'm not aware of. Does anyone know a faster, preferably more Pythonic, approach?



1 Reply


Here's a one-liner that does this using collections.Counter together with zip and itertools.chain in a list comprehension:

from collections import Counter
from itertools import chain

a, b = [Counter(x).most_common(1)[0][0] for x in zip(*chain(*data))]

Output:

>>> a
2
>>> b
3
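
To see why the one-liner works, it may help to unpack it into separate steps. The following is only a sketch, assuming every row of data is a flat list of 2-tuples as in the demo. The function name most_common_per_position is made up for illustration, and chain.from_iterable is swapped in for chain(*data) (it yields the same tuples without unpacking the outer list into arguments first).

```python
from collections import Counter
from itertools import chain


def most_common_per_position(data):
    """Return the most frequent value at each tuple position (hypothetical helper)."""
    # Walk the rows one after another, yielding every tuple in turn.
    pairs = chain.from_iterable(data)
    # Transpose the stream of tuples: one tuple of all first elements,
    # then one tuple of all second elements.
    columns = zip(*pairs)
    # most_common(1) returns [(value, count)], so [0][0] picks the value.
    return [Counter(column).most_common(1)[0][0] for column in columns]


data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
        [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
        [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
        [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

a, b = most_common_per_position(data)
print(a, b)  # prints: 2 3
```

Two caveats with this sketch: for empty input it returns an empty list, so the a, b unpacking would fail, and it only flattens one level of nesting, so deeper arrays would need to be flattened further before the zip step.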

To read more about these functions, refer to the Python documentation for zip, itertools.chain, and collections.Counter.
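
Since the question is about speed, it is probably worth measuring both approaches on your own data rather than assuming the shorter one is faster. Below is a rough timing sketch using the standard timeit module. The function names with_loops and with_zip_chain, the repetition factor, and the run count are arbitrary choices for illustration, and with_loops is a condensed form of the question's code rather than a verbatim copy.

```python
import timeit
from collections import Counter
from itertools import chain


def with_loops(data):
    # The question's approach, condensed: flatten, split into two lists, count each.
    flattened = [item for sublist in data for item in sublist]
    x = [item[0] for item in flattened]
    y = [item[1] for item in flattened]
    return Counter(x).most_common(1)[0][0], Counter(y).most_common(1)[0][0]


def with_zip_chain(data):
    # The answer's one-liner, wrapped in a function.
    a, b = [Counter(col).most_common(1)[0][0] for col in zip(*chain(*data))]
    return a, b


demo = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
        [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
        [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
        [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

# Repeat the demo rows so the timings are not dominated by call overhead.
big = demo * 1000

for func in (with_loops, with_zip_chain):
    seconds = timeit.timeit(lambda: func(big), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

The timings depend on the Python version, the machine, and the shape of the data, so treat the printed numbers as a comparison on your own setup only.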

