performance - Fastest approach to finding the most common first and second value of tuples in an N-dimensional array of tuples in Python

posted Feb 19, 2021 in Technique by 深蓝 (71.8m points)

I have M N-dimensional arrays of tuples, and I'd like to find the most frequent value among the first elements of the tuples and the most frequent value among the second elements. Here is demo data for a single array:

data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
        [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
        [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
        [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

Here's my current implementation:

from collections import Counter


def find_most_common_values(data):
    # Flatten the n-dimensional array
    flattened = []
    for sublist in data:
        for item in sublist:
            flattened.append(item)

    # Separate the elements
    x = [item[0] for item in flattened]
    y = [item[1] for item in flattened]

    c = Counter(x)
    most_common_x = c.most_common(1)[0][0]
    c = Counter(y)
    most_common_y = c.most_common(1)[0][0]

    return most_common_x, most_common_y

# Demo function
def main():
    data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
            [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
            [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
            [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

    most_common_x, most_common_y = find_most_common_values(data)
    print("Most common X: " + str(most_common_x))
    print("Most common Y: " + str(most_common_y))


# Main entry point
if __name__ == "__main__":
    main()

This correctly outputs the following:

Most common X: 2
Most common Y: 3

I'm going to call this inside a for loop over a lot of data, so I'm looking for the fastest approach. I'm new to Python and suspect there are better ways that I'm not aware of. Does anyone know a faster, preferably more Pythonic, approach?

1 Reply

深蓝 · Answer 1 · 2021-02-19T03:39:53+0000

Here's a one-liner that does this using collections.Counter together with zip and itertools.chain in a list comprehension:

from collections import Counter
from itertools import chain

a, b = [Counter(x).most_common(1)[0][0] for x in zip(*chain(*data))]

Output:

>>> a
2
>>> b
3
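
To make the one-liner easier to follow, here is a rough step-by-step sketch of the same idea on the demo data from the question. The intermediate names (flat, firsts, seconds) are only illustrative and are not part of the original answer:

```python
from collections import Counter
from itertools import chain

data = [[(2, 0), (0, 3), (0, 2), (0, 3), (2, 4), (0, 3), (0, 3), (2, 7)],
        [(2, 0), (0, 1), (2, 0), (0, 1), (3, 4), (2, 7), (2, 0), (2, 7)],
        [(2, 2), (2, 3), (2, 2), (2, 3), (2, 2), (2, 3), (2, 3), (2, 2)],
        [(2, 1), (2, 1), (3, 2), (2, 1), (2, 1), (3, 3), (2, 1), (2, 1)]]

# chain(*data) removes one level of nesting: list of rows -> stream of tuples
flat = list(chain(*data))

# zip(*flat) transposes the tuples into one sequence per tuple position
firsts, seconds = zip(*flat)

# most_common(1) returns [(value, count)], so [0][0] picks out the value
a = Counter(firsts).most_common(1)[0][0]
b = Counter(seconds).most_common(1)[0][0]

print(a, b)
```

Note that chain(*data) flattens only one level, so this form assumes the data is a list of lists of tuples, as in the demo.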

See the Python documentation for collections.Counter, zip, and itertools.chain to read more about these functions.
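
Since the question mentions N-dimensional arrays, a possible variant for deeper nesting is sketched below. It assumes that containers are always lists and that the leaves are always 2-tuples. The names iter_tuples and most_common_pair are hypothetical helpers, not from the original answer, and whether this is faster on real data would need to be measured, for example with timeit:

```python
from collections import Counter


def iter_tuples(nested):
    """Yield leaf tuples from arbitrarily nested lists (hypothetical helper)."""
    for item in nested:
        if isinstance(item, list):
            # Assumption: lists are containers, anything else is a leaf tuple
            yield from iter_tuples(item)
        else:
            yield item


def most_common_pair(nested):
    """Count first and second tuple elements in a single pass."""
    first_counts, second_counts = Counter(), Counter()
    for first, second in iter_tuples(nested):
        first_counts[first] += 1
        second_counts[second] += 1
    return (first_counts.most_common(1)[0][0],
            second_counts.most_common(1)[0][0])


# Small made-up 3-level example
cube = [[[(2, 0), (0, 3)], [(2, 3), (2, 1)]],
        [[(3, 3), (2, 3)], [(0, 1), (2, 3)]]]
result = most_common_pair(cube)
print(result)
```

This avoids unpacking the whole data set into function arguments with *, at the cost of a Python-level loop.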
