Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
561 views
in Technique[技术] by (71.8m points)

python - Count frequency of item in a list of tuples

I have a list of tuples as shown below. I have to count how many items have a number greater than 1. The code that I have written so far is very slow. Even if there are around 10K tuples, if you see below example string appears two times, so i have to get such kind of strings. My question is what is the best way to achieve the count of strings here by iterating over the generator

List:

 b_data=[('example',123),('example-one',456),('example',987),.....]

My code so far:

blockslst=[]
for line in b_data:
    blockslst.append(line[0])

blocklstgtone=[]
for item in blockslst:
    if(blockslst.count(item)>1):
        blocklstgtone.append(item)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You've got the right idea extracting the first item from each tuple. You can make your code more concise using a list/generator comprehension, as I show you below.

From that point on, the most idiomatic manner to find frequency counts of elements is using a collections.Counter object.

  1. Extract the first elements from your list of tuples (using a comprehension)
  2. Pass this to Counter
  3. Query count of example
from collections import Counter

counts = Counter(x[0] for x in b_data)
print(counts['example'])

Sure, you can use list.count if it’s only one item you want to find frequency counts for, but in the general case, a Counter is the way to go.


The advantage of a Counter is it performs frequency counts of all elements (not just example) in linear (O(N)) time. Say you also wanted to query the count of another element, say foo. That would be done with -

print(counts['foo'])

If 'foo' doesn’t exist in the list, 0 is returned.

If you want to find the most common elements, call counts.most_common -

print(counts.most_common(n))

Where n is the number of elements you want to display. If you want to see everything, don't pass n.


To retrieve counts of most common elements, one efficient way to do this is to query most_common and then extract all elements with counts over 1, efficiently with itertools.

from itertools import takewhile

l = [1, 1, 2, 2, 3, 3, 1, 1, 5, 4, 6, 7, 7, 8, 3, 3, 2, 1]
c = Counter(l)

list(takewhile(lambda x: x[-1] > 1, c.most_common()))
[(1, 5), (3, 4), (2, 3), (7, 2)]

(OP edit) Alternatively, use a list comprehension to get a list of items having count > 1 -

[item[0] for item in counts.most_common() if item[-1] > 1]

Keep in mind that this isn’t as efficient as the itertools.takewhile solution. For example, if you have one item with count > 1, and a million items with count equal to 1, you’d end up iterating over the list a million and one times, when you don’t have to (because most_common returns frequency counts in descending order). With takewhile that isn’t the case, because you stop iterating as soon as the condition of count > 1 becomes false.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...