Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
213 views
in Technique[技术] by (71.8m points)

python - Efficiently calculate word frequency in a string

I am parsing a long string of text and calculating the number of times each word occurs in Python. I have a function that works but I am looking for advice on whether there are ways I can make it more efficient(in terms of speed) and whether there's even python library functions that could do this for me so I'm not reinventing the wheel?

Can you suggest a more efficient way to calculate the most common words that occur in a long string(usually over 1000 words in the string)?

Also whats the best way to sort the dictionary into a list where the 1st element is the most common word, the 2nd element is the 2nd most common word and etc?

test = """abc def-ghi jkl abc
abc"""

def calculate_word_frequency(s):
    # Post: return a list of words ordered from the most
    # frequent to the least frequent

    words = s.split()
    freq  = {}
    for word in words:
        if freq.has_key(word):
            freq[word] += 1
        else:
            freq[word] = 1
    return sort(freq)

def sort(d):
    # Post: sort dictionary d into list of words ordered
    # from highest freq to lowest freq
    # eg: For {"the": 3, "a": 9, "abc": 2} should be
    # sorted into the following list ["a","the","abc"]

    #I have never used lambda's so I'm not sure this is correct
    return d.sort(cmp = lambda x,y: cmp(d[x],d[y]))

print calculate_word_frequency(test)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use collections.Counter:

>>> from collections import Counter
>>> test = 'abc def abc def zzz zzz'
>>> Counter(test.split()).most_common()
[('abc', 2), ('zzz', 2), ('def', 2)]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...