Python - Finding word frequencies of list of words in text file

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I am trying to speed up my project to count word frequencies. I have 360+ text files, and I need to get the total number of words and the number of times each word from another list of words appears. I know how to do this with a single text file.

>>> import nltk
>>> import os
>>> import re
>>> os.chdir(r"C:\Users\Cameron\Desktop\PDF-to-txt")
>>> filename = "1976.03.txt"
>>> textfile = open(filename, "r")
>>> inputString = textfile.read()
>>> word_list = re.split(r'\s+', inputString.lower())
>>> print('Words in text:', len(word_list))
# prints the number of words in the text file
>>> word_list.count('inflation')
# returns the number of times 'inflation' occurs in the text file
>>> word_list.count('jobs')
>>> word_list.count('output')

It's too tedious to get the frequencies of 'inflation', 'jobs', and 'output' individually. Can I put these words into a list and find the frequency of all the words in the list at the same time?

Example: Instead of this:

>>> word_list.count('inflation')
3
>>> word_list.count('jobs')
5
>>> word_list.count('output')
1

I want to do this (I know this isn't real code, this is what I'm asking for help on):

>>> list1='inflation', 'jobs', 'output'
>>>word_list.count(list1)
'inflation', 'jobs', 'output'
3, 5, 1

My list of words is going to have 10-20 terms, so I need to be able to just point Python toward a list of words to get the counts of. It would also be nice if the output could be copied and pasted into an Excel spreadsheet, with the words as the column headers and the frequencies in the row below.

Example:

inflation, jobs, output
3, 5, 1

And finally, can anyone help automate this for all of the text files? I figure I just point Python toward the folder and it can do the above word counting from the new list for each of the 360+ text files. Seems easy enough, but I'm a bit stuck. Any help?

An output like this would be fantastic:

Filename1
inflation, jobs, output
3, 5, 1

Filename2
inflation, jobs, output
7, 2, 4

Filename3
inflation, jobs, output
9, 3, 5

Thanks!

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:34:59+0000

collections.Counter() has this covered if I understand your problem.

The example from the docs would seem to match your problem.

# Tally occurrences of words in a list
from collections import Counter
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
print(cnt)


# Find the ten most common words in Hamlet
import re
words = re.findall(r'\w+', open('hamlet.txt').read().lower())
Counter(words).most_common(10)

From the example above you should be able to do:

import re
import collections
words = re.findall(r'\w+', open('1976.03.txt').read().lower())
print(collections.Counter(words))
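
To get counts for just a chosen list of words, one option is to index the Counter by each word; a Counter returns 0 for a word it never saw, so absent words need no special handling. This is a minimal sketch, not the only way to do it: `count_wanted` and the sample text are hypothetical names made up for illustration, and the sample string stands in for the contents of a real file.

```python
import re
from collections import Counter


def count_wanted(text, wanted):
    """Return the count of each word in `wanted`, in the same order."""
    # Tokenize on runs of word characters, lowercased, then tally everything once.
    counts = Counter(re.findall(r"\w+", text.lower()))
    # Missing keys read as 0, so words that never occur are still reported.
    return [counts[word] for word in wanted]


# Hypothetical sample text standing in for a file's contents.
sample = "Inflation rose. Jobs fell as inflation hit output; jobs, jobs."
wanted = ["inflation", "jobs", "output"]
row = count_wanted(sample, wanted)

# Two comma-separated lines: the words, then their counts.
print(", ".join(wanted))
print(", ".join(str(n) for n in row))
```

Keeping `wanted` as a list (rather than a set) preserves the column order, which matters if the two lines are pasted into a spreadsheet.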

EDIT: a naive approach to show one way.

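import re
from collections import Counter

# Use a set of words: `in` on a plain string would match substrings (e.g. "is" in "fish")
wanted = {"fish", "chips", "steak"}
cnt = Counter()
words = re.findall(r'\w+', open('1976.03.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print(cnt)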
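
For the 360+ files, the same idea can probably be wrapped in a loop over the folder. The sketch below assumes the files end in `.txt` and are readable as UTF-8 (undecodable bytes are replaced). The function name `count_words_in_folder`, the column names `filename` and `total_words`, and the CSV layout are all assumptions for illustration, not something from the question. It writes one header row and then one row per file, which Excel can open directly.

```python
import csv
import re
from collections import Counter
from pathlib import Path


def count_words_in_folder(folder, wanted, out_csv):
    """Write one CSV row per .txt file: filename, total words, each wanted count."""
    folder = Path(folder)
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row: the wanted words become the column names.
        writer.writerow(["filename", "total_words"] + list(wanted))
        # sorted() gives a stable, alphabetical file order.
        for path in sorted(folder.glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            counts = Counter(re.findall(r"\w+", text.lower()))
            total = sum(counts.values())  # every token in the file, not just wanted ones
            writer.writerow([path.name, total] + [counts[w] for w in wanted])
```

A call might look like `count_words_in_folder("PDF-to-txt", ["inflation", "jobs", "output"], "word_counts.csv")`. Note that the total here counts `\w+` tokens, so it can differ slightly from a total obtained by splitting on whitespace.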
