Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Count of Specific words in Multiple text files

I have multiple text files and need to find and count specific words in them, then write the results to a CSV file. Column A should contain the .txt file names, the header row should contain the words, and each row should give the counts for that file. With the code below I get counts for every word, but I need to filter the output down to the exact words in the header.

For example, the output should look like the image I uploaded.

header = ['Abuse', 'Accommodating', 'Accommodation', 'Accountability']

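import csv
import os
import re
from collections import Counter
from glob import glob

folderpaths = 'C:/Users/haris/Downloads/PDF/'
counter = Counter()
filepaths = glob(os.path.join(folderpaths, '*.txt'))
for file in filepaths:
    with open(file) as f:
        # \w+ matches runs of letters, digits and underscores
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
    print(counter)
# newline='' stops the csv module from writing blank rows on Windows
with open('C:/Users/haris/Downloads/PDF/firstcsv.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in counter.items():
        writer.writerow(row)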


Files uploaded to google drive


1 Reply


Edit: As per your new request, I have added the "total_words" column. The code has been updated.
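
What "total words" means is worth pinning down, because a Counter can give two different figures. Here is a small sketch, assuming words are runs of `\w` characters in lowercased text; the sample sentence is made up for illustration:

```python
import re
from collections import Counter

text = "The deposit range and the annual deposit"  # made-up sample text
counter = Counter(re.findall(r"\w+", text.lower()))

total_words = sum(counter.values())  # every occurrence of every word
distinct_words = len(counter)        # each different word counted once
```

For this sample, total_words is 7 and distinct_words is 5, so the two can differ noticeably on real files.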



Below is code that works. Change the "folderpath" variable to the path of the folder containing the text files, and change the "target_file" variable to where you want the output CSV file to be created.

Sample csv output: (image not available)

Code:

from collections import Counter
import glob
import os
import re

# Words to count; keep them lowercase because the file text is lowercased below
header = ['annual', 'investment', 'statement', 'range', 'deposit', 'supercalifragilisticexpialidocious']
folderpath = r'C:\Users\USERname4\Desktop\myfolder'
target_file = r'C:\Users\USERname4\Desktop\mycsv.csv'

queueWAP = []
def writeAndPrint(fileObject, toBeWAP, opCode=0):
    # opCode 0: write and print now; 1: queue the text; 2: flush the queue
    global queueWAP
    if (opCode == 0):
        fileObject.write(toBeWAP)
        print(toBeWAP)
    if (opCode == 1):
        queueWAP.append(toBeWAP)
    if (opCode == 2):
        for temp4 in range(len(queueWAP)):
            fileObject.write(queueWAP[temp4])
            print(queueWAP[temp4])
        queueWAP = []

mycsvfile = open(target_file, 'w')
# Header row: file_name, total_words, then one column per target word
writeAndPrint(mycsvfile, "file_name,total_words")
for temp1 in header:
    writeAndPrint(mycsvfile, "," + temp1)
writeAndPrint(mycsvfile, "\n")
filepaths = glob.glob(folderpath + r"\*.txt")
for file in filepaths:
    with open(file) as f:
        # Keep only the file name, not the full Windows path
        writeAndPrint(mycsvfile, file.split("\\")[-1])
        counter = Counter()
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for temp2 in header:
            temp3 = False
            temp5 = 0
            for myword in counter.items():
                temp5 = temp5 + 1
                if myword[0] == temp2:
                    # Queue the count so it is written after total_words
                    writeAndPrint(mycsvfile, "," + str(myword[1]), 1)
                    temp3 = True
            if temp3 == False:
                writeAndPrint(mycsvfile, "," + "0", 1)
        # temp5 ends up as the number of distinct words in this file
        writeAndPrint(mycsvfile, "," + str(temp5))
        writeAndPrint(mycsvfile, "", 2)
        writeAndPrint(mycsvfile, "\n")
mycsvfile.close()
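
For comparison, the same table can probably be produced more compactly with csv.writer and direct Counter lookups. This is only a sketch under a few assumptions, not the code above: the function name count_words_to_csv and its parameters are hypothetical, total_words is taken to mean the sum of all word occurrences in a file, and header words are lowercased before lookup because the file text is lowercased (so a header like 'Abuse' still matches).

```python
import csv
import glob
import os
import re
from collections import Counter


def count_words_to_csv(folderpath, target_file, header):
    """Write one CSV row per .txt file: name, total words, count per header word.

    Hypothetical helper for illustration; not part of the original answer.
    """
    filepaths = sorted(glob.glob(os.path.join(folderpath, "*.txt")))
    # newline="" lets the csv module control line endings (no blank rows on Windows)
    with open(target_file, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file_name", "total_words"] + list(header))
        for path in filepaths:
            with open(path) as f:
                words = re.findall(r"\w+", f.read().lower())
            counter = Counter(words)
            # Assumption: total_words means every word occurrence in the file
            total = sum(counter.values())
            # A Counter returns 0 for missing keys, so no explicit check is needed
            row = [os.path.basename(path), total]
            row += [counter[word.lower()] for word in header]
            writer.writerow(row)
```

It would be called as count_words_to_csv(folderpath, target_file, header) with the same three variables defined at the top of the code above. Using os.path.join and os.path.basename avoids hard-coding the backslash separator, so the sketch should also run outside Windows.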


...