I am trying to parse many files found in a directory, but using multiprocessing slows my program down.
# Calling my parsing function from Client.
L = getParsedFiles('/home/tony/Lab/slicedFiles')  # <-- 1000 .txt files here, ~100 MB combined
Following this example from the Python documentation:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
I've written this piece of code:
from multiprocessing import Pool
from api.ttypes import *
import gc
import os

def _parse(pathToFile):
    myList = []
    with open(pathToFile) as f:
        for line in f:
            s = line.split()
            x, y = [int(v) for v in s]
            obj = CoresetPoint(x, y)
            gc.disable()
            myList.append(obj)
            gc.enable()
    return Points(myList)
def getParsedFiles(pathToFile):
    myList = []
    p = Pool(2)
    for filename in os.listdir(pathToFile):
        if filename.endswith(".txt"):
            # build the full path so the workers can open the file
            myList.append(os.path.join(pathToFile, filename))
    return p.map(_parse, myList)
Following the example, I put the names of all the files ending in .txt into a list, created a Pool, and mapped it over my function, so that it returns a list of objects, each holding the parsed data of one file. To my surprise, I got the following results:
#Pool(32) ---> ~162 s
#Pool(16) ---> ~150 s
#Pool(12) ---> ~142 s
#Pool(2)  ---> ~130 s
Graph: (plot of the timings above; image not reproduced)
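For reference, a minimal harness that could reproduce this comparison, including a single-process baseline, might look like the sketch below (listTxtFiles, timeSerial, and timePool are hypothetical helpers added here for illustration; _parse is the function above):

import os
import time
from multiprocessing import Pool

def listTxtFiles(pathToDir):
    # Collect the full paths of all .txt files in the directory.
    return [os.path.join(pathToDir, name)
            for name in os.listdir(pathToDir)
            if name.endswith(".txt")]

def timeSerial(paths):
    # Baseline: parse every file in the current process, no Pool.
    start = time.perf_counter()
    for path in paths:
        _parse(path)
    return time.perf_counter() - start

def timePool(poolSize, paths):
    # Parse the same files through a Pool of the given size.
    start = time.perf_counter()
    with Pool(poolSize) as p:
        p.map(_parse, paths)
    return time.perf_counter() - start

if __name__ == '__main__':
    paths = listTxtFiles('/home/tony/Lab/slicedFiles')
    print('serial:   %.1f s' % timeSerial(paths))
    for size in (2, 12, 16, 32):
        print('Pool(%d): %.1f s' % (size, timePool(size, paths)))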
Machine specification:
62.8 GiB RAM
Intel® Core™ i7-6850K CPU @ 3.60GHz × 12
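(The "× 12" is the logical CPU count, i.e. 6 cores with hyper-threading; a quick sanity check:)

import os
print(os.cpu_count())  # expected to print 12 on this machine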
What am I missing here?
Thanks in advance!