I am trying to modify my script to copy files using multiprocessing, as an exercise to learn more about multiprocessing in Python.
My main does this:
if __name__ == "__main__":
    # get command line arguments
    cmdlineArgs = getCmdLineArguments()
    # get all the files in folder
    listOfFiles = getFiles(cmdlineArgs.sourceDirectory)
    # create dataframe of files which needs to be copied
    filesDF = createDF(listOfFiles, cmdlineArgs.destDirectory)
    processes = []
    lstOfDates = list(set(filesDF['date'].to_list()))
    lstOfDates.sort()
    # for dt in lstOfDates:
    #     copyFilesAcross([filesDF, [dt]])
    splitListOfDatesForProc = [(lstOfDates[i:i+3]) for i in range(0, len(lstOfDates), 3)]
    for dt in splitListOfDatesForProc:
        p = Process(target=copyFilesAcross, args=([filesDF, dt],))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
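For reference, here is a small sketch of what the chunking expression in main produces. It assumes the ten date strings that appear in the output further down; the names `lst_of_dates`, `chunk_size` and `chunks` are only illustrative and are not part of the script.

```python
# Dates taken from the log output in this post, sorted the way lstOfDates.sort() would sort them.
lst_of_dates = sorted([
    "20180204", "20180304", "20180811", "20180815", "20180311",
    "20180724", "20180222", "20180303", "20180812", "20180813",
])

chunk_size = 3  # same step as in the list comprehension in main

# Slice the sorted list into consecutive groups of at most chunk_size dates.
chunks = [lst_of_dates[i:i + chunk_size] for i in range(0, len(lst_of_dates), chunk_size)]

for chunk in chunks:
    print(chunk)
```

With ten dates this gives four chunks (three of three dates and one of a single date), so four processes are started and each one receives a list of dates rather than a single date.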
copyFilesAcross does this:
def copyFilesAcross(lst):
    # keep only the date provided as parameter
    df = lst[0]
    dt = lst[1]
    for d in dt:
        df = df[df.date == d]
        print("Processing date " + d + ' for PID: ', os.getpid())
        for index, row in df.iterrows():
            try:
                # print('Making directory ' + row['destination'])
                os.makedirs(row['destination'], exist_ok=True)
                shutil.copy(row['source'], row['destination'])
            except OSError as e:
                print('Failed to copy file ' + row['source'] + ' with error {0}'.format(e))
            except:
                print("Unexpected error: ", sys.exc_info()[0])
Output:
getFiles: Executed ...
getFiles: Creating empty list ...
getFiles: Concatenating files ...
Creating dataframe of files to be copied ...
Creating empty dataframe ...
Populating dataframe ...
Sorting data frame by date ...
Processing date 20180204 for PID: 35033 <- processed
Processing date 20180304 for PID: 35034 <- processed
Processing date 20180811 for PID: 35038 <- processed
Processing date 20180815 for PID: 35041 <- processed
Processing date 20180311 for PID: 35034 <- not processed
Processing date 20180724 for PID: 35034 <- not processed
Processing date 20180222 for PID: 35033 <- not processed
Processing date 20180303 for PID: 35033 <- not processed
Processing date 20180812 for PID: 35038 <- not processed
Processing date 20180813 for PID: 35038 <- not processed
Process finished with exit code 0
Without multiprocessing the script runs fine, so I assume the issue is in the last two for loops in main, but I am not sure what I am doing wrong.
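One thing worth checking, judging only from the code and output shown here: the single-process version passes one date per call (`[dt]`), while the multiprocessing version passes up to three dates per call. Inside copyFilesAcross, `df = df[df.date == d]` replaces `df` with the filtered frame, so from the second date onward the filter runs on a frame that only holds rows for the first date. That would match the log, where only the first date of each PID is processed. The sketch below reproduces this with a tiny made-up frame; `files_df`, the file names and both helper functions are hypothetical stand-ins, not the real script.

```python
import pandas as pd

# Hypothetical stand-in for filesDF: one row per file, with a string 'date' column.
files_df = pd.DataFrame({
    "date": ["20180204", "20180222", "20180303"],
    "source": ["a.txt", "b.txt", "c.txt"],
})
dates = ["20180204", "20180222", "20180303"]


def rows_per_date_overwriting(df, dates):
    """Mirrors the posted loop: df is replaced by the filtered frame on each pass."""
    counts = []
    for d in dates:
        df = df[df.date == d]  # after the first pass only rows for dates[0] remain
        counts.append(len(df))
    return counts


def rows_per_date_separate(df, dates):
    """Filters into a new name, so the full frame is still available for the next date."""
    counts = []
    for d in dates:
        subset = df[df.date == d]
        counts.append(len(subset))
    return counts


print(rows_per_date_overwriting(files_df, dates))  # [1, 0, 0]
print(rows_per_date_separate(files_df, dates))     # [1, 1, 1]
```

If this is indeed the cause, filtering into a separate variable inside the loop and iterating over that variable, while leaving `df` untouched, should let every date in a chunk find its rows.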
question from:
https://stackoverflow.com/questions/66056578/why-is-my-multiprocessing-enabled-script-creating-only-4-out-of-10-folders