Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Why is my multiprocessing-enabled script creating only 4 out of 10 folders?

I am trying to modify my script to copy files using multiprocessing, as an exercise to learn more about multiprocessing in Python.

My main block does this:

if __name__ == "__main__":
    #get command line arguments
    cmdlineArgs = getCmdLineArguments()
    #get all the files in folder
    listOfFiles = getFiles(cmdlineArgs.sourceDirectory)
    #create dataframe of files which needs to be copied
    filesDF = createDF(listOfFiles, cmdlineArgs.destDirectory)
    processes = []
    lstOfDates = list(set(filesDF['date'].to_list()))
    lstOfDates.sort()
    # for dt in lstOfDates:
    #     copyFilesAcross([filesDF, [dt]])
    splitListOfDatesForProc = [(lstOfDates[i:i+3]) for i in range(0, len(lstOfDates), 3)]
    for dt in splitListOfDatesForProc:
        p = Process(target=copyFilesAcross, args=([filesDF, dt],))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

copyFilesAcross does this:

def copyFilesAcross(lst):
    #keep only the date provided as parameter
    df = lst[0]
    dt = lst[1]
    for d in dt:
        df = df[df.date == d]
        print("Processing date " + d + ' for PID: ', os.getpid())
        for index,row in df.iterrows():
            try:
                #print('Making directory ' + row['destination'])
                os.makedirs(row['destination'], exist_ok=True)
                shutil.copy(row['source'], row['destination'])
            except OSError as e:
                print('Failed to copy file ' + row['source'] + ' with error {0}'.format(e) )
            except:
                print("Unexpected error: ", sys.exc_info()[0])

output :

getFiles: Executed ...
getFiles: Creating empty list ...
getFiles: Concatenating files ...
Creating dataframe of files to be copied ...
Creating empty dataframe ...
Populating dataframe ...
Sorting data frame by date ...
Processing date 20180204 for PID:  35033 <- processed
Processing date 20180304 for PID:  35034 <- processed
Processing date 20180811 for PID:  35038 <- processed
Processing date 20180815 for PID:  35041 <- processed
Processing date 20180311 for PID:  35034 <- not processed
Processing date 20180724 for PID:  35034 <- not processed
Processing date 20180222 for PID:  35033 <- not processed
Processing date 20180303 for PID:  35033 <- not processed
Processing date 20180812 for PID:  35038 <- not processed
Processing date 20180813 for PID:  35038 <- not processed

Process finished with exit code 0

Without multiprocessing the script runs fine, so I assume the issue is in the last two for loops in main, but I am not sure what I am doing wrong.

question from:https://stackoverflow.com/questions/66056578/why-is-my-multiprocessing-enabled-script-creating-only-4-out-of-10-folders


1 Reply


This isn't really a multiprocessing problem; you have a bug in copyFilesAcross.

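On the first iteration of the for d in dt: loop in copyFilesAcross, the line df = df[df.date == d] overwrites df and throws away every row except those matching the first date in dt. On every subsequent iteration you filter that already-reduced dataframe for a different date, which no longer exists in it, so df is overwritten with an empty dataframe. When for index,row in df.iterrows(): then runs, there are no rows, so the loop body never executes. That matches your output: only the first date of each chunk of three is processed. The commented-out loop in main presumably works because it passes a single date per call, so the filter only ever runs once on the full dataframe.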
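
One way to fix this is to filter into a separate variable so the full dataframe survives from one date to the next. The following is a minimal sketch, not the asker's final code: the name copy_files_across, the two separate parameters (instead of a single list), and the returned list of processed dates are assumptions made for illustration. The column names date, source, and destination come from the question.

```python
import os
import shutil

import pandas as pd


def copy_files_across(files_df, dates):
    """Copy every file whose date is in `dates`.

    Returns the dates that had at least one row (an illustrative
    addition, not part of the original script).
    """
    processed = []
    for d in dates:
        # Filter into a NEW name; files_df keeps all rows for the next date.
        rows_for_date = files_df[files_df["date"] == d]
        print("Processing date", d, "for PID:", os.getpid())
        for _, row in rows_for_date.iterrows():
            try:
                os.makedirs(row["destination"], exist_ok=True)
                shutil.copy(row["source"], row["destination"])
            except OSError as e:
                print("Failed to copy file {0} with error {1}".format(row["source"], e))
        if not rows_for_date.empty:
            processed.append(d)
    return processed
```

With this signature, each process in main would be created as Process(target=copy_files_across, args=(filesDF, dt)). The only change that matters for the bug is that the filtered result is no longer assigned back to the dataframe being filtered.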

