Using the following code, I'm trying to read, process and concatenate a bunch of CSV files in parallel.
import os
import pandas as pd
from multiprocessing import Pool

def read_process(file):
    # Read one CSV and run the processing step on it (process_df is defined elsewhere)
    df = pd.read_csv(file)
    process_df(df)
    return df

if __name__ == '__main__':
    path = '/path/to/directory/'
    files = [os.path.join(path, file) for file in os.listdir(path)]
    with Pool() as pool:
        dfs = pool.map(read_process, files)
    df = pd.concat(dfs, ignore_index=True)
However, memory consumption grows out of control for large datasets. How can I limit the amount of memory used by the Pool?
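A minimal sketch of two Pool knobs that are commonly used to rein this in, assuming the combined result still fits in memory: processes caps how many CSVs are being read and processed at the same time, and maxtasksperchild recycles worker processes so memory they have accumulated is released when they exit. The values 4 and 10 below are illustrative assumptions, not values from the question.

import os
import pandas as pd
from multiprocessing import Pool

def read_process(file):
    df = pd.read_csv(file)
    process_df(df)  # same user-defined processing step as in the question
    return df

if __name__ == '__main__':
    path = '/path/to/directory/'
    files = [os.path.join(path, f) for f in os.listdir(path)]

    # processes=4: at most four CSVs are in flight at any moment.
    # maxtasksperchild=10: each worker is replaced after 10 files, so any
    # memory it has held onto is returned to the OS when it exits.
    with Pool(processes=4, maxtasksperchild=10) as pool:
        # imap hands results back one at a time as workers finish, instead of
        # building the entire result list inside the pool before returning.
        dfs = list(pool.imap(read_process, files))

    df = pd.concat(dfs, ignore_index=True)

If the concatenated DataFrame itself is what exhausts memory, the same loop can instead append each processed chunk to a single output file on disk rather than collecting the chunks in a list.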
question from: https://stackoverflow.com/questions/65867089/read-and-process-many-files-in-parallel