Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
413 views
in Technique[技术] by (71.8m points)

python 3.x - loop merge df and .txt and append matched output

I am on Python3. I have a nameinternat_country dataframe with 13,527 companies (and 5 columns). I'd like to merge this company list to a Contact_info.txt 40+GB file (+190M companies and 29 columns) on the variable name_country.

My desired output would be a dataframe (I call it mergeall in the code below) with the 13,527 company list (from nameinternat_country, my left df), and the merged 29 columns from the matched cases with Contact_info.txt. The mergeall would have 13,527 rows and 34 columns (5 columns from nameinternat_country + 29 original columns from Contact_info.txt). Unmatched cases would show missing values.

The problem comes with Contact_info.txt having +40GB (I can not load it as dataframe for memory issues). So I first need to chunk it, and then do the merges chunk by chunk. Here the code I have (note: I subset the Contact_info.txt file to its first 5,000 rows just to be more efficient in my trials):

mergeall = pd.DataFrame()                     #create df to store merges in chunk below
ChunkSize = 1000                              #num of rows per chunk
for chunk in pd.read_csv('ORBIS financial/Contact info.txt', sep="", nrows=5000, chunksize=ChunkSize):
    chunk["name_country"]= chunk["NAME_INTERNAT"]+", "+chunk["Country"]           #create new (merging) column in txt files
    mergeall = pd.concat([mergeall, nameinternat_country.merge(chunk, how='left', on='name_country')])

The mergeall dataframe gives me 67,635 rows x 34 columns. Is the error in the pd.concat function?

Thanks a lot.

question from:https://stackoverflow.com/questions/65893159/loop-merge-df-and-txt-and-append-matched-output

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...