I am on Python 3. I have a dataframe, nameinternat_country, with 13,527 companies (and 5 columns). I'd like to merge this company list with Contact_info.txt, a 40+ GB file (190M+ companies and 29 columns), on the variable name_country.
My desired output would be a dataframe (I call it mergeall in the code below) containing the 13,527-company list from nameinternat_country (my left df) plus the 29 columns merged in from the matched cases in Contact_info.txt. So mergeall would have 13,527 rows and 34 columns (5 columns from nameinternat_country + 29 original columns from Contact_info.txt), with unmatched cases showing missing values.
The problem is that Contact_info.txt is 40+ GB, so I cannot load it as a dataframe for memory reasons. I therefore need to read it in chunks and do the merge chunk by chunk. Here is the code I have (note: I subset Contact_info.txt to its first 5,000 rows just to make my trials more efficient):
import pandas as pd

mergeall = pd.DataFrame()  # empty df to collect the chunk-by-chunk merges below
ChunkSize = 1000  # number of rows per chunk
for chunk in pd.read_csv('ORBIS financial/Contact info.txt', sep="\t", nrows=5000, chunksize=ChunkSize):
    # create the merging column in the txt chunks: "NAME_INTERNAT, Country"
    chunk["name_country"] = chunk["NAME_INTERNAT"] + ", " + chunk["Country"]
    mergeall = pd.concat([mergeall, nameinternat_country.merge(chunk, how='left', on='name_country')])
The mergeall dataframe comes out as 67,635 rows x 34 columns instead of the expected 13,527 rows. That is exactly 13,527 × 5: one full copy of my company list for each of the 5 chunks. Is the error in the pd.concat call?
Thanks a lot.
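For reference, this is the kind of pattern I suspect I need instead: collect only the matching rows from each chunk first, then do a single left merge at the end. This is only a sketch reusing the names from my code above; I have not verified it against the full file:
matches = []  # matched Contact_info rows, accumulated chunk by chunk
for chunk in pd.read_csv('ORBIS financial/Contact info.txt', sep="\t",
                         nrows=5000, chunksize=ChunkSize):
    chunk["name_country"] = chunk["NAME_INTERNAT"] + ", " + chunk["Country"]
    # keep only the chunk rows whose key appears in my company list
    matches.append(chunk[chunk["name_country"].isin(nameinternat_country["name_country"])])

# one left merge at the end: 13,527 rows, unmatched companies get NaN
mergeall = nameinternat_country.merge(pd.concat(matches), how='left', on='name_country')
(This assumes the name_country keys are unique within Contact_info.txt; duplicate keys on the right side would still add rows.)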
question from:
https://stackoverflow.com/questions/65893159/loop-merge-df-and-txt-and-append-matched-output