I have diferent dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date')
, to do it with three dataframes, I use df1.merge(df2.merge(df3, on='date'), on='date')
, however it becomes really complex and unreadable to do it with multiple dataframes.
All dataframes have one column in common -date
, but they don't have the same number of rows nor columns and I only need those rows in which each date is common to every dataframe.
So, I'm trying to write a recursion function that returns a dataframe with all data but it didn't work. How should I merge multiple dataframes then?
I tried diferent ways and got errors like out of range
, keyerror 0/1/2/3
and can not merge DataFrame with instance of type <class 'NoneType'>
.
This is the script I wrote:
dfs = [df1, df2, df3] # list of dataframes
def mergefiles(dfs, countfiles, i=0):
if i == (countfiles - 2): # it gets to the second to last and merges it with the last
return
dfm = dfs[i].merge(mergefiles(dfs[i+1], countfiles, i=i+1), on='date')
return dfm
print(mergefiles(dfs, len(dfs)))
An example:
df_1:
May 19, 2017;1,200.00;0.1%
May 18, 2017;1,100.00;0.1%
May 17, 2017;1,000.00;0.1%
May 15, 2017;1,901.00;0.1%
df_2:
May 20, 2017;2,200.00;1000000;0.2%
May 18, 2017;2,100.00;1590000;0.2%
May 16, 2017;2,000.00;1230000;0.2%
May 15, 2017;2,902.00;1000000;0.2%
df_3:
May 21, 2017;3,200.00;2000000;0.3%
May 17, 2017;3,100.00;2590000;0.3%
May 16, 2017;3,000.00;2230000;0.3%
May 15, 2017;3,903.00;2000000;0.3%
Expected merge result:
May 15, 2017; 1,901.00;0.1%; 2,902.00;1000000;0.2%; 3,903.00;2000000;0.3%
Question&Answers:
os