Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How do I wrangle a number of df so that they all contain identically labelled indices & columns?

I'm a relative novice to pandas but use it to plot and compare trends in industrial and economic data across countries and time. The df are organised like this:

import pandas as pd

# create sample df

df1 = pd.DataFrame(columns=['2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010',
       '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018'],index = ['United Arab Emirates', 'Argentina', 'Australia', 'Austria', 'Bulgaria',
       'Brazil', 'Canada'])
df2 = pd.DataFrame(columns=['2004', '2005', '2006', '2007', '2008', '2009', '2010',
       '2011', '2012', '2013', '2014', '2015', '2016'],index = ['Argentina', 'Australia', 'Austria', 'Bulgaria',
       'Brazil', 'Canada', 'Switzerland', 'Chile', 'Colombia'])         
df3 = pd.DataFrame(columns=['2005', '2006', '2007', '2008', '2009', '2010',
       '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018'],index = ['Argentina', 'Australia', 'Austria', 'Bulgaria',
       'Brazil', 'Canada'])         

This data comes from different sources, so it does not always contain the same list of countries and years. To scatter plot them, I need to wrangle the df so that they are all the same shape, with identical rows and columns (the same lists of countries and years). I am doing this as follows:

I concatenate the df with an inner join, which gives me the lists of countries and years common to all of them:

#create lists of countries and years common to all df

dfList = [df1, df2, df3]

merged = pd.concat(dfList, axis = 1, join='inner')
countryList = merged.index

merged = pd.concat(dfList, axis=0, join='inner')
yearList = merged.columns
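For comparison, a minimal sketch that gets the same sets of shared labels without building the concatenated frames, by intersecting the indexes with `Index.intersection` and `functools.reduce`. The three small frames are made-up stand-ins for df1..df3:

```python
from functools import reduce

import pandas as pd

# Made-up stand-ins for df1..df3 (labels only, no data)
df1 = pd.DataFrame(index=["Argentina", "Australia", "Brazil"], columns=["2004", "2005", "2006"])
df2 = pd.DataFrame(index=["Australia", "Brazil", "Chile"], columns=["2005", "2006", "2007"])
df3 = pd.DataFrame(index=["Brazil", "Australia"], columns=["2006", "2005"])
dfList = [df1, df2, df3]

# Intersect row labels (countries) and column labels (years) across all frames.
# With the default sort=False, the result follows the order of the left-hand index.
countryList = reduce(lambda a, b: a.intersection(b), (df.index for df in dfList))
yearList = reduce(lambda a, b: a.intersection(b), (df.columns for df in dfList))

print(list(countryList))  # ['Australia', 'Brazil']
print(list(yearList))     # ['2005', '2006']
```

Only the labels are intersected, so no data is copied; with many large frames this may be cheaper than concatenating them.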

However, I am having problems writing a function that loops through the df and removes the columns and rows (years and countries) that are not in yearList and countryList. The following function seems to run without errors but does not change the columns or rows of the df. I'm fairly sure this comes down to a misunderstanding of how variables work within loops, but I haven't been able to find anything on this as it applies to whole DataFrames.

Can anyone point out why this loop isn't working or suggest a more elegant / efficient way of wrangling a group of df so that they all contain identically labelled indices & columns? Many thanks in advance.

#loop through all df removing all rows / cols that are not in countryList & yearList 
 
def countryyear(x):
    for x in dfList:
        x = x[x.index.isin(countryList)]
        x = x.loc[:,x.columns.isin(yearList)]
        #return x

countryyear(dfList)
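The underlying behaviour can be reproduced in a few lines. In this sketch (two tiny made-up frames), assigning to the loop variable only rebinds the name `x`, so the list is left unchanged; writing the filtered result back into the list by position with `enumerate` does update it:

```python
import pandas as pd

# Two tiny made-up frames; only the labels matter here
dfList = [
    pd.DataFrame({"2005": [1, 2], "2006": [3, 4]}, index=["Brazil", "Chile"]),
    pd.DataFrame({"2005": [5, 6], "2006": [7, 8]}, index=["Brazil", "Canada"]),
]

# Rebinding the loop variable only changes what the local name `x` points to;
# the list still holds the original, unfiltered DataFrames.
for x in dfList:
    x = x[x.index.isin(["Brazil"])]
print([len(df) for df in dfList])  # [2, 2]

# Assigning the filtered result back into the list by position does change it.
for i, x in enumerate(dfList):
    dfList[i] = x[x.index.isin(["Brazil"])]
print([len(df) for df in dfList])  # [1, 1]
```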
Question from: https://stackoverflow.com/questions/65881577/how-do-i-wrangle-a-number-of-df-so-that-they-all-contain-identically-labelled-in

He who fights with dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply

In your countryyear function the loop does run, but reassigning x inside it only rebinds the local name x to a new, filtered DataFrame; it never changes the DataFrames stored in dfList, and because the function returns nothing, the filtered results are thrown away. If you uncomment the return statement, the function exits on the first iteration, so you only ever get the first filtered DataFrame back. You also never use the parameter x: the for loop immediately overwrites it with each element of dfList. I would iterate over the DataFrames outside the function and collect the cleaned DataFrames in a list.

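def countryyear(df):
    # keep only the rows (countries) that are in countryList
    df = df[df.index.isin(countryList)]
    # keep only the columns (years) that are in yearList
    df = df.loc[:, df.columns.isin(yearList)]
    return df

# build a new list holding the filtered DataFrames
cleanDFs = [countryyear(df) for df in dfList]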
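Filtering with `isin` keeps each frame's original row and column order. If the sources list countries or years in different orders, the filtered frames will not line up position by position, which matters if you pair values up by position (for example via `.values`) when plotting. A sketch using `DataFrame.reindex` with the shared labels, so every frame gets the same labels in the same order (the sample frames are made up):

```python
from functools import reduce

import pandas as pd

# Made-up frames whose rows/columns overlap but appear in different orders
df1 = pd.DataFrame({"2005": [1, 2, 3], "2006": [4, 5, 6]}, index=["Chile", "Brazil", "Canada"])
df2 = pd.DataFrame({"2006": [7, 8], "2005": [9, 10], "2007": [11, 12]}, index=["Brazil", "Chile"])
dfList = [df1, df2]

# Shared labels, playing the role of countryList / yearList in the question
countryList = reduce(lambda a, b: a.intersection(b), (df.index for df in dfList))
yearList = reduce(lambda a, b: a.intersection(b), (df.columns for df in dfList))

# reindex keeps only the shared labels AND puts them in the same order in every
# frame, so row i / column j refer to the same country / year across all frames.
cleanDFs = [df.reindex(index=countryList, columns=yearList) for df in dfList]

for df in cleanDFs:
    print(list(df.index), list(df.columns))
# ['Chile', 'Brazil'] ['2005', '2006']
# ['Chile', 'Brazil'] ['2005', '2006']
```

Because the labels come from an intersection, every requested label exists in every frame, so `reindex` introduces no NaN rows or columns here.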

...