Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
225 views
in Technique[技术] by (71.8m points)

python - Pandas Design Considerations for MultiIndexed Dataframes

The purpose of this question is to further explore MultiIndex dataframes and to ask questions of the best approach for various tasks.

Create the DataFrame

import pandas as pd

df = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'], 
               'portfolio' : ['A','B','C','D','E'], 
               'reporting_ccy' : ['GBP','GBP','GBP','GBP','GBP'],
               'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
               'amount' : [100,200,300,400,500],
               'injection' : [1,2,3,4,5],
               'to_usd' : [1.3167,1.3167,1.3167,1.3167,1.3167],
               'to_ccy' : [0.009564,1,1,1.1093,1.1093],
               'm5' : [2,4,6,8,10],
               'm6' : [1,3,5,7,9]}); 

Pivot the DataFrame

df_pivot = df.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sortlevel(axis=1)

Rename the columns

df_pivot.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']

This yields a pivoted representation of the data such that:

  1. a portfolio may have 1 or many measures
  2. shows the portfolio default currency
  3. shows the portfolio reporting currency
  4. a measure may have 1 or many reporting currencies.

I terms of 4. what is the best approach for implementation given that we have the xRates for the currencies?

Such that we create a dataframe such as that derived here:

Create DataFrame

df1 = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'], 
           'portfolio' : ['A','B','C','D','E'], 
           'reporting_ccy' : ['JPY','USD','USD','EUR','EUR'],
           'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
           'amount' : [13767.2522, 263.34, 395.01, 474.785901, 593.4823763],
           'injection' : [1,2,3,4,5],
           'to_usd' : [0.009564, 1, 1, 1.1093, 1.1093],
           'to_ccy' : [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
           'm5' : [2,4,6,8,10],
           'm6' : [1,3,5,7,9]}); 

Concatenate & Pivot the DataFrames

df_concat = pd.concat([df,df1])
df_pivot1 = df_concat.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sortlevel(axis=1)
df_pivot1.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']

This now shows 1 measure having many currencies.

df_pivot1.xs(('amount', 'A'), level=('measures','portfolio'), drop_level=False, axis=1)

Question

Is there a better way, such as adding data directly to a multiIndexed dataframe at level 3 df_pivot1.columns.get_level_values(3).unique()?

I would like to be able to iterate through each level and add new measures either derived from other measures using df.assign() or other methods.

The use case here is to add other currencies to the measures where applicable. The concatenation and re-pivot as above does not seem optimal.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can append df1 row by row onto df_pivot instead of rebuilding the pivot with both frames concat together.

Appending to the end of the DataFrame would be less memory expensive than concatenating and rebuilding the pivots from scratch every time new data is received.

import pandas as pd

df = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'], 
               'portfolio' : ['A','B','C','D','E'], 
               'reporting_ccy' : ['GBP','GBP','GBP','GBP','GBP'],
               'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
               'amount' : [100,200,300,400,500],
               'injection' : [1,2,3,4,5],
               'to_usd' : [1.3167,1.3167,1.3167,1.3167,1.3167],
               'to_ccy' : [0.009564,1,1,1.1093,1.1093],
               'm5' : [2,4,6,8,10],
               'm6' : [1,3,5,7,9]}); 

# %%
df_pivot = df.pivot_table(index='index_date',columns=['portfolio','portfolio_ccy','reporting_ccy']).swaplevel(0, 1, axis=1).sortlevel(axis=1)
df1 = pd.DataFrame({'index_date' : ['12/07/2016','12/07/2016','12/07/2016','12/07/2016','12/07/2016'], 
           'portfolio' : ['A','B','C','D','E'], 
           'reporting_ccy' : ['JPY','USD','USD','EUR','EUR'],
           'portfolio_ccy' : ['JPY','USD','USD','EUR','EUR'],
           'amount' : [13767.2522, 263.34, 395.01, 474.785901, 593.4823763],
           'injection' : [1,2,3,4,5],
           'to_usd' : [0.009564, 1, 1, 1.1093, 1.1093],
           'to_ccy' : [1.3167, 1.3167, 1.3167, 1.3167, 1.3167],
           'm5' : [2,4,6,8,10],
           'm6' : [1,3,5,7,9]}); 

df_pivot.columns.names = ['portfolio','measures', 'portfolio_ccy', 'reporting_ccy']
# instead of joining the 2 df's add df1 to df_pivot 1 row at a time. 
for i in range(len(df1)):
    row = df1.iloc[i]
    for measure in 'amount injection m5 m6 to_ccy to_usd'.split():
        df_pivot.ix[row.index_date, (row.portfolio,measure,row.portfolio_ccy, row.reporting_ccy)] = row[measure]

#%% check the end result
print(df_pivot.xs(('amount', 'A'), 
      level=('measures','portfolio'), drop_level=False, axis=1))   

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...