Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Index sort order of a multi-index dataframe does not respect categorical index order

A small DataFrame with a two-level MultiIndex and one column. The second level of the index (level 1) sorts in alphabetical order, putting 'Four' before 'Three'.

import pandas as pd
df = pd.DataFrame({'A':[1,1,2,2],
  'B':['One','Two','Three', 'Four'], 
  'X':[1,2,3,4]},
  index=range(4)).set_index(['A','B']).sort_index()
df

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

The second level of the index (B) is clearly in alphabetical order, so it can be replaced with an ordered CategoricalIndex to force the intended ordering.

# inplace= on set_levels was deprecated and later removed, so assign the result back
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1],
       categories=['One','Two','Three', 'Four'], ordered=True),
    level=1)

After this change, inspecting the index shows that level 1 is indeed a CategoricalIndex, but sorting the index still does not put the rows in the desired order:

df.sort_index()

         X
A B       
1 One    1
  Two    2
2 Four   4
  Three  3

Note: if the DataFrame has a simple single-level index, this works as expected.
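A plausible explanation, which may depend on the pandas version: `set_levels` swaps out the level's values but leaves unchanged the integer codes that map each row to a position in that level. Those codes still reflect the original alphabetical positions, and the MultiIndex sort works on them. A version-independent workaround is to rebuild the MultiIndex from the per-row values so the codes are recomputed from the ordered categorical. This is a minimal sketch using the question's data; `order` is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical helper: the desired order for level 'B'
order = ['One', 'Two', 'Three', 'Four']

df = (pd.DataFrame({'A': [1, 1, 2, 2],
                    'B': ['One', 'Two', 'Three', 'Four'],
                    'X': [1, 2, 3, 4]})
        .set_index(['A', 'B'])
        .sort_index())

# Rebuild the MultiIndex from the per-row values of each level.
# Passing an ordered Categorical for level 'B' makes from_arrays take the
# codes from the categorical itself, i.e. from the category order.
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values('A'),
     pd.Categorical(df.index.get_level_values('B'),
                    categories=order, ordered=True)],
    names=df.index.names,
)

df = df.sort_index()
print(df)
```

Sorting now follows the category order within each value of A, so 'Three' comes before 'Four'.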

1 Reply

0 votes
by (71.8m points)

I managed to get this working by setting the index after the DataFrame has been created. I'm not sure this is the best answer, but it is an answer:

df = pd.DataFrame({'A':[1,1,2,2],
   'B':['One','Two','Three', 'Four'], 
   'X':[1,2,3,4]})
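# Build level 'B' from an ordered CategoricalIndex so its codes follow the category order
df = df.set_index(['A', pd.CategoricalIndex(df['B'], categories=['One','Two','Three', 'Four'], ordered=True)])
# set_index with an array (rather than a column name) does not drop the original column, so remove it
del df['B']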
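A closely related variant, sketched here as an alternative rather than as the answerer's code, converts column B to an ordered categorical before calling `set_index`. Because the column is already categorical, `set_index` builds level 'B' from its categories in the given order, and no separate `del` is needed. The name `order` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical helper: the desired order for level 'B'
order = ['One', 'Two', 'Three', 'Four']

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': ['One', 'Two', 'Three', 'Four'],
                   'X': [1, 2, 3, 4]})

# Make B an ordered categorical first, then turn it into an index level;
# sort_index then compares level 'B' by category order, not alphabetically.
df = (df.assign(B=pd.Categorical(df['B'], categories=order, ordered=True))
        .set_index(['A', 'B'])
        .sort_index())
print(df)
```

Note that a categorical level contains every category, including any not present in the data, so `df.index.levels[1]` lists all of `order`.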
