Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Selecting columns from pandas MultiIndex

I have a DataFrame with MultiIndex columns that looks like this:

# sample data
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

[image: sample data]

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level?

Currently I am doing it like this:

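import itertools
# Build every (first level, second level) pair to keep
tuples = list(itertools.product(['one', 'two'], ['a', 'c']))
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
# reindex_axis was removed in pandas 1.0; reindex(columns=...) is the equivalent
data.reindex(columns=new_index)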

[image: expected result]

It doesn't feel like a good solution, however, because I have to bust out itertools, build another MultiIndex by hand and then reindex (and my actual code is even messier, since the column lists aren't so simple to fetch). I am pretty sure there has to be some ix or xs way of doing this, but everything I tried resulted in errors.


1 Reply


The most straightforward way is with .loc:

>>> data.loc[:, (['one', 'two'], ['a', 'b'])]


   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

Remember that [] and () have special meaning when dealing with a MultiIndex object:

(...) a tuple is interpreted as one multi-level key

(...) a list is used to specify several keys [on the same level]

(...) a tuple of lists refer to several values within a level

When we write (['one', 'two'], ['a', 'b']), the first list inside the tuple specifies all the values we want from the 1st level of the MultiIndex. The second list inside the tuple specifies all the values we want from the 2nd level of the MultiIndex.
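Applied to the columns the question actually asks about (['a', 'c']), the tuple-of-lists form might look like the sketch below. It rebuilds the sample frame from the question; the names subset and selected are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# Tuple of lists: the first list names the labels wanted from level 0,
# the second list names the labels wanted from level 1.
subset = data.loc[:, (['one', 'two'], ['a', 'c'])]

# Column keys that survived the selection (illustrative helper variable)
selected = list(subset.columns)
print(selected)
```

All four rows are kept because the row indexer is a plain :, and only the column axis is filtered.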

Edit 1: Another possibility is to use slice(None) to select everything from the first level (it works like slicing with : in a list), and then specify which columns we want from the second level.

>>> data.loc[:, (slice(None), ["a", "b"])]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6
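As a rough check, the slice(None) form can be compared with the itertools-style approach from the question. The sketch below assumes the same sample frame and uses pd.MultiIndex.from_product as a shorthand for building the target columns; by_slice and by_reindex are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# slice(None) keeps every label on level 0; the list filters level 1
by_slice = data.loc[:, (slice(None), ['a', 'c'])]

# The question's approach: build the target columns explicitly, then reindex
new_index = pd.MultiIndex.from_product([['one', 'two'], ['a', 'c']])
by_reindex = data.reindex(columns=new_index)

# Both routes should pick the same columns and the same values
same_columns = list(by_slice.columns) == list(by_reindex.columns)
same_values = np.array_equal(by_slice.to_numpy(), by_reindex.to_numpy())
print(same_columns, same_values)
```

The .loc version does not need the first-level labels to be listed, which helps when they are not easy to fetch.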

If the syntax slice(None) does not appeal to you, then another possibility is to use pd.IndexSlice, which helps when slicing frames with more elaborate indices.

>>> data.loc[:, pd.IndexSlice[:, ["a", "b"]]]

   one       two     
     a    b    a    b
0  0.4 -0.6 -0.7  0.9
1  0.1  0.4  0.5 -0.3
2  0.7 -1.6  0.7 -0.8
3 -0.9  2.6  1.9  0.6

When using pd.IndexSlice, we can use : as usual to slice the frame.
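A short sketch of what that looks like in practice, again on the sample frame from the question. The alias idx and the variable names are assumptions made for this example. Note that label slices in .loc include both endpoints.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

idx = pd.IndexSlice  # conventional short alias

# idx[...] simply returns the key it is given, so ":" becomes slice(None)
key = idx[:, ['a', 'c']]
is_same_key = key == (slice(None), ['a', 'c'])

# Specific labels from the second level
picked = data.loc[:, key]

# A label range on the second level; 'a':'b' includes 'b'
ranged = data.loc[:, idx[:, 'a':'b']]

print(list(picked.columns))
print(list(ranged.columns))
```

The range form relies on the columns being sorted, which is the case for this sample frame.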

Source: MultiIndex / Advanced Indexing, How to use slice(None)

