Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - pandas: slice a MultiIndex by range of secondary index

I have a series with a MultiIndex like this:

import numpy as np
import pandas as pd

buckets = np.repeat(['a','b','c'], [3,5,1])
sequence = [0,1,5,0,1,2,4,50,0]

s = pd.Series(
    np.random.randn(len(sequence)), 
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence)))
)

# In [6]: s
# Out[6]: 
# a  0    -1.106047
#    1     1.665214
#    5     0.279190
# b  0     0.326364
#    1     0.900439
#    2    -0.653940
#    4     0.082270
#    50   -0.255482
# c  0    -0.091730

I'd like to get the s['b'] values where the second index ('sequence') is between 2 and 10.

Slicing on the first index works fine:

s['a':'b']
# Out[109]: 
# bucket  value
# a       0        1.828176
#         1        0.160496
#         5        0.401985
# b       0       -1.514268
#         1       -0.973915
#         2        1.285553
#         4       -0.194625
#         5       -0.144112

But slicing on the second level does not work, at least not in what seem to be the two most obvious ways:

1) This slices by position: it returns the elements at positions 1 through 4, regardless of their index values:

s['b'][1:10]

# In [61]: s['b'][1:10]
# Out[61]: 
# 1     0.900439
# 2    -0.653940
# 4     0.082270
# 50   -0.255482
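The result above comes from positional versus label-based slicing. A minimal sketch, assuming the same index as in the question but with deterministic values from np.arange instead of random numbers (an assumption made only so the output is reproducible); the names inner, by_position and by_label are hypothetical:

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(len(sequence), dtype=float),  # deterministic stand-in for randn
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)

inner = s['b']                   # drops the first level; index is 0, 1, 2, 4, 50
by_position = inner.iloc[1:10]   # positions 1..9 (only 1..4 exist), stop exclusive
by_label = inner.loc[2:10]       # labels from 2 to 10, both endpoints inclusive

print(by_position)
print(by_label)
```

Writing iloc or loc makes the intent explicit instead of relying on how plain [] interprets an integer slice.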

However, if I swap the levels so that the first level is the integer and the second level is the string, slicing on the second level works:

In [26]: s
Out[26]: 
0   a   -0.126299
1   a    1.810928
5   a    0.571873
0   b   -0.116108
1   b   -0.712184
2   b   -1.771264
4   b    0.148961
50  b    0.089683
0   c   -0.582578

In [25]: s[0]['a':'b']
Out[25]: 
a   -0.126299
b   -0.116108

1 Reply

As Robbie-Clarken's answer points out, since pandas 0.14 you can pass a slice inside the tuple you pass to loc:

In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b  2   -0.65394
   4    0.08227
dtype: float64
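The same selection can also be spelled with pd.IndexSlice, which lets the slice(...) objects be written with ordinary colon syntax. A minimal sketch on the question's index, again assuming deterministic np.arange values; via_tuple and via_indexslice are hypothetical names:

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)

idx = pd.IndexSlice
# idx['b', 2:10] simply builds the tuple ('b', slice(2, 10, None))
via_tuple = s.loc[('b', slice(2, 10))]
via_indexslice = s.loc[idx['b', 2:10]]

print(via_indexslice)
```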

Indeed, you can pass a slice for each level:

In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a  5    0.27919
b  2   -0.65394
   4    0.08227
dtype: float64

Note: label-based slices passed to loc include both endpoints, so slice(2, 10) keeps 2 and 10 if they are present.
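Range slicing on an inner level also assumes the MultiIndex is sorted; on an unsorted index pandas can raise UnsortedIndexError instead of returning a result. A minimal sketch, assuming the rows arrive in a shuffled order (the order below is made up for illustration), that sorts first and then shows both endpoints of slice(2, 4) being kept:

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)

# Hypothetical shuffled copy: same rows, but the index is no longer sorted
shuffled = s.iloc[[8, 3, 0, 7, 5, 1, 4, 2, 6]]

tidy = shuffled.sort_index()            # sort by both levels before slicing
window = tidy.loc[('b', slice(2, 4))]   # label slice: keeps 2 and also 4

print(window)
```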


Old answer:

You can also do this using:

s.ix[1:10, "b"]

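(It's good practice to do this in a single ix/loc/iloc call, since that form also allows assignment. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions use loc or iloc instead.)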

This answer was written before iloc (integer, i.e. positional, location) was introduced in early 2013; iloc may be preferable in this case. It was created to remove the ambiguity of indexing integer-indexed pandas objects and to be more descriptive: "I'm slicing by position".

s["b"].iloc[1:10]
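To illustrate the point about assignment with current pandas, where ix no longer exists: a single loc call with the tuple-of-slices key writes straight into the Series, whereas chained forms such as s["b"].iloc[...] = ... may only modify an intermediate object (and, with Copy-on-Write enabled, never update s). A minimal sketch on a copy of the question's data, assuming np.arange values for reproducibility; t is a hypothetical name:

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)

t = s.copy()
# One indexing call: select ('b', 2..10) by label and assign in place
t.loc[('b', slice(2, 10))] = 0.0

print(t)
```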

That said, I kinda disagree with the docs that ix is:

most robust and consistent way

It's not; the most consistent way is to describe what you're doing:

  • use loc for labels
  • use iloc for position
  • use ix for both (if you really have to)

Remember the Zen of Python:

explicit is better than implicit
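In the same explicit spirit, the condition from the question can be written out literally as a boolean mask over the level values. This is a different technique from slicing: it compares every row, so it does not need a sorted index, but it may be slower than a slice on a large sorted index. A minimal sketch, assuming np.arange values; bucket, seq, mask and result are hypothetical names:

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)

bucket = s.index.get_level_values(0)   # 'a', 'a', 'a', 'b', ...
seq = s.index.get_level_values(1)      # 0, 1, 5, 0, ...

# "bucket is b and 2 <= sequence <= 10", spelled out row by row
mask = (bucket == 'b') & (seq >= 2) & (seq <= 10)
result = s[mask]

print(result)
```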



...