Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What is the pandas.Panel deprecation warning actually recommending?

I have a package that uses pandas Panels to generate MultiIndex pandas DataFrames. However, whenever I use pandas.Panel, I get the following DeprecationWarning:

DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method. Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a .to_xarray() method to help automate this conversion.

However, I can't understand what the first recommendation here is actually recommending in order to create MultiIndex DataFrames. If Panel is going to be removed, how am I going to be able to use Panel.to_frame?


To clarify: I am not asking what deprecation is, or how to convert my Panels to DataFrames. What I am asking is: if I am using pandas.Panel and then pandas.Panel.to_frame in a library to create MultiIndex DataFrames from 3D ndarrays, and Panel is going to be removed, what is the best way to build those DataFrames without using the Panel API?

For example, if I'm doing the following, with X as an ndarray with shape (N, J, K):

p = pd.Panel(X, items=item_names, major_axis=names0, minor_axis=names1)
df = p.to_frame()

this is clearly no longer a viable future-proof option for DataFrame construction, though it was the recommended method in this question.



1 Reply


Consider the following panel:

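import numpy as np
import pandas as pd

# pd.Panel only exists in older pandas releases (it has since been removed).
data = np.random.randint(1, 10, (5, 3, 2))
pnl = pd.Panel(
    data,
    items=['item {}'.format(i) for i in range(1, 6)],
    major_axis=[2015, 2016, 2017],
    minor_axis=['US', 'UK']
)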

If you convert this to a DataFrame with pnl.to_frame(), it becomes:

             item 1  item 2  item 3  item 4  item 5
major minor                                        
2015  US          9       6       3       2       5
      UK          8       3       7       7       9
2016  US          7       7       8       7       5
      UK          9       1       9       9       1
2017  US          1       8       1       3       1
      UK          6       8       8       1       6

So to_frame() uses the major and minor axes as the row MultiIndex and the items as columns. The original shape (5, 3, 2) has become (6, 5). Where you put the MultiIndex is up to you, but if you want exactly this layout, you can build it directly from the array:

# Flatten each item's (3, 2) block into 6 values, then transpose to (6, 5).
data = data.reshape(5, 6).T
df = pd.DataFrame(
    data=data,
    index=pd.MultiIndex.from_product([[2015, 2016, 2017], ['US', 'UK']]),
    columns=['item {}'.format(i) for i in range(1, 6)]
)

which yields the same DataFrame (use the names parameter of pd.MultiIndex.from_product if you want to name your indices):

         item 1  item 2  item 3  item 4  item 5
2015 US       9       6       3       2       5
     UK       8       3       7       7       9
2016 US       7       7       8       7       5
     UK       9       1       9       9       1
2017 US       1       8       1       3       1
     UK       6       8       8       1       6
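
To also get the major and minor level names that Panel.to_frame() showed, the names parameter can be passed to from_product. A minimal sketch, assuming a fixed np.arange array in place of the random data so that individual cells are easy to check:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random (5, 3, 2) array used above.
data = np.arange(30).reshape(5, 3, 2)

index = pd.MultiIndex.from_product(
    [[2015, 2016, 2017], ['US', 'UK']],
    names=['major', 'minor'],  # level names as shown in the to_frame() output
)
df = pd.DataFrame(
    data.reshape(5, 6).T,
    index=index,
    columns=['item {}'.format(i) for i in range(1, 6)],
)

# Each cell maps back to data[item, major, minor]:
# 'item 3' is item 2, 2016 is major 1, 'UK' is minor 1.
value = df.loc[(2016, 'UK'), 'item 3']
```

Here value equals data[2, 1, 1], which confirms that the reshape and transpose keep every number attached to the right labels.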

Now instead of pnl['item 1'], you use df['item 1'] (optionally df['item 1'].unstack()); instead of pnl.xs(2015), you use df.xs(2015); and instead of pnl.xs('US', axis='minor'), you use df.xs('US', level=1).
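
These three replacements can be tried without Panel at all. The sketch below is only an illustration: it assumes a deterministic np.arange array instead of the random data, and it names the index levels major and minor so that level='minor' can be used in place of level=1.

```python
import numpy as np
import pandas as pd

data = np.arange(30).reshape(5, 3, 2)  # stand-in for the (5, 3, 2) array
df = pd.DataFrame(
    data.reshape(5, 6).T,
    index=pd.MultiIndex.from_product(
        [[2015, 2016, 2017], ['US', 'UK']], names=['major', 'minor']
    ),
    columns=['item {}'.format(i) for i in range(1, 6)],
)

# One item as a years x countries table (replaces pnl['item 1']).
item1 = df['item 1'].unstack()

# One year: countries as rows, items as columns (replaces pnl.xs(2015)).
year_2015 = df.xs(2015)

# One country: years as rows, items as columns
# (replaces pnl.xs('US', axis='minor')).
us_only = df.xs('US', level='minor')
```

Note that unstack() may order the country columns differently from the original minor axis, so it is safer to select them by label than by position.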

As you can see, this is just a matter of reshaping your initial 3D numpy array to 2D. The MultiIndex then carries the dimension that the reshape folded into the rows.
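
For the general case in the question (an ndarray X of shape (N, J, K)), the same reshape can be wrapped in a small function. This is a sketch only: frame_from_3d is a hypothetical helper name, not a pandas function, and the level names major and minor are an assumption chosen to mirror the old to_frame() output. Unlike Panel.to_frame(), which by default dropped (major, minor) rows with missing observations, this version keeps every row.

```python
import numpy as np
import pandas as pd


def frame_from_3d(X, items, major, minor):
    """Hypothetical helper: (N, J, K) array -> DataFrame with J*K rows, N columns."""
    X = np.asarray(X)
    if X.ndim != 3:
        raise ValueError("X must be 3-dimensional")
    if (len(items), len(major), len(minor)) != X.shape:
        raise ValueError("label lengths must match X.shape")
    n_items, n_major, n_minor = X.shape
    index = pd.MultiIndex.from_product([major, minor], names=['major', 'minor'])
    # Flatten the (major, minor) axes in row-major order, then transpose so
    # that items become the columns.
    values = X.reshape(n_items, n_major * n_minor).T
    return pd.DataFrame(values, index=index, columns=list(items))


# Usage with made-up labels:
X = np.arange(24).reshape(2, 3, 4)
df = frame_from_3d(X, ['a', 'b'], [10, 20, 30], ['w', 'x', 'y', 'z'])
```

The length check matters because a mismatch between labels and array shape would otherwise only surface as a less obvious error from the DataFrame constructor.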

