Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

I am trying to unstack a multi-index with pandas and I keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

  • id (string)
  • date (string)
  • location (string)
  • value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?


1 Reply


Here's an example DataFrame that reproduces this: it has several values sharing the same index entry, so unstack has no single cell to put them in. The question is, do you want to aggregate these or keep them as multiple rows?

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
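
Before choosing between aggregating and keeping the rows, it can help to see which index entries actually collide. A minimal sketch, assuming the same four-row frame as above (rebuilt here so the snippet runs on its own; the name dupes is just for illustration):

```python
import pandas as pd

# Rebuild the example frame from this answer
df = pd.DataFrame(
    [[1, 2, "a", 16.86], [1, 2, "a", 17.18], [1, 4, "a", 17.03], [2, 5, "b", 17.28]]
)
df1 = df.set_index([0, 1, 2])

# keep=False marks every row whose full index tuple occurs more than once,
# not just the second and later occurrences
dupes = df1[df1.index.duplicated(keep=False)]
print(dupes)
```

Here the two rows indexed (1, 2, 'a') are the ones that block the reshape; if dupes comes back empty on your data, the index is unique and unstack should not raise this error.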

One solution is to call reset_index (which gets you back to df) and then use pivot_table, which aggregates the duplicates:

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
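
A rough sketch of that dummy-level idea, under some assumptions of mine: the extra level is built with groupby(...).cumcount() and is called "dup" (a made-up name), and the frame is the same four-row example as above. This is one way to do it, not necessarily the only one.

```python
import pandas as pd

# Rebuild the example frame from this answer
df = pd.DataFrame(
    [[1, 2, "a", 16.86], [1, 2, "a", 17.18], [1, 4, "a", 17.03], [2, 5, "b", 17.28]]
)
df1 = df.set_index([0, 1, 2])

# Hypothetical dummy level: number each repeat of the same (0, 1, 2) key
# as 0, 1, ... so that the full index becomes unique
dup = df1.groupby(level=[0, 1, 2]).cumcount().rename("dup")

# Append the dummy level, then unstack level 2 (the 'a'/'b' column)
wide = df1.set_index(dup, append=True).unstack(2)

# Drop the dummy level again; duplicate keys stay as separate rows
result = wide.droplevel("dup")
print(result)
```

Note that result still has a non-unique index (two rows for (1, 2)), which is the point: nothing is aggregated, each original value keeps its own row.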

