Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
3.6k views
in Technique[技术] by (71.8m points)

python - pandas reshaping dataframe with different length rows

I have a the following pandas data frame with the index on the left:

      0   1   2   3   4   5   6   7   8   9   10   11   12   13   14
A17   a   b   1  AUG) NaN NaN NaN NaN NaN NaN NaN  NaN  NaN  NaN  NaN  
nn6   c   d   2  POS) e   f   2   Hi)
AZV   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN  NaN  NaN  NaN
JFK   a   b   4   UUI) c   v   8   Yo)  t   f   9   po)

I'm looking to re-shape it to:

     0    1    2    3
A17  a    b    1   AUG)
nn6  c    d    2   POS)
nn6  e    f    2   Hi)
AZV  NaN  NaN  NaN NaN
JFK  a    b    4   UUI)
JFK  c    v    8   Yo)
JFK  t    f    9   po

I have tried reshape() and used itertools to iterate over the columns but still can't seem to get it.

Basically, each time a ) is encountered then break to a new line. The real table has over 150 columns.

thanks


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Another option that doesn't require iterating over rows (which can be very slow if there are many) is to do the following

[ins] In [1]: df
Out[1]: 
     0    1    2     3    4    5    6    7
A17  a    b    1  AUG)  NaN  NaN  NaN  NaN
nn6  c    d    2  POS)    e    f    2  HI)
AVZ     NaN  NaN   NaN  NaN  NaN  NaN  NaN

[ins] In [2]: joined = df.apply(lambda x: ' '.join([str(xi) for xi in x]), axis=1)
[ins] In [4]: split = joined.str.split(')', expand=True).reset_index(drop=False).melt(id_vars='index')

[ins] In [6]: split.drop('variable', axis=1, inplace=True)

[ins] In [7]: split
Out[7]: 
  index                        value
0   A17                    a b 1 AUG
1   nn6                    c d 2 POS
2   AVZ  nan nan nan nan nan nan nan
3   A17              nan nan nan nan
4   nn6                     e f 2 HI
5   AVZ                         None
6   A17                         None
7   nn6                             
8   AVZ                         None

[ins] In [8]: sel = split['value'].str.strip().str.len() > 0

[ins] In [9]: split = split.loc[sel, :]

[ins] In [9]: split
Out[9]: 
  index                        value
0   A17                    a b 1 AUG
1   nn6                    c d 2 POS
2   AVZ  nan nan nan nan nan nan nan
3   A17              nan nan nan nan
4   nn6                     e f 2 HI

[ins] In [10]: out = split['value'].str.strip().str.split(' ', expand=True)

[ins] In [11]: out.index = split['index']

[ins] In [12]: out
Out[12]: 
         0    1    2    3     4     5     6
index                                      
A17      a    b    1  AUG  None  None  None
nn6      c    d    2  POS  None  None  None
AVZ    nan  nan  nan  nan   nan   nan   nan
A17    nan  nan  nan  nan  None  None  None
nn6      e    f    2   HI  None  None  None

and then it's a matter of dropping the 4th to 6th column which is simple. I added some of the output so that you can see what's happening in each step.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...