Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
245 views
in Technique[技术] by (71.8m points)

python - Fastest way to sort each row in a pandas dataframe

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C   D
3   4   8   1
9   2   7   2

Needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2

I had thought this might work, but it sorts the columns:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

Ah, pandas raises:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...