python - How to "select distinct" across multiple data frame columns in pandas?

Question

Welcome To Ask or Share your Answers For Others

python - How to "select distinct" across multiple data frame columns in pandas?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to "select distinct" across multiple data frame columns in pandas?

I'm looking for a way to do the equivalent to the SQL

SELECT DISTINCT col1, col2 FROM dataframe_table

The pandas sql comparison doesn't have anything about distinct.

.unique() only works for a single column, so I suppose I could concat the columns, or put them in a list/tuple and compare that way, but this seems like something pandas should do in a more native way.

Am I missing something obvious, or is there no way to do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:56:23+0000

You can use the drop_duplicates method to get the unique rows in a DataFrame:

In [29]: df = pd.DataFrame({'a':[1,2,1,2], 'b':[3,4,3,5]})

In [30]: df
Out[30]:
   a  b
0  1  3
1  2  4
2  1  3
3  2  5

In [32]: df.drop_duplicates()
Out[32]:
   a  b
0  1  3
1  2  4
3  2  5

You can also provide the subset keyword argument if you only want to use certain columns to determine uniqueness. See the docstring.

Categories

python - How to "select distinct" across multiple data frame columns in pandas?

python - How to "select distinct" across multiple data frame columns in pandas?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags