python - how do I remove rows with duplicate values of columns in pandas data frame?

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a pandas data frame which looks like this.

  Column1  Column2 Column3
0     cat        1       C
1     dog        1       A
2     cat        1       B

I want to detect rows that repeat the same values in Column1 and Column2 (here rows 0 and 2, which are both cat and 1), remove the repeated record, and keep only the first one. The resulting data frame should contain only:

  Column1  Column2 Column3
0     cat        1       C
1     dog        1       A

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:30:27+0000

Use drop_duplicates, passing subset the list of columns to check for duplicates and keep='first' to keep the first row of each set of duplicates.

If dataframe is:

import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)

Result:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'

Then:

result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)

Result:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
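
The same call can be applied to the data frame from the question. The sketch below rebuilds that frame from the table shown there (assuming Column2 holds integers) and also shows DataFrame.duplicated, which returns the boolean mask that drop_duplicates acts on. The names key_columns, is_repeat, deduped and unique_only are illustrative, not from the original post.

```python
import pandas as pd

# The data frame from the question (Column2 assumed to be integers).
df = pd.DataFrame({
    "Column1": ["cat", "dog", "cat"],
    "Column2": [1, 1, 1],
    "Column3": ["C", "A", "B"],
})

# Columns whose combined values define a duplicate.
key_columns = ["Column1", "Column2"]

# True for every row that repeats an earlier (Column1, Column2) pair.
is_repeat = df.duplicated(subset=key_columns, keep="first")
print(is_repeat.tolist())  # [False, False, True]

# Keep the first row of each pair; same rows as df[~is_repeat].
deduped = df.drop_duplicates(subset=key_columns, keep="first")
print(deduped)

# keep=False drops every row that has a duplicate, including the first.
unique_only = df.drop_duplicates(subset=key_columns, keep=False)
print(unique_only)
```

Here deduped keeps rows 0 and 1, which matches the result the question asks for. If a fresh 0..n-1 index is wanted after dropping rows, passing ignore_index=True to drop_duplicates (pandas 1.0 or later) is one option.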
