python - Find all duplicate rows in a pandas dataframe

Question

Welcome To Ask or Share your Answers For Others

python - Find all duplicate rows in a pandas dataframe

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Find all duplicate rows in a pandas dataframe

I would like to be able to get the indices of all the instances of a duplicated row in a dataset without knowing the name and number of columns beforehand. So assume I have this:

I'd like to be able to get [1, 3, 4] and [2, 5]. Is there any way to achieve this? It sounds really simple, but since I don't know the columns beforehand I can't do something like df[col == x...].

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:48+0000

First filter all duplicated rows and then groupby with apply or convert index to_series:

df = df[df.col.duplicated(keep=False)]

a = df.groupby('col').apply(lambda x: list(x.index))
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object

a = df.index.to_series().groupby(df.col).apply(list)
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object

And if need nested lists:

L = df.groupby('col').apply(lambda x: list(x.index)).tolist()
print (L)
[[1, 3, 4], [2, 5]]

If need use only first column is possible selected by position with iloc:

a = df[df.iloc[:,0].duplicated(keep=False)]
      .groupby(df.iloc[:,0]).apply(lambda x: list(x.index))
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object

Categories

python - Find all duplicate rows in a pandas dataframe

python - Find all duplicate rows in a pandas dataframe

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags