Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Grouping by multiple columns to find duplicate rows pandas

I have a df

id    val1    val2
 1     1.1      2.2
 1     1.1      2.2
 2     2.1      5.5
 3     8.8      6.2
 4     1.1      2.2
 5     8.8      6.2

I want to group by val1 and val2 and get a similar dataframe containing only the rows whose val1 and val2 combination occurs more than once.

Final df:

id    val1    val2
 1     1.1      2.2
 4     1.1      2.2
 3     8.8      6.2
 5     8.8      6.2

1 Reply


Use duplicated with the subset parameter to specify which columns to check, and keep=False so that every row of a repeated combination is marked (not only the later ones). Then filter with the resulting mask by boolean indexing:

# keep=False marks every row whose (val1, val2) pair appears more than once
df = df[df.duplicated(subset=['val1','val2'], keep=False)]
print(df)
   id  val1  val2
0   1   1.1   2.2
1   1   1.1   2.2
3   3   8.8   6.2
4   4   1.1   2.2
5   5   8.8   6.2
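
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the names cols, all_dupes and later_only are only illustrative. It also shows how keep=False differs from keep='first', which leaves out the first row of each repeated pair:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

cols = ["val1", "val2"]  # illustrative name for the columns being compared

# keep=False: every row belonging to a repeated (val1, val2) pair.
all_dupes = df[df.duplicated(subset=cols, keep=False)]

# keep="first": the first row of each repeated pair is NOT flagged.
later_only = df[df.duplicated(subset=cols, keep="first")]

print(all_dupes)
print(later_only)
```

Assigning to a new name instead of overwriting df keeps the original frame available for further checks.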

Detail (the mask, computed on the original df before it is filtered):

print(df.duplicated(subset=['val1','val2'], keep=False))
0     True
1     True
2    False
3     True
4     True
5     True
dtype: bool
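
Since the question asks about grouping, the same mask can probably also be built with groupby, and the exact "Final df" from the question (which lists id 1 only once and keeps equal pairs together) can be approximated by dropping fully identical rows first and sorting afterwards. This is only a sketch under those assumptions; the names group_size, by_group, deduped and final are illustrative, and the drop_duplicates and sort_values steps are my reading of the expected output, not something the question states:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# groupby variant: size of each (val1, val2) group, broadcast back to every row.
group_size = df.groupby(["val1", "val2"])["id"].transform("size")
by_group = df[group_size > 1]

# Assumption: the repeated id=1 row is unwanted, so drop fully identical rows first.
deduped = df.drop_duplicates()
mask = deduped.duplicated(subset=["val1", "val2"], keep=False)

# Sort so rows sharing the same (val1, val2) pair sit next to each other.
final = deduped[mask].sort_values(["val1", "val2", "id"])
print(final)
```

The duplicated approach is usually the shorter of the two; the groupby form is handy when a different threshold is needed, for example group_size > 2.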


...