scala - How to filter duplicate records having multiple keys in a Spark DataFrame?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have two DataFrames. I want to delete the records in Data Frame-A whose values in a set of common key columns match a record in Data Frame-B.

For Example: Data Frame-A:

Data Frame-B:

Keys: A,B,C columns

Desired Output:

A B C D
3 4 5 7
4 7 9 6

Is there a solution for this?

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:31:50+0000

You are looking for a left anti-join, which keeps only the rows of df_a whose key columns (A, B, C) have no match in df_b:

df_a.join(df_b, Seq("A","B","C"), "leftanti").show()
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+
|  3|  4|  5|  7|
|  4|  7|  9|  6|
+---+---+---+---+
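For anyone who wants to try this end to end, here is a minimal sketch of the same left anti-join as a small standalone program. The question's input tables are not shown, so the sample rows below are hypothetical: df_a holds the two rows from the desired output plus one made-up row, and df_b holds one made-up row that shares that row's A, B, C values. The object name `LeftAntiJoinSketch`, the helper `dropMatching`, and column `E` are also just illustrative names, and the local-mode session is an assumption.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftAntiJoinSketch {

  // Keep only the rows of `left` whose values in `keys` have no match in `right`.
  // The result contains the columns of `left` only.
  def dropMatching(left: DataFrame, right: DataFrame, keys: Seq[String]): DataFrame =
    left.join(right, keys, "leftanti")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; adjust master for a cluster.
    val spark = SparkSession.builder()
      .appName("left-anti-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data (the original tables are not in the question).
    val dfA = Seq((1, 2, 3, 4), (3, 4, 5, 7), (4, 7, 9, 6)).toDF("A", "B", "C", "D")
    val dfB = Seq((1, 2, 3, 8)).toDF("A", "B", "C", "E")

    // The row (1, 2, 3, 4) matches df_b on A, B, C and is dropped.
    dropMatching(dfA, dfB, Seq("A", "B", "C")).show()

    spark.stop()
  }
}
```

Passing the keys as a `Seq` of column names works here because both DataFrames use the same names for the key columns. If the names differed, a join expression on the individual columns would probably be needed instead.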
