r - Filtering a dataframe showing only duplicates

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I need some help to filter a dataframe.

The df has several columns and I want to split it into two dataframes:

1- One including only the rows whose value in the first column occurs more than once (keeping every copy, not just the repeats).

2- The rest of the rows, which are not duplicates.

Here is an example. This would be the original:

         V1  V2
    [1,] "A" "1"
    [2,] "B" "1"
    [3,] "A" "1"
    [4,] "C" "2"
    [5,] "D" "3"
    [6,] "D" "4"

I want to turn it into this:

         V1  V2
    [1,] "A" "1"
    [2,] "A" "1"
    [3,] "D" "3"
    [4,] "D" "4"

And this:

         V1  V2
    [1,] "B" "1"
    [2,] "C" "2"
Is there a way to do that? I have tried exporting to Excel, but the dataset was too large to make that viable.

Thank you


1 Reply

深蓝 · Answer 1 · 2021-10-23T20:07:08+0000

Taking df as your input, you can use dplyr:

    library(dplyr)

    df %>% group_by(V1) %>% filter(n() > 1)

for the rows whose V1 value is duplicated (every copy is kept), and

    df %>% group_by(V1) %>% filter(n() == 1)

for the rows whose V1 value appears only once.
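If you would rather not depend on dplyr, the same split can be sketched in base R with duplicated(). This is only an illustration, not part of the original answer: it assumes the example in the question is a character matrix (it prints with [1,] row labels and quoted values), so the data is rebuilt here and converted with as.data.frame() first. The names m, is_dup, dups and uniques are made up for this example.

```r
# Rebuild the question's example as a character matrix (assumption: the
# original object is a matrix, judging by how it prints), then convert it
# to a data frame so columns can be accessed with $.
m <- matrix(c("A", "B", "A", "C", "D", "D",
              "1", "1", "1", "2", "3", "4"),
            ncol = 2, dimnames = list(NULL, c("V1", "V2")))
df <- as.data.frame(m, stringsAsFactors = FALSE)

# duplicated() flags only the second and later occurrences of a value.
# Combining it with a scan from the end (fromLast = TRUE) flags every
# row whose V1 value occurs more than once, including the first copy.
is_dup <- duplicated(df$V1) | duplicated(df$V1, fromLast = TRUE)

dups    <- df[is_dup, ]   # original rows 1, 3, 5, 6 (A, A, D, D)
uniques <- df[!is_dup, ]  # original rows 2, 4 (B, C)
```

Both results keep the original row order and row names. With the dplyr version, note that the result stays grouped by V1; adding ungroup() at the end of each pipeline returns a plain ungrouped result if later steps should not operate per group.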