duplicates - Keep first row by multiple columns in an R data.table

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'd like to keep only the first row of each group in a data.table, where the groups are defined by multiple columns.

This is straightforward with a single column, e.g.:

(dt <- data.table(x = c(1, 1, 1, 2),
                  y = c(1, 1, 2, 2),
                  z = c(1, 2, 1, 2)))
#     x y z
# |1: 1 1 1
# |2: 1 1 2
# |3: 1 2 1
# |4: 2 2 2
dt[!duplicated(x)] # Remove rows 2-3
#     x y z
# |1: 1 1 1
# |2: 2 2 2

But none of the following approaches work when deduplicating on two columns, which in this case should remove only row 2:

dt[!duplicated(x, y)] # Returns the original data set unchanged
#     x y z
# |1: 1 1 1
# |2: 1 1 2
# |3: 1 2 1
# |4: 2 2 2
dt[!duplicated(list(x, y))] # Same as above
dt[!duplicated(c("x", "y"))] # Same as above
dt[!duplicated(list("x", "y"))] # Same as above
dt[!duplicated(c(x, y))] # Only removes duplicates from first column
#     x y z
# |1: 1 1 1
# |2: 2 2 2
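
A likely reason these attempts fail can be sketched in base R alone. The second positional argument of duplicated() is incomparables rather than a second column, and a list or a character vector of column names is compared element by element, not row by row. The names dup_xy, dup_list, dup_names and dup_rows below are illustrative only.

```r
x <- c(1, 1, 1, 2)
y <- c(1, 1, 2, 2)

# The second positional argument is `incomparables`: every value of x also
# occurs in y, so no value of x is ever flagged as a duplicate.
dup_xy <- duplicated(x, y)

# A list of two different vectors has two distinct elements, so the result
# has one entry per column (length 2), not one entry per row.
dup_list <- duplicated(list(x, y))

# Column names are just two different strings.
dup_names <- duplicated(c("x", "y"))

# Wrapping the columns in a data.frame compares them row by row instead.
dup_rows <- duplicated(data.frame(x, y))
```

Only the last call treats each (x, y) pair as a unit, which is the behaviour the question is after.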

The only approach that works is this one, and it only works in certain cases:

dt[!duplicated(paste0(x, y))]
#     x y z
# |1: 1 1 1
# |2: 1 2 1
# |3: 2 2 2
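
One case where the paste0() approach goes wrong, as a minimal sketch: two different (x, y) pairs can concatenate to the same string. The table dt2 and its values are hypothetical and chosen only to trigger the collision.

```r
library(data.table)

# Hypothetical data: two distinct (x, y) pairs that paste to the same string
dt2 <- data.table(x = c(1, 11), y = c(11, 1))

# Both rows produce the key "111"
keys <- dt2[, paste0(x, y)]

# Row 2 is dropped even though its (x, y) pair is unique
collapsed <- dt2[!duplicated(paste0(x, y))]

# A separator that cannot occur in the values avoids this particular collision
separated <- dt2[!duplicated(paste(x, y, sep = "|"))]
```

The separator variant still relies on string conversion, so it is a workaround rather than a general solution.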

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:00:01+0000

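data.table provides S3 methods for unique, duplicated and anyDuplicated, each of which takes a by argument naming the columns to compare. So

unique(dt, by = c('x','y'))

will give you what you want: the first row for each combination of x and y.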
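
As a minimal sketch on the question's data, the call can be checked next to the equivalent filter built on data.table's duplicated() method. The names first_rows and same_rows are illustrative.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1, 2),
                 y = c(1, 1, 2, 2),
                 z = c(1, 2, 1, 2))

# Keep the first row of each (x, y) combination
first_rows <- unique(dt, by = c("x", "y"))

# The same rows, selected with the duplicated() method and its by argument
same_rows <- dt[!duplicated(dt, by = c("x", "y"))]

print(first_rows)
```

Both calls keep rows 1, 3 and 4 of dt, so only row 2 is removed.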
