r - unique.data.table select last row in place of the first

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Calling unique() on a keyed data.table returns one row per key group; when a group contains several rows, the first one is kept. When I need the last row instead (typically the most recent transaction), I use .SD[.N]:

library(data.table)
library(microbenchmark)

dt <- data.table(id = sample(letters, 10000, replace = TRUE), var = rnorm(10000), key = "id")

microbenchmark(unique(dt), dt[, .SD[.N], by=id])
Unit: microseconds
                   expr      min        lq    median       uq        max neval
             unique(dt)  570.882  586.1155  595.8975  608.406   3209.122   100
 dt[, .SD[.N], by = id] 6532.739 6637.7745 6694.3820 6776.968 208264.433   100

Do you know a faster way to do the same?

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:57:37+0000

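Create a data.table that contains the unique values of the key variables, then join it back to dt using mult = 'last' so that only the last matching row of each group is returned.

Using .SD is convenient but slow, as the benchmark in the question shows. Alternatively, use .I to collect the row number of the last row in each group and subset dt once.

# Unique key values only. by = key(dt) is needed in data.table 1.9.8 and later,
# where unique() compares all columns by default instead of the key.
dtu <- unique(dt, by = key(dt))[, key(dt), with = FALSE]
# Join on the key; mult = 'last' keeps the last matching row per key value
dt[dtu, mult = 'last']

Or

# .I[.N] is the row number in dt of the last row of each group
dt[dt[, .I[.N], by = key(dt)]$V1]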
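
To check that these alternatives really select the same rows as .SD[.N], the following is a minimal sketch on smaller toy data. The seed, the table size and the object names (via_join, via_I, via_unique, via_SD) are illustrative assumptions, not part of the original post. The third variant relies on the fromLast argument of unique() for data.tables, which is assumed to be available in the installed data.table version.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the toy data is reproducible
dt <- data.table(id = sample(letters, 1000, replace = TRUE),
                 var = rnorm(1000), key = "id")

# Reference: the .SD[.N] idiom from the question
via_SD <- dt[, .SD[.N], by = id]

# Variant 1: join the unique key values back with mult = "last"
dtu <- unique(dt, by = key(dt))[, key(dt), with = FALSE]
via_join <- dt[dtu, mult = "last"]

# Variant 2: row numbers of each group's last row, via .I
via_I <- dt[dt[, .I[.N], by = key(dt)]$V1]

# Variant 3 (assumes fromLast is supported by unique() in your version)
via_unique <- unique(dt, by = key(dt), fromLast = TRUE)

# All variants should agree with the reference on both columns
stopifnot(
  identical(via_join$id, via_SD$id),     identical(via_join$var, via_SD$var),
  identical(via_I$id, via_SD$id),        identical(via_I$var, via_SD$var),
  identical(via_unique$id, via_SD$id),   identical(via_unique$var, via_SD$var)
)
```

Once the results agree, the same expressions can be passed to microbenchmark() as in the question to compare timings on your own data.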
