r - from data table, randomly select one row per group

Question

Welcome To Ask or Share your Answers For Others

r - from data table, randomly select one row per group

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - from data table, randomly select one row per group

I'm looking for an efficient way to select rows from a data table such that I have one representative row for each unique value in a particular column.

Let me propose a simple example:

require(data.table)

y = c('a','b','c','d','e','f','g','h')
x = sample(2:10,8,replace = TRUE)
z = rep(y,x)
dt = as.data.table( z )

my objective is to subset data table dt by sampling one row for each letter a-h in column z.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:34:35+0000

OP provided only a single column in the example. Assuming that there are multiple columns in the original dataset, we group by 'z', sample 1 row from the sequence of rows per group, get the row index (.I), extract the column with the row index ($V1) and use that to subset the rows of 'dt'.

dt[dt[ , .I[sample(.N,1)] , by = z]$V1]

Categories

r - from data table, randomly select one row per group

r - from data table, randomly select one row per group

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags