Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Take the subsets of a data.frame with the same feature and select a single row from each subset

Suppose I have a matrix in R as follows:

ID Value
1 10
2 5
2 8
3 15
4 7
4 9
...

What I need is a random sample in which every ID is represented once and only once.

That means the row with ID 1 will be chosen, then one of the two rows with ID 2, then the row with ID 3, then one of the two rows with ID 4, and so on.

An ID can appear in more than two rows.

I'm trying to figure out the most R-esque way to do this without manually subsetting the data and sampling from each subset.

Thanks!


1 Reply


Use tapply over the row names, grouped by ID, and draw a sample of size 1 from each group:

dat[tapply(rownames(dat),dat$ID,FUN=sample,1),]

#  ID Value
#1  1    10
#3  2     8
#4  3    15
#6  4     9
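
To try the one-liner above, here is a minimal reproducible sketch. It assumes `dat` holds just the six rows shown in the question, and the seed is an arbitrary choice, so the rows picked will generally differ from the output above.

```r
# Rebuild the example data from the question (six rows, four distinct IDs).
dat <- data.frame(ID    = c(1, 2, 2, 3, 4, 4),
                  Value = c(10, 5, 8, 15, 7, 9))

set.seed(42)  # arbitrary seed, only to make the draw repeatable

# For each ID group, draw one row name. Row names are character strings,
# so sample() always picks from the names it is given.
picked <- tapply(rownames(dat), dat$ID, FUN = sample, 1)
result <- dat[picked, ]

# Sanity checks: exactly one row per ID, no ID repeated.
stopifnot(nrow(result) == length(unique(dat$ID)))
stopifnot(!anyDuplicated(result$ID))
```

The rows come back ordered by ID because tapply returns its groups in sorted order.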

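If your data is truly a matrix and not a data.frame, it may have no row names and dat$ID will not work. You can work around this by sampling from the row numbers (as character strings), converting the result back to integers, and selecting the ID column by name:

dat[as.integer(tapply(as.character(seq_len(nrow(dat))), dat[, "ID"], FUN = sample, 1)), ]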

Don't be tempted to remove the as.character: when sample is passed a single number n, it samples from 1:n instead of returning n. E.g.

replicate(10, sample(4,1) )
#[1] 1 1 4 2 1 2 2 2 3 4
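
The pitfall can be checked directly. The sketch below contrasts a single numeric argument with a single character argument, then shows one possible alternative that samples a position within each group rather than the value itself. `pick_one` is a hypothetical helper name, not part of base R, and `dat` is again assumed to hold the six rows from the question.

```r
# A single number n is treated as "sample from 1:n" ...
set.seed(1)  # arbitrary seed
draws <- replicate(100, sample(4, 1))
stopifnot(all(draws %in% 1:4))

# ... whereas a single character string is returned unchanged.
stopifnot(identical(sample("4", 1), "4"))

# Hypothetical helper: sample a position, so length-one input is safe.
pick_one <- function(x) x[sample.int(length(x), 1)]
stopifnot(pick_one(4) == 4)

# With the helper, plain integer row numbers can be sampled directly.
dat <- data.frame(ID = c(1, 2, 2, 3, 4, 4), Value = c(10, 5, 8, 15, 7, 9))
rows <- tapply(seq_len(nrow(dat)), dat$ID, FUN = pick_one)
result <- dat[rows, ]

# The same idea on a matrix: select the ID column by name instead of $.
m <- as.matrix(dat)
m_rows <- tapply(seq_len(nrow(m)), m[, "ID"], FUN = pick_one)
m_result <- m[m_rows, ]
```

Sampling a position avoids the character round trip, at the cost of defining a small wrapper function.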


...