Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
360 views
in Technique[技术] by (71.8m points)

R: sample() command subject to a constraint

I am trying to randomly sample 7 numbers from 0 to 7 (with replacement), but subject to the constraint that the numbers chosen add up to 7. So for instance, the output 0 1 1 2 3 0 0 is okay, but the output 1 2 3 4 5 6 7 is not. Is there a way to use the sample command with added constraints?

I intend to use the replicate() function with the sample command as an argument, to return a list of N different vectors form the sample command. The way I am currently using the sample command (without any constraints), I need N to be very large in order to get as many possible vectors that sum to exactly 7 as possible. I figure there must be an easier way to do this!

Here is my code for that part:

x <- replicate(100000, sample(0:7, 7, replace=T))    

Ideally, I want 10,000 or 100,000 vectors in x to sum to 7, but would need an enormous N value to do this. Thanks for any help.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:

library(gtools)
perms <- permutations(8, 7, 0:7, repeats.allowed=T)
perms7 <- perms[rowSums(perms) == 7,]

From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:

set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=T),]
head(my.perms)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    0    0    0    2    5    0    0
# [2,]    1    3    0    1    2    0    0
# [3,]    1    4    1    1    0    0    0
# [4,]    1    0    0    3    0    3    0
# [5,]    0    2    0    0    0    5    0
# [6,]    1    1    2    0    0    2    1

An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.

Here's a distribution of counts of numbers in the sample:

#      0      1      2      3      4      5      6      7 
# 323347 188162 102812  51344  22811   8629   2472    423 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...