Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

algorithm - Generate sample of 1,000,000 random permutations

I am working with a large number of integer permutations. Each permutation has K elements, and each element is 1 byte in size. I need to generate N unique random permutations.
Constraints: K <= 144, N <= 1,000,000.

I came up with the following straightforward algorithm:

  1. Generate a list of N random permutations. Store all permutations in RAM.
  2. Sort the list and delete all duplicates (if any). The number of duplicates will be relatively small.
  3. If there were any duplicates, add random permutations to the list until it contains N permutations again, then return to step 2.
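
For reference, the three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `sample_unique_permutations`, the `seed` parameter, and the choice to store each permutation as a `bytes` object are assumptions made for the example. The guard on K! is also an addition, since fewer than N distinct permutations exist when K is very small.

```python
import math
import random

def sample_unique_permutations(k, n, seed=None):
    """Return n distinct random permutations of range(k), sorted, as bytes objects."""
    if math.factorial(k) < n:
        raise ValueError("fewer than n distinct permutations exist for this k")
    rng = random.Random(seed)
    base = list(range(k))  # k <= 144, so every element fits in one byte
    perms = []
    while len(perms) < n:
        # Steps 1 and 3: top the list up to n entries.
        while len(perms) < n:
            rng.shuffle(base)
            perms.append(bytes(base))
        # Step 2: sort and delete duplicates.
        perms = sorted(set(perms))
    return perms

perms = sample_unique_permutations(k=8, n=1000, seed=1)
print(len(perms), len(set(perms)))
```

Note that everything stays in RAM here (about N * K bytes of payload plus per-object overhead), which is exactly the cost the question is trying to avoid.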

Is there a better way to do this? In particular, is there a way to avoid storing all permutations in RAM (for example, by writing them to disk as they are generated)?

Edit: In the end, the generated permutations only need to be accessed sequentially (one by one, with no need for random access). RAM is the more critical factor (I would prefer not to hold all permutations in RAM at once).

1 Reply

One possible solution is to use a Bloom filter.

Store your permutations on disk (write them sequentially) and maintain a Bloom filter in RAM.
Each time you generate a permutation, check it against the Bloom filter. If the filter says the permutation has not been written to disk yet, write it and add it to the filter. That answer is always correct, because Bloom filters have no false negatives.
If the Bloom filter says the permutation is already on disk, however, it might be wrong (a false positive).

If the Bloom filter says "the permutation already exists", you can either discard the candidate and move on to the next one without checking whether it is really in the set, or search the disk to see whether it is really there.
If you choose the latter, you should consider keeping the permutations in a data structure that supports fast lookups, such as a hash table or a B+ tree.
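
A rough Python sketch of the first option (discard every candidate the filter flags, without checking the disk) might look like this. All names here (`BloomFilter`, `write_unique_permutations`, `read_permutations`) are hypothetical, the filter is a simple hand-rolled one built on SHA-256 with double hashing rather than any particular library, and the default of 2^25 bits (4 MiB) with 20 hash functions is only an assumed starting point.

```python
import hashlib
import math
import random

class BloomFilter:
    """Minimal Bloom filter over bytes objects (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

def write_unique_permutations(path, k, n, num_bits=1 << 25, num_hashes=20, seed=None):
    """Write n distinct random permutations of range(k) to path, k bytes each."""
    if math.factorial(k) < n:
        raise ValueError("fewer than n distinct permutations exist for this k")
    rng = random.Random(seed)
    seen = BloomFilter(num_bits, num_hashes)
    base = list(range(k))
    written = 0
    with open(path, "wb") as out:
        while written < n:
            rng.shuffle(base)
            perm = bytes(base)
            if perm in seen:
                continue  # real duplicate or false positive: drop the candidate
            seen.add(perm)
            out.write(perm)
            written += 1
    return written

def read_permutations(path, k):
    """Yield the stored permutations one by one (sequential access only)."""
    with open(path, "rb") as f:
        while True:
            perm = f.read(k)
            if len(perm) < k:
                break
            yield perm
```

Because a "not seen" answer is never wrong, every permutation that reaches the file is unique. The price of skipping the disk check is that a few valid candidates are occasionally thrown away, which slightly changes which permutations get sampled but never admits a duplicate.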

Bloom filters are a good match here: they are designed to represent a set that is expensive to read, and they give zero false negatives, which is the most important property for this problem.
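
To get a feel for the RAM involved, the standard Bloom filter sizing formulas (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n items and a target false positive rate p) can be evaluated directly. The helper name `bloom_parameters` and the target rate of one in a million are assumptions chosen for illustration.

```python
import math

def bloom_parameters(n_items, false_positive_rate):
    """Return (num_bits, num_hashes) from the standard sizing formulas."""
    num_bits = math.ceil(-n_items * math.log(false_positive_rate) / math.log(2) ** 2)
    num_hashes = max(1, round(num_bits / n_items * math.log(2)))
    return num_bits, num_hashes

num_bits, num_hashes = bloom_parameters(1_000_000, 1e-6)
print(f"{num_bits / 8 / 2**20:.2f} MiB, {num_hashes} hash functions")
```

For N = 1,000,000 this comes to roughly 3.4 MiB of filter and about 20 hash functions, compared with 144,000,000 bytes for holding all permutations in RAM at K = 144.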

