Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
560 views
in Technique[技术] by (71.8m points)

algorithm - Randomly choosing from a list with weighted probabilities

I have an array of N elements (representing the N letters of a given alphabet), and each cell of the array holds an integer value, that integer value meaning the number of occurrences in a given text of that letter. Now I want to randomly choose a letter from all of the letters in the alphabet, based on his number of appearances with the given constraints:

  • If the letter has a positive (nonzero) value, then it can be always chosen by the algorithm (with a bigger or smaller probability, of course).

  • If a letter A has a higher value than a letter B, then it has to be more likely to be chosen by the algorithm.

Now, taking that into account, I've come up with a simple algorithm that might do the job, but I was just wondering if there was a better thing to do. This seems to be quite fundamental, and I think there might be more clever things to do in order to accomplish this more efficiently. This is the algorithm i thought:

  • Add up all the frequencies in the array. Store it in SUM
  • Choosing up a random value from 0 to SUM. Store it in RAN
  • [While] RAN > 0, Starting from the first, visit each cell in the array (in order), and subtract the value of that cell from RAN
  • The last visited cell is the chosen one

So, is there a better thing to do than this? Am I missing something?

I'm aware most modern computers can compute this so fast I won't even notice if my algorithm is inefficient, so this is more of a theoretical question rather than a practical one.

I prefer an explained algorithm rather than just code for an answer, but If you're more comfortable providing your answer in code, I have no problem with that.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The idea:

  • Iterate through all the elements and set the value of each element as the cumulative frequency thus far.
  • Generate a random number between 1 and the sum of all frequencies
  • Do a binary search on the values for this number (finding the first value greater than or equal to the number).

Example:

Element    A B C D
Frequency  1 4 3 2
Cumulative 1 5 8 10

Generate a random number in the range 1-10 (1+4+3+2 = 10, the same as the last value in the cumulative list), do a binary search, which will return values as follows:

Number   Element returned
1        A
2        B
3        B
4        B
5        B
6        C
7        C
8        C
9        D
10       D

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...