Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Set.pop() isn't random?

From the Python docs: "set.pop() remove and return an arbitrary element from s". While generating some random data to test a program, I noticed strange behavior from this pop() function. Here is my code (Python 2.7.3):

import random
from sets import Set  # Python 2 only: the sets module was removed in Python 3

testCases = 10
numberRange = 500

poppedValues = []
greaterPercentages = []

for i in range(testCases):
    s = Set()

    # insert 100 random values from the range [0, numberRange) into the set
    for j in range(100):
        s.add(random.randrange(numberRange))

    poppedValue = s.pop()
    greaterCount = 0

    # count how many numbers left in the set are smaller than the popped value
    for number in s:
        if poppedValue > number:
            greaterCount += 1

    poppedValues.append(poppedValue)
    greaterPercentages.append(float(greaterCount) / len(s) * 100)

# first output row: the popped values
for poppedValue in poppedValues:
    print poppedValue, '',

print

# second output row: percentage of remaining elements below each popped value
for percentage in greaterPercentages:
    print "{:2.2f}".format(percentage), '',

What I'm doing here is:

  1. Inserting some random values into the set s, where each element is in the range [0, numberRange)
  2. Popping an element from the set (according to the docs, it should be a random one)
  3. Counting how many elements in the set are smaller than the popped value

I expected the popped value to be a random one, with about 50% of the numbers in the set greater than the popped value. But it seems that pop() almost always returns the lowest number in the set. Here are the results for numberRange = 500. The first row shows the popped values. The second row shows the percentage of elements that are smaller than the popped value.

9   0   3   1   409     0   1   2   4   0   
0 % 0 % 0 % 0 % 87 %    0 % 0 % 0 % 0 % 0 %

I've run this test with different values of numberRange. It seems that when the set elements are small, pop() almost always returns the lowest element, but for larger values it returns a random element. For numberRange = 1000, the result is:

518     3586    3594    4103    2560    3087    4095    3079    3076    1622    
7 %     72 %    73 %    84 %    54 %    51 %    79 %    63 %    67 %    32 %

which I think is pretty random. Why this strange behavior? Am I doing something wrong?

EDIT: Thanks for everyone's answers and comments. It seems that "arbitrary" does not guarantee "random".


1 Reply


It's an implementation detail: set is implemented as a hash table (similar to dict, but without a slot for a value), set.pop removes the first entry in that table, and an int's hash value is the int itself.
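To make the slot placement concrete, here is a minimal sketch of how integer hashes map to table positions. TABLE_SIZE and slot_for are hypothetical names used only for illustration; CPython's real table size and probing scheme are internal and differ between versions.

```python
# Illustrative model only: the real table size and collision handling are
# CPython internals. TABLE_SIZE and slot_for are made-up names.
TABLE_SIZE = 8  # hypothetical number of slots (a power of two)


def slot_for(value, table_size=TABLE_SIZE):
    """Return the slot a value would start at in a table of the given size."""
    return hash(value) % table_size


values = [5, 1, 12, 3]

# Small non-negative ints hash to themselves.
assert all(hash(v) == v for v in values)

# Sorting by starting slot mimics the order a scan from slot 0 would see.
scan_order = sorted(values, key=slot_for)
print(scan_order)  # [1, 3, 12, 5]
```

In this 8-slot model, 12 starts at slot 4 and therefore comes before 5. That is the kind of reordering to expect once values exceed the table size, while values smaller than the table size stay in natural order.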

Combined, this means that your set, which is ordered by hash value, is actually ordered by the entries modulo the hash table size as well. This should be close to natural ordering in your case, because you are only inserting numbers from a small range; if you take random numbers from randrange(10**10) instead of randrange(500), you should see different behaviour. Also, depending on your insertion order, some values can end up out of their original hashing order because of hash collisions.
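Since pop() follows table order rather than chance, a uniformly random pick has to come from the random module. Here is a minimal sketch; pop_random is a hypothetical helper and not part of the set API, and copying the set into a list costs O(n) per call.

```python
import random


def pop_random(s):
    """Remove and return a uniformly random element of the set s.

    Hypothetical helper for illustration; raises KeyError on an empty set,
    mirroring what set.pop() does.
    """
    if not s:
        raise KeyError("pop from an empty set")
    # random.choice needs an indexable sequence, so copy the set to a list.
    value = random.choice(list(s))
    s.remove(value)
    return value


numbers = set(random.randrange(500) for _ in range(100))
print(pop_random(numbers))
```

For test data like that in the question, this gives a popped value that does not depend on how the set happens to be laid out in memory.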

