Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



python - Set iteration order varies from run to run

Why does the iteration order of a Python set (with the same contents) vary from run to run, and what are my options for making it consistent from run to run?

I understand that the iteration order for a Python set is arbitrary. If I put 'a', 'b', and 'c' into a set and then iterate them, they may come back out in any order.

What I've observed is that the order remains the same within a run of the program. That is, if my program iterates the same set twice in a row, I get the same order both times. However, if I run the program twice in a row, the order changes from run to run.

Unfortunately, this breaks one of my automated tests, which simply compares the output from two runs of my program. I don't care about the actual order, but I would like it to be consistent from run to run.

The best solution I've come up with is:

  1. Copy the set to a list.
  2. Apply an arbitrary sort to the list.
  3. Iterate the list instead of the set.
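The three steps above can be sketched in a few lines. This is only a minimal illustration: the helper name `iter_in_stable_order` and the sample set are made up for the example, and it assumes the set's elements can be compared with one another (all strings, say).

```python
def iter_in_stable_order(items):
    """Copy the set into a list and sort it, so iteration order is repeatable."""
    # sorted() accepts any iterable and returns a new list,
    # which covers steps 1 and 2 in a single call.
    return sorted(items)


letters = {"c", "a", "b"}

# Step 3: iterate the sorted list instead of the set.
for letter in iter_in_stable_order(letters):
    print(letter)
```

If the elements are not mutually comparable, `sorted()` raises a `TypeError`; passing a `key` function that maps each element to something comparable is one possible way around that.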

Is there a simpler solution?

Note: I've found similar questions on Stack Overflow, but none that address this specific issue of getting the same results from run to run.



1 Reply


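The set iteration order changes from run to run because Python randomizes its hash seed by default. (This has been the default since Python 3.3; on earlier versions it had to be switched on with the command-line option -R.) For elements whose hashes depend on that seed, such as strings, set iteration is therefore not only arbitrary (because of hashing), but also non-deterministic across runs (because of the random seed).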
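To check whether the running interpreter has hash randomization switched on, `sys.flags.hash_randomization` can be inspected. A small sketch follows; the wrapper function `hash_randomization_enabled` is a name invented for this example.

```python
import sys


def hash_randomization_enabled():
    """Report whether this interpreter is using hash randomization."""
    # sys.flags.hash_randomization is 0 only when randomization is disabled,
    # for example when the interpreter was started with PYTHONHASHSEED=0.
    return bool(sys.flags.hash_randomization)


print(hash_randomization_enabled())
```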

You can override the random seed with a fixed value by setting the environment variable PYTHONHASHSEED before the interpreter starts. It accepts an integer from 0 to 4294967295, and 0 disables hash randomization altogether. Using the same seed from run to run means set iteration is still arbitrary, but now it is deterministic, which was the desired property.
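One way to see the effect is to start two fresh interpreters with the same seed and compare what they print. The sketch below assumes Python 3.7 or later (for `capture_output`); the helper `run_with_hash_seed`, the `SNIPPET` string, and the seed value are all made up for the illustration.

```python
import os
import subprocess
import sys

# Code run in each child interpreter: print a set of strings in iteration order.
SNIPPET = "print(list({'a', 'b', 'c', 'd', 'e'}))"


def run_with_hash_seed(seed):
    """Run SNIPPET in a new interpreter with PYTHONHASHSEED set to `seed`."""
    # The variable must be in the environment before the child starts;
    # changing os.environ inside an already running interpreter has no effect on it.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


first = run_with_hash_seed(0)
second = run_with_hash_seed(0)

# With the same seed and the same interpreter, both runs should report the same order.
print(first == second)
```

For a test suite, the same idea usually comes down to exporting the variable in the shell or CI configuration that launches the tests, rather than spawning subprocesses by hand.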

Hash seed randomization is a security measure to make it difficult for an adversary to feed inputs that will cause pathological behavior (e.g., by creating numerous hash collisions). For unit testing, this is not a concern, so it's reasonable to override the hash seed while running tests.

