On my computer, there's a file /etc/dictionaries-common/words
containing a large collection of English words:
>>> with open("/etc/dictionaries-common/words") as f:
... words = [line.strip() for line in f]
...
>>> "python" in words
True
>>> "BDFL" in words
False
Let's create a dictionary storing the lengths of all those words:
>>> word_lengths = {w: len(w) for w in words}
>>> word_lengths["parrot"]
6
And, just for kicks, we'll shuffle our original word list:
>>> from random import shuffle
>>> shuffle(words)
>>> words[:5]
["Willie's", 'Araceli', 'accessed', 'engagingly', 'hobnobs']
Mmm, hobnobs. Anyway ... now that we've messed around a bit with words
, we've become a little bit paranoid (possibly for the same reason that we're craving hobnobs), and we want to check that all the words in our word_lengths
dictionary are still in words
after we mixed it all up:
>>> all(w in words for w in word_lengths)
True
Well, we got there, but on my machine that took over three minutes - enough time to eat a couple more delicious biscuits, at least. Thinking about it, it's obvious why: we've got ...
>>> len(words)
99171
... nearly a hundred thousand words to check, and for each one in the dictionary, Python has to search through our mixed-up list of words until it finds a match. It won't always have to check the whole list, but on average that's going to be fifty thousand words (or half the list) each time, for a total of 50,000 × 100,000 = 5,000,000,000 tests. Five billion is a lot, even in this miraculous age of technology.
Just to be absolutely sure (I'm not normally so paranoid; normally I just get sleepy), let's check the other way around, and make sure that everything in words
is still in word_lengths
:
>>> all(w in word_lengths for w in words)
True
Hey, what? This time it was, like, a tenth of a second! What gives? You're freaking me out, man ... and hey, where are my biscuits? I had them just now, I'm sure of it.
Unlike a list, which can be in any old order (so making sure that some item is in there means checking each item in turn until we find it), a dictionary is a bit more efficient. Probably less fun at parties, but hey, leave it in charge of the music and all is copacetic, y'know?
The secret of dictionaries' ruthless efficiency is that for each item, the dictionary calculates a hash (just an integer, really) of the key based on its content, and uses that to store the item at a specific location in memory. Then, when you go looking for the item, it calculates the hash of the key's content again, says to itself "okay, "python"
, that hashes to 7036520087640895475
... yeah, I know where I must have put that, then", and goes straight to the right memory location to find it. So this time, it only had to do a hundred thousand checks rather than five billion.
It's kinda like having all your CDs neatly alphabetised on shelves, rather than stacked randomly out of their cases on top of your speakers. Dictionaries know where it's at, I'm telling you.
But there's a price to pay for dictionaries' ability to keep it together. Remember when I said that the dictionary calculates a hash based on the item's content? Well, what happens if that content changes? For immutable objects that's not a problem - their content can't change - but mutable objects, by definition, can change their contents, and when they do, their hash (if they even have one) will change too. Which is cool, obviously, not everyone wants to be put in a box, I get that, but if the hash has changed, there's no way for the dictionary to work out where it put the thing.
It's as though Joy Division changed their name to New Order, and now you've got no idea where you put that 12" remix of Blue Monday. It's just not gonna work.
So, dictionaries have a rule: if you want to be a key, don't go changing.