The performance difference is mainly caused by OrderedDict.
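For reference, the two approaches compared throughout this answer, restated from the benchmark code further down (the exact wording is in the question):

from collections import OrderedDict
from operator import itemgetter

od = OrderedDict([('a', 10), ('b', 9), ('c', 4)])

min(od, key=od.get)                        # first solution: iterate the keys, look each value up
min(od.iteritems(), key=itemgetter(1))[0]  # second solution: iterate the (key, value) pairs directly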
OrderedDict uses dict's get and __getitem__, but redefines its own __iter__ and iteritems.
def __iter__(self):
    'od.__iter__() <==> iter(od)'
    # Traverse the linked list in order.
    root = self.__root
    curr = root[1]        # start at the first node
    while curr is not root:
        yield curr[2]     # yield the curr[KEY]
        curr = curr[1]    # move to next node

def iteritems(self):
    'od.iteritems -> an iterator over the (key, value) pairs in od'
    for k in self:
        yield (k, self[k])
Look at what we found: self[k].
So your second solution did not help us avoid the hash-bin-key calculation (compute the hash, locate the bin, compare the keys), while the iterator generated by dict, more precisely items.iteritems().next() if items is a dict, does not make that calculation.
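A quick way to see this (a sketch of my own with an instrumented key type, not code from the question): count how many times __hash__ runs during one pass of iteritems.

from collections import OrderedDict

class CountingKey(object):
    hash_calls = 0
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.n)
    def __eq__(self, other):
        return self.n == other.n

keys = [CountingKey(i) for i in range(1000)]
od = OrderedDict((k, k.n) for k in keys)
d = dict((k, k.n) for k in keys)

CountingKey.hash_calls = 0
for t in od.iteritems(): pass
print 'OrderedDict.iteritems:', CountingKey.hash_calls   # one __hash__ call per key, from self[k]

CountingKey.hash_calls = 0
for t in d.iteritems(): pass
print 'dict.iteritems:', CountingKey.hash_calls          # zero, the entries are walked directly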
Moreover, iteritems is also more expensive.
from timeit import Timer

N = 1000
d = {i: i for i in range(10000)}

def f1():
    for k in d: pass

def f2():
    for k in d.iterkeys(): pass

def f3():
    for v in d.itervalues(): pass

def f4():
    for t in d.iteritems(): pass

print(Timer(stmt='f1()', setup='from __main__ import f1').timeit(number=N))
print(Timer(stmt='f2()', setup='from __main__ import f2').timeit(number=N))
print(Timer(stmt='f3()', setup='from __main__ import f3').timeit(number=N))
print(Timer(stmt='f4()', setup='from __main__ import f4').timeit(number=N))
Output
0.256800375467
0.265079360645
0.260599391822
0.492333103788
Compared to iterkeys' dictiter_iternextkey and itervalues' dictiter_iternextvalue, iteritems' dictiter_iternextitem has additional parts:
if (result->ob_refcnt == 1) {
    Py_INCREF(result);
    Py_DECREF(PyTuple_GET_ITEM(result, 0));
    Py_DECREF(PyTuple_GET_ITEM(result, 1));
} else {
    result = PyTuple_New(2);
    if (result == NULL)
        return NULL;
}
di->len--;
key = ep[i].me_key;
value = ep[i].me_value;
Py_INCREF(key);
Py_INCREF(value);
PyTuple_SET_ITEM(result, 0, key);
PyTuple_SET_ITEM(result, 1, value);
One might think that the tuple creation decreases the performance, but in fact Python tends to reuse tuples.
tupleobject.c shows:
/* Speed optimization to avoid frequent malloc/free of small tuples */
#ifndef PyTuple_MAXSAVESIZE
#define PyTuple_MAXSAVESIZE 20 /* Largest tuple to save on free list */
#endif
#ifndef PyTuple_MAXFREELIST
#define PyTuple_MAXFREELIST 2000 /* Maximum number of tuples of each size to save */
#endif
This optimization only means that Python does not have to build some tuples from scratch; the rest of the work shown above (the reference counting and filling in the two items) still has to be done.
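The reuse itself is easy to observe (a sketch of my own that relies on a CPython implementation detail, so the result is not guaranteed): a freshly built 2-tuple usually gets the memory of the one that was just released.

x, y = 1, 2
t = (x, y)               # built at run time, so it does not come from co_consts
old_id = id(t)
del t                    # the 2-tuple goes back onto the free list
u = (y, x)               # the next 2-tuple is taken from the free list
print old_id == id(u)    # usually True on CPython 2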
Case: dict
If OrderedDict is replaced by dict, I think the second solution is slightly better in general.
Python dictionaries are implemented using hash tables, so lookup is fast. The average time complexity of a lookup is O(1), while the worst case is O(n)¹.
The average time complexity of your first solution is the same as that of your second solution: both are O(n). Therefore, the second solution has no advantage and is sometimes even slower, especially when the input data is small; in that case, the extra cost caused by iteritems cannot be compensated for.
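To make the difference concrete before the timings (again a sketch of my own with an instrumented key type): the first solution pays one hash lookup per key through d.get, while the second performs no lookups at all.

from operator import itemgetter

class CountingKey(object):
    hash_calls = 0
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.n)
    def __eq__(self, other):
        return self.n == other.n

d = dict((CountingKey(i), i) for i in range(1000))

CountingKey.hash_calls = 0
min(d, key=d.get)                           # first solution
print CountingKey.hash_calls                # 1000: one hash per d.get(k)

CountingKey.hash_calls = 0
min(d.iteritems(), key=itemgetter(1))[0]    # second solution
print CountingKey.hash_calls                # 0: the entries are read directly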
from collections import OrderedDict
from operator import itemgetter
from timeit import Timer
from random import randint, random

N = 100000
xs = [('a', 10), ('b', 9), ('c', 4), ('d', 7), ('e', 3), ('f', 0), ('g', -5), ('h', 9)]

od = OrderedDict(xs)
d = dict(xs)

def f1od_min():
    return min(od, key=od.get)

def f2od_min():
    return min(od.iteritems(), key=itemgetter(1))[0]

def f1d_min():
    return min(d, key=d.get)

def f2d_min():
    return min(d.iteritems(), key=itemgetter(1))[0]

def f1od():
    for k in od: pass

def f2od():
    for t in od.iteritems(): pass

def f1d():
    for k in d: pass

def f2d():
    for t in d.iteritems(): pass

print 'min'
print(Timer(stmt='f1od_min()', setup='from __main__ import f1od_min').timeit(number=N))
print(Timer(stmt='f2od_min()', setup='from __main__ import f2od_min').timeit(number=N))
print(Timer(stmt='f1d_min()', setup='from __main__ import f1d_min').timeit(number=N))
print(Timer(stmt='f2d_min()', setup='from __main__ import f2d_min').timeit(number=N))
print
print 'traverse'
print(Timer(stmt='f1od()', setup='from __main__ import f1od').timeit(number=N))
print(Timer(stmt='f2od()', setup='from __main__ import f2od').timeit(number=N))
print(Timer(stmt='f1d()', setup='from __main__ import f1d').timeit(number=N))
print(Timer(stmt='f2d()', setup='from __main__ import f2d').timeit(number=N))
Output
min
0.398274431527
0.813040903243
0.185168156847
0.249574387248 <-- dict/the second solution
traverse
0.251634216081
0.642283865687
0.0565099754298
0.0958057518483
Then replace N and xs by
N = 50
xs = [(x, randint(1, 100)) for x in range(100000)]
Output
min
1.5148923257
3.47020082161
0.712828585756
0.70823812803 <-- dict/the second solution
traverse
0.975989336634
2.92283956481
0.127676073356
0.253622387762
Now replace N and xs by
N = 10
xs = [(random(), random()) for x in range(1000000)]
Output
min
6.23311265817
10.702984667
4.32852708934
2.87853889251 <-- dict/the second solution
traverse
2.06231783648
9.49360449443
1.33297618831
1.73723008092
Finally, the second solution starts to shine.
The worst case for the first solution: hash collision
Let
N = 10000
xs = [(2 ** (32 + x) - 2 ** x + 1, 1) for x in range(100)]
# hash(2 ** (32 + x) - 2 ** x + 1) is always 1
Output
min
2.44175265292 <-- lookup is slow
2.76424538594 <-- lookup is slow
2.26508627493 <-- lookup is slow
0.199363955475
traverse
0.200654482623
2.59635966303 <-- lookup is slow
0.0454684184722
0.0733798569371
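The same degradation can be reproduced without the magic numbers (again my own sketch, not part of the benchmark above): a key type whose __hash__ is constant forces every lookup to scan the colliding entries linearly.

from timeit import Timer

class BadKey(object):
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 1                  # every key collides into the same bucket
    def __eq__(self, other):
        return self.n == other.n

good = dict((i, 1) for i in range(100))
bad = dict((BadKey(i), 1) for i in range(100))

def lookup_good():
    for k in good: good[k]

def lookup_bad():
    for k in bad: bad[k]

print(Timer(stmt='lookup_good()', setup='from __main__ import lookup_good').timeit(number=10000))
print(Timer(stmt='lookup_bad()', setup='from __main__ import lookup_bad').timeit(number=10000))

Here lookup_bad should be noticeably slower, for the same reason the first three timings above are.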
¹ The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys. See TimeComplexity.