When choosing purely between attrgetter('attributename') and lambda o: o.attributename as a sort key, using attrgetter() is the faster option of the two.
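As a quick illustration of the equivalence (a minimal sketch using types.SimpleNamespace instances as stand-in objects; both calls produce the same ordering, only the key callable differs):
>>> from operator import attrgetter
>>> from types import SimpleNamespace
>>> objects = [SimpleNamespace(bar=n) for n in (3, 1, 2)]
>>> [o.bar for o in sorted(objects, key=attrgetter('bar'))]
[1, 2, 3]
>>> [o.bar for o in sorted(objects, key=lambda o: o.bar)]
[1, 2, 3]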
Remember that the key function is only applied once to each element in the list, before sorting, so to compare the two we can use them directly in a time trial:
>>> from timeit import Timer
>>> from random import randint
>>> from dataclasses import dataclass, field
>>> @dataclass
... class Foo:
...     bar: int = field(default_factory=lambda: randint(1, 10**6))
...
>>> testdata = [Foo() for _ in range(1000)]
>>> def test_function(objects, key):
...     [key(o) for o in objects]
...
>>> stmt = 't(testdata, key)'
>>> setup = 'from __main__ import test_function as t, testdata; '
>>> tests = {
...     'lambda': setup + 'key=lambda o: o.bar',
...     'attrgetter': setup + 'from operator import attrgetter; key=attrgetter("bar")'
... }
>>> for name, tsetup in tests.items():
...     count, total = Timer(stmt, tsetup).autorange()
...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
...
    lambda: 130.495 microseconds (2000 repetitions)
attrgetter:  92.850 microseconds (5000 repetitions)
So applying attrgetter('bar') 1000 times is roughly 40 μs faster than applying the equivalent lambda. That's because calling a Python function has a certain amount of overhead, more than calling into a native callable such as the one produced by attrgetter().
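To make the difference in kind concrete, here is a small sketch (reusing the Foo dataclass defined above): the object returned by attrgetter() does exactly what the lambda does, it just isn't a Python-level function:
>>> from operator import attrgetter
>>> obj = Foo(bar=42)
>>> get_bar = attrgetter('bar')   # a callable object, implemented in C in CPython
>>> get_bar(obj)                  # same result as (lambda o: o.bar)(obj)
42
>>> (lambda o: o.bar)(obj)
42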
This speed advantage translates into faster sorting too:
>>> def test_function(objects, key):
...     sorted(objects, key=key)
...
>>> for name, tsetup in tests.items():
...     count, total = Timer(stmt, tsetup).autorange()
...     print(f"{name:>10}: {total / count * 10 ** 6:7.3f} microseconds ({count} repetitions)")
...
    lambda: 218.715 microseconds (1000 repetitions)
attrgetter: 169.064 microseconds (2000 repetitions)
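In actual use you'd simply pass the attrgetter as the sort key (a usage sketch, reusing the testdata list from above):
>>> from operator import attrgetter
>>> sorted_data = sorted(testdata, key=attrgetter('bar'))
>>> sorted_data[0].bar <= sorted_data[-1].bar
True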