UPDATED based on Lennart Regebro's answer
Suppose you iterate through a dictionary, and sometimes need to delete an element. The following is very efficient:
remove = []
for k, v in dict_.items():
if condition(k, v):
remove.append(k)
continue
# do other things you need to do in this loop
for k in remove:
del dict_[k]
The only overhead here is building the list of keys to remove; unless it grows large compared to the dictionary size, it's not an issue. However, this approach requires some extra coding, so it's not very popular.
The popular dict comprehension approach:
dict_ = {k : v for k, v in dict_ if not condition(k, v)}
for k, v in dict_.items():
# do other things you need to do in this loop
results in a full dictionary copy, and so has the risk of a silly performance hit if dictionaries grow large or the containing function is called often.
A much better approach is to copy the keys only rather than whole dictionary:
for k in list(dict_.keys()):
if condition(k, dict_[k]):
del dict_[k]
continue
# do other things you need to do in this loop
(Note that all code examples are in Python 3, so keys()
, items()
returns a view, not a copy.)
In most cases, it won't hurt performance that much, since the time to check even the simplest condition (not to mention other stuff you're doing in the loop) is usually greater than the time to add one key to a list.
Still, I am wondering if it's possible to avoid even that with a custom dictionary that allows deletions while iterating:
for k, v in dict_.items():
if condition(k, v):
del dict_[k]
continue
# do other things you need to do in this loop
Perhaps an iterator could always look ahead, so that when the __next__
is called, the iterator knows where to go without even looking at the current element (it would only need to look at the element when it first gets to it). And if there is no next element, the iterator could just set the flag that would cause StopIteration
exception raised whenever __next__
is called again.
If the element the iterator tries to advance to turns out to be deleted, it's fine to raise an exception; there is no need to support deletions while multiple iterations are going on simultaneously.
Are there any problems with this approach?
One problem is that I'm not sure it can be done with no material overhead compared to the existing dict
; otherwise, it would be faster to use the list(dict_)
approach!
UPDATE:
I tried all the versions. I don't report the timing, since they are clearly very dependent on the exact situation. But it seems safe to say that in many cases, the fastest approach is likely to be list(dict_)
. After all, if you think about, the copy is the fastest operation that grows linearly with size of the list; almost any other overhead, as long as it's also proportional to the list size, is likely to be bigger.
I really like all the ideas, but since I have to select only one, I'm accepting the context manager solution since it allows to use the dictionary as either normal or "enhanced" with very small code changes.
See Question&Answers more detail:
os