Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - list around groupby results in empty groups

I was playing around to get a better feel for itertools.groupby, so I grouped a list of tuples by their number and tried to get a list of the resulting groups. When I convert the result of groupby to a list, however, I get a strange result: all but the last group are empty. Why is that? I assumed that turning an iterator into a list would be less efficient but would never change behavior. I guess the lists are empty because the inner iterators have already been traversed, but when and where does that happen?

import itertools

l=list(zip([1,2,2,3,3,3],['a','b','c','d','e','f']))
#[(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (3, 'e'), (3, 'f')]

grouped_l = list(itertools.groupby(l, key=lambda x:x[0]))
#[(1, <itertools._grouper at ...>), (2, <itertools._grouper at ...>), (3, <itertools._grouper at ...>)]

[list(x[1]) for x in grouped_l]
#[[], [], [(3, 'f')]]


grouped_i = itertools.groupby(l, key=lambda x:x[0])
#<itertools.groupby at ...>
[list(x[1]) for x in grouped_i]
#[[(1, 'a')], [(2, 'b'), (2, 'c')], [(3, 'd'), (3, 'e'), (3, 'f')]]

1 Reply


groupby is super lazy. Here's an illuminating demo. Let's group three a-values and four b-values, and print out what's happening:

>>> from itertools import groupby
>>> def letters():
        for letter in 'a', 'a', 'a', 'b', 'b', 'b', 'b':
            print('yielding', letter)
            yield letter

### Going through the groups WITHOUT looking at their members

Let's roll:

>>> groups = groupby(letters())
>>> 

Nothing got printed yet! So until now, groupby did nothing. What a lazy bum. Let's ask it for the first group:

>>> next(groups)
yielding a
('a', <itertools._grouper object at 0x05A16050>)

So groupby tells us that this is a group of a-values, and we could go through that _grouper object to get them all. But wait, why did "yielding a" get printed only once? Our generator is yielding three of them, isn't it? Well, that's because groupby is lazy. It did read one value to identify the group, because it needs to tell us what the group is about, i.e., that it's a group of a-values. And it offers us that _grouper object so we can get all the group's members if we want to. But we didn't ask to go through the members, so the lazy bum didn't go any further. It simply didn't have a reason to. Let's ask for the next group:

>>> next(groups)
yielding a
yielding a
yielding b
('b', <itertools._grouper object at 0x05A00FD0>)

Wait, what? Why "yielding a" when we're now dealing with the second group, the group of b-values? Well, because groupby previously stopped after the first a, since that was enough to give us all we had asked for. But now, to tell us about the second group, it has to find the second group, and for this it keeps asking our generator for values until it sees something other than a. Note that "yielding b" is again only printed once, even though our generator yields four of them. Let's ask for the third group:

>>> next(groups)
yielding b
yielding b
yielding b
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    next(groups)
StopIteration

Ok, so there is no third group, and thus groupby raises StopIteration so that the consumer (e.g., a loop or list comprehension) knows to stop. But before that, the remaining "yielding b" lines get printed, because groupby got off its lazy butt and walked over the remaining values in the hope of finding a new group.
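If you'd rather run this as a script than type it into the shell, here is a minimal sketch of the same experiment. Instead of printing, the generator records each value it hands out in a list, so we can count how many values groupby has pulled after each step. The names pulled and counts are my own additions for illustration, and next(groups, None) is used only to avoid the traceback.

```python
from itertools import groupby

pulled = []  # every value groupby requests from the source ends up here

def letters():
    for letter in 'a', 'a', 'a', 'b', 'b', 'b', 'b':
        pulled.append(letter)
        yield letter

groups = groupby(letters())
counts = [len(pulled)]        # nothing has been read yet

key_1, _ = next(groups)
counts.append(len(pulled))    # one 'a' read to identify the first group

key_2, _ = next(groups)
counts.append(len(pulled))    # the other two 'a' plus the first 'b'

third = next(groups, None)    # the default replaces the StopIteration traceback
counts.append(len(pulled))    # the remaining three 'b'

print(counts)  # [0, 1, 4, 7]
```

The jumps in counts (0, then 1, then 4, then 7) match the "yielding" lines printed in the transcript above.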


### Going through the groups WITH looking at their members

Let's try again, and this time ask for the members:

>>> groups = groupby(letters())
>>> key, members = next(groups)
yielding a
>>> key
'a'

Again, groupby asked our generator for just a single value, in order to identify the group so it can tell us that it's an a-group. But this time, we'll also ask for the group members:

>>> list(members)
yielding a
yielding a
yielding b
['a', 'a', 'a']

Aha! There are the remaining "yielding a". Also, already the first "yielding b"! Even though we didn't even ask for the second group yet! But of course groupby has to go this far because we asked for the group members, so it has to keep looking until it gets a non-member. Let's get the next group:

>>> key, members = next(groups)
>>> 

Wait, what? Nothing got printed at all? Is groupby sleeping? Wake up! Oh wait... that's right... it already found out that the next group is b-values. Let's ask for all of them:

>>> list(members)
yielding b
yielding b
yielding b
['b', 'b', 'b', 'b']

Now the remaining three "yielding b" happen, because we asked for them so groupby has to get them.
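The same walk-through as a script, as a rough sketch: the point to watch is that moving on to the second group costs no reads at all once the first group's members have been consumed. Again, pulled and the other variable names are just illustrative choices of mine.

```python
from itertools import groupby

pulled = []  # every value groupby requests from the source ends up here

def letters():
    for letter in 'a', 'a', 'a', 'b', 'b', 'b', 'b':
        pulled.append(letter)
        yield letter

groups = groupby(letters())

key_a, members = next(groups)
a_members = list(members)     # consume the group while it is the current one
after_a = len(pulled)         # three 'a' plus the 'b' that ended the group

key_b, members = next(groups)
at_b_start = len(pulled)      # unchanged: the first 'b' was already read
b_members = list(members)

print(a_members, after_a)     # ['a', 'a', 'a'] 4
print(at_b_start)             # 4
print(b_members)              # ['b', 'b', 'b', 'b']
```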


### Why doesn't it work to get the group members afterwards?

Let's try it your initial way with list(groupby(...)):

>>> groups = list(groupby(letters()))
yielding a
yielding a
yielding a
yielding b
yielding b
yielding b
yielding b
>>> [list(members) for key, members in groups]
[[], ['b']]

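Note that not only is the first group empty, but also, the second group only has one element (you didn't mention that). (Edit: That has changed by now. In newer Python versions, 3.7 and later as far as I know, the last group comes back empty as well, so the result here would be [[], []].)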

Why?

Again: groupby is super lazy. It offers you those _grouper objects so you can go through each group's members. But if you don't ask to see the group members and instead just ask for the next group to be identified, then groupby just shrugs and is like "Ok, you're the boss, I'll just go find the next group".

What your list(groupby(...)) does is it asks groupby to identify all groups. So it does that. But if you then at the end ask for the members of each group, then groupby is like "Dude... I'm sorry, I offered them to you but you didn't want them. And I'm lazy, so I don't keep things around for no good reason. I can give you the last member of the last group, because I still remember that one, but for everything before that... sorry, I just don't have them anymore, you should've told me that you wanted them".
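So if you do want a list of all groups with their members, the usual way out is to turn each group into a list while groupby is still positioned on it. Here is a small sketch using the data from the question; the names pairs, first, grouped and stale are mine, not from the original code.

```python
from itertools import groupby

pairs = list(zip([1, 2, 2, 3, 3, 3], ['a', 'b', 'c', 'd', 'e', 'f']))

def first(pair):
    # Key function: group by the number in each tuple.
    return pair[0]

# Materialize each group while groupby is still positioned on it.
grouped = [(key, list(members)) for key, members in groupby(pairs, key=first)]

# Same data, but the members are read only after groupby has moved on.
stale = [list(members) for key, members in list(groupby(pairs, key=first))]

print(grouped)
# [(1, [(1, 'a')]), (2, [(2, 'b'), (2, 'c')]), (3, [(3, 'd'), (3, 'e'), (3, 'f')])]
print(stale[:-1])
# [[], []]
```

Only stale[:-1] is printed because what the last stale group contains depends on the Python version, while the earlier groups are empty either way.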

P.S. In all of this, of course "lazy" really means "efficient". Not something bad but something good!

