Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Inserting list into a cell - why does loc ACTUALLY work here?

We know that the standard way to set a single cell is with at or iat. However, I noticed some interesting behaviour that I was hoping someone could explain.

While solving this question, I came across some weird behaviour of loc.

# Setup.

import pandas as pd

pd.__version__
# '0.24.0rc1'

df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df
    A       B
0  12  [a, b]
1  23  [c, d]

To set cell (1, 'B'), it suffices to use at, e.g. df.at[1, 'B'] = .... With loc, however, my first attempt did not work:

df.loc[1, 'B'] = ['m', 'n', 'o', 'p'] 
# ValueError: Must have equal len keys and value when setting with an iterable

So I tried this, which also failed:

df.loc[1, 'B'] = [['m', 'n', 'o', 'p']]
# ValueError: Must have equal len keys and value when setting with an ndarray

I thought loc might somehow accept nested lists here. In a bizarre turn of events, this code worked:

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df

    A             B
0  12        [a, b]
1  23  [m, n, o, p]

Why does loc work this way? Additionally, if you add another element to any of the inner lists, it fails:

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p', 'q']]
# ValueError: Must have equal len keys and value when setting with an iterable

Empty lists don't work either. It seems pointless to have to nest each element in its own list.

Why does loc do this? Is this documented behaviour, or a bug?


1 Reply


This happens because loc does a lot of checking to support its many use cases. (Note: historically, loc and iloc were introduced to remove the ambiguity of ix, way back in v0.11 in 2013, but even today loc retains a lot of ambiguity.)

In this case df.loc[1, 'B'] can return any of:

  • a single element (as in this case, when there is a unique index/column for 1/'B').
  • a Series (if EITHER one of the 1/'B' appears in index/columns multiple times).
  • a DataFrame (if BOTH 1/'B' appear in index/columns multiple times).
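To see those three return types side by side, here is a small sketch with made-up numeric frames (the frame names and values are illustrative, not from the question) in which the labels 1 and 'B' are unique, duplicated in the index, duplicated in the columns, or duplicated in both:

```python
import pandas as pd

# Illustrative frames with made-up values; only the index/column labels matter.
unique = pd.DataFrame({'A': [12, 23], 'B': [30, 40]})
dup_index = pd.DataFrame({'A': [12, 23, 34], 'B': [30, 40, 50]}, index=[0, 1, 1])
dup_cols = pd.DataFrame([[12, 30, 31], [23, 40, 41]], columns=['A', 'B', 'B'])
dup_both = pd.DataFrame(
    [[12, 30, 31], [23, 40, 41], [34, 50, 51]],
    index=[0, 1, 1],
    columns=['A', 'B', 'B'],
)

# The same lookup, loc[1, 'B'], yields a scalar, a Series, or a DataFrame.
for name, frame in [('unique', unique), ('dup_index', dup_index),
                    ('dup_cols', dup_cols), ('dup_both', dup_both)]:
    result = frame.loc[1, 'B']
    print(name, type(result).__name__)
```

Because the same label pair can mean one cell, a Series, or a block, the setter has to work out which of these the right-hand side is meant to fill.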

Aside: iloc suffers from the same issue here, even though for iloc it is always the first case; that may be because loc and iloc share this assignment code.
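For comparison, a sketch of the getter side of iloc on a frame with duplicated labels (illustrative data): two integer positions always pick out exactly one cell. This says nothing about iloc's assignment path, which the paragraph above only suspects is shared with loc.

```python
import pandas as pd

# Illustrative frame with duplicated labels in both the index and the columns.
dup_both = pd.DataFrame(
    [[12, 30, 31], [23, 40, 41], [34, 50, 51]],
    index=[0, 1, 1],
    columns=['A', 'B', 'B'],
)

# Row position 1, column position 1: a single cell, regardless of the labels.
cell = dup_both.iloc[1, 1]
print(cell)  # → 40
```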

So pandas needs to support all of those cases for assignment!

An early part of the assignment logic converts the list (of lists) into a numpy array:

In [11]: np.array(['m', 'n', 'o', 'p']).shape
Out[11]: (4,)

In [12]: np.array([['m', 'n', 'o', 'p']]).shape
Out[12]: (1, 4)
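A sketch applying the same conversion to every value tried in the question, including the ragged and empty variants. It only shows what numpy produces; how pandas then matches each shape against the selection is internal and may differ between versions. Recent numpy versions raise ValueError for ragged nested lists unless dtype=object is passed, so the ragged case passes it explicitly.

```python
import numpy as np

# The right-hand sides tried in the question, as plain Python lists.
candidates = {
    'flat list': ['m', 'n', 'o', 'p'],
    'list in a list': [['m', 'n', 'o', 'p']],
    'one list per element': [['m'], ['n'], ['o'], ['p']],
    'empty inner lists': [[], [], [], []],
}

# Shape of the array numpy builds from each candidate.
shapes = {name: np.array(value).shape for name, value in candidates.items()}
for name, shape in shapes.items():
    print(f'{name}: {shape}')

# Ragged inner lists cannot form a rectangular array; with dtype=object numpy
# instead builds a 1-D array whose elements are the inner list objects.
ragged = np.array([['m'], ['n'], ['o'], ['p', 'q']], dtype=object)
print('ragged:', ragged.shape)
```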

So you can't just pass the list (or a list of lists) and expect to get the right array. Instead, you can explicitly set the list into an object array:

In [13]: a = np.empty(1, dtype=object)

In [14]: a[0] = ['m', 'n', 'o', 'p']

In [15]: a
Out[15]: array([list(['m', 'n', 'o', 'p'])], dtype=object)

Now you can use this in the assignment:

In [16]: df.loc[0, 'B'] = a

In [17]: df
Out[17]:
    A             B
0  12  [m, n, o, p]
1  23        [c, d]
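The np.empty trick above can be wrapped in a small helper. as_object_cell is a hypothetical name used only for illustration; it is not part of numpy or pandas. The contrast at the end shows why np.array on its own is not enough, even with dtype=object:

```python
import numpy as np


def as_object_cell(value):
    """Wrap value in a length-1 object array without numpy unpacking it.

    Hypothetical helper for illustration; not part of numpy or pandas.
    """
    cell = np.empty(1, dtype=object)
    cell[0] = value  # a scalar slot of an object array stores the list itself
    return cell


wrapped = as_object_cell(['m', 'n', 'o', 'p'])
print(wrapped.shape, wrapped[0])

# By contrast, np.array still unpacks the inner list, even with dtype=object.
unpacked = np.array([['m', 'n', 'o', 'p']], dtype=object)
print(unpacked.shape)
```

The helper works the same way for nested values, since the whole object goes into the single slot.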

It's still not ideal, but to reiterate: there are so many edge cases in loc and iloc that the solution is to be as explicit as possible to avoid them (use at here). And more generally, as you know, avoid storing lists inside a DataFrame!
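For completeness, a minimal sketch of the at version on the question's frame. It relies on column B already having object dtype, which it does here because it holds lists:

```python
import pandas as pd

df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})

# at addresses exactly one cell by label, so the list is stored as a single
# object rather than being matched element-by-element against a selection.
df.at[1, 'B'] = ['m', 'n', 'o', 'p']
print(df)
```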

