Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas boolean comparison on dataframe

I am getting an error when I make a comparison on a single element in a dataframe, but I don't understand why.

I have a dataframe df with timeseries data for a number of customers, with some null values within it:

df.head()
                    8143511  8145987  8145997  8146001  8146235  8147611  
2012-07-01 00:00:00      NaN      NaN      NaN      NaN      NaN      NaN   
2012-07-01 00:30:00    0.089      NaN    0.281    0.126    0.190    0.500   
2012-07-01 01:00:00    0.090      NaN    0.323    0.141    0.135    0.453   
2012-07-01 01:30:00    0.061      NaN    0.278    0.097    0.093    0.424   
2012-07-01 02:00:00    0.052      NaN    0.278    0.158    0.170    0.462  

In my script, the line if pd.isnull(df[[customer_ID]].loc[ts]): generates an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, if I put a breakpoint on that line and, when the script stops, type this into the console:

pd.isnull(df[[customer_ID]].loc[ts])

the output is:

8143511    True
Name: 2012-07-01 00:00:00, dtype: bool

If I allow the script to continue from that point, the error is generated immediately.

If the boolean expression can be evaluated and has the value True, why does it generate an error in the if expression? This makes no sense to me.

1 Reply

The problem lies in the if statement.

When you write

if this:
    print(that)

Python evaluates this as bool(this), and that call has to come back as either True or False.

However, you did:

if pd.isnull(df[[customer_ID]].loc[ts]):
    pass  # the body isn't shown in the question, and it doesn't matter here

Also, you stated that pd.isnull(df[[customer_ID]].loc[ts]) evaluated to:

8143511    True
Name: 2012-07-01 00:00:00, dtype: bool

Does that look like a True or False?
What about bool(pd.isnull(df[[customer_ID]].loc[ts]))?

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
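
You can reproduce this without your data. The sketch below is only an illustration: the Series s and its label are made up to resemble your console output. It shows that bool() raises even when the Series holds a single element.

```python
import pandas as pd

# Hypothetical one-element boolean Series, shaped like the console output above.
s = pd.Series([True], index=["8143511"])

raised = False
message = ""
try:
    bool(s)  # this is the conversion that `if s:` performs implicitly
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```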

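So the lesson is: a pd.Series cannot be evaluated as a single True or False, even when it holds only one element.

It is, however, a pd.Series of Trues and Falses. The console could display it because printing a Series never calls bool() on it; the if statement does.

And that is why it doesn't work.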
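
One way to get a single True or False for the if is sketched below. The tiny frame is made up from the shape of your df.head() (two columns, two rows), and string column labels are an assumption. Either select one cell with df.loc[ts, customer_ID] so that pd.isnull sees a scalar, or keep your original selection and reduce the resulting Series with .any().

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the frame in the question (string column labels assumed).
index = pd.to_datetime(["2012-07-01 00:00:00", "2012-07-01 00:30:00"])
df = pd.DataFrame(
    {"8143511": [np.nan, 0.089], "8145987": [np.nan, np.nan]},
    index=index,
)

customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")

# Option 1: pick a single cell, so pd.isnull receives a scalar.
is_missing_scalar = pd.isnull(df.loc[ts, customer_ID])

# Option 2: keep the original selection (a one-element Series) and reduce it.
is_missing_reduced = pd.isnull(df[[customer_ID]].loc[ts]).any()

# The same single-cell check on a row that does hold a value.
later_is_missing = pd.isnull(df.loc[index[1], customer_ID])

if is_missing_scalar:
    print("value is missing")
```

Option 1 reads more directly when you only ever look at one customer at a time; .any() (or .all()) is the better fit when the selection can cover several columns.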

