python - Keeping NaNs with pandas dataframe inequalities

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a pandas.DataFrame object that contains about 100 columns and 200,000 rows of data. I am trying to convert it to a bool dataframe where True means that the value is greater than or equal to the threshold, False means that it is less, and NaN values are maintained.

If there are no NaN values, it takes about 60 ms for me to run:

df >= threshold

But when I try to deal with the NaNs, the method below works but is very slow (20 s):

def func(x):
    if x >= threshold:
        return True
    elif x < threshold:
        return False
    else:
        return x
df.apply(lambda col: col.apply(func))

Is there a faster way?

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:40+0000

You can do:

import numpy as np

new_df = df >= threshold
new_df[df.isnull()] = np.nan  # np.nan, since the np.NaN alias was removed in NumPy 2.0

But that is different from what you get with the apply method. Here the mask has float dtype containing NaN, 0.0 and 1.0 (the exact dtype can depend on your pandas version). With the apply solution you get object dtype containing NaN, False and True.
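A minimal sketch of a variant that makes the float dtype explicit, rather than relying on how pandas upcasts a boolean frame when NaN is assigned into it. The example data and the threshold value are assumptions for illustration, and the use of astype and where is a swapped-in technique, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the frame in the question is far larger.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 0.5, 2.0]})
threshold = 1.0  # assumed value for illustration

# Vectorised comparison: positions holding NaN compare as False here.
above = df >= threshold

# Cast to float first so the result dtype is explicit, then restore NaN
# wherever the original frame had a missing value.
float_mask = above.astype(float).where(df.notna())

print(float_mask)
```

This stays vectorised, so it should avoid the per-element cost of the nested apply, but the result is a float frame and not a true boolean mask.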

Neither is OK to use as a mask, because you might not get what you want. IEEE 754 says that any ordered or equality comparison involving NaN (<, <=, >, >=, ==) must yield False, and the apply method implicitly violates that by returning NaN!
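The NaN comparison rule can be checked directly. The sketch below uses a small made-up Series and an assumed threshold of 1.0. It also shows one consequence: negating a >= mask is not the same as a < mask when NaNs are present.

```python
import numpy as np
import pandas as pd

nan = float("nan")

# Ordered and equality comparisons with NaN are all False.
ordered = [nan >= 1.0, nan < 1.0, nan == nan]
# The one exception is !=, which is True for NaN.
not_equal = nan != nan

# pandas follows the same rule element by element.
s = pd.Series([0.5, np.nan, 2.0])  # hypothetical data
ge = s >= 1.0
lt = s < 1.0

# The NaN row is False in both masks, so ~ge marks it True while lt does not.
print(pd.DataFrame({"value": s, ">=": ge, "<": lt, "not >=": ~ge}))
```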

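The best option is to keep track of the NaNs separately: keep df >= threshold as a plain boolean mask and use df.isnull() as a second mask for the missing values. df.isnull() is quite fast when bottleneck is installed.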
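Keeping the NaNs in a separate mask could look like the sketch below. The example data, the threshold value and the variable names are assumptions for illustration. bottleneck is not required for the code to run.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the frame in the question is far larger.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 0.5, 2.0]})
threshold = 1.0  # assumed value for illustration

# Two plain boolean frames instead of one frame mixing booleans and NaN.
above = df >= threshold   # NaN positions are False here
missing = df.isnull()     # True exactly where the value is missing

# "Below the threshold" has to exclude the missing positions explicitly.
below = ~above & ~missing

# Every cell falls into exactly one of the three categories.
print(above.sum().sum(), below.sum().sum(), missing.sum().sum())
```

Both frames keep bool dtype, so they can be used for indexing or combined with & and | without a missing value silently deciding the outcome.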
