Yes, this is expected behaviour; it stems from the dtype (storage type) Pandas picks for each series (column). The first input results in a series of floating-point numbers, while the second holds references to Python objects:
>>> pd.Series([0,1,None]).dtype
dtype('float64')
>>> pd.Series([0,1,None,'test']).dtype
dtype('O')
The float version of None is NaN, or Not a Number, which converts to True when interpreted as a boolean (as it is not equal to 0):
>>> pd.Series([0,1,None])[2]
nan
>>> bool(pd.Series([0,1,None])[2])
True
In the other case, the original None object was preserved, which converts to False:
>>> pd.Series([0,1,None,'test'])[2] is None
True
>>> bool(None)
False
So this comes down to automatic type inference: what type Pandas thinks is best suited for each column; see the DataFrame.infer_objects() method. The goal is to minimise storage requirements and maximise operation performance; storing numbers as native 64-bit floating-point values leads to faster numeric operations and a smaller memory footprint, while still being able to represent 'missing' values as NaN. However, when you pass in a mix of numbers and strings, Pandas can't use a dedicated specialised array type and so falls back to the "Python object" type, which stores references to the original Python objects.
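You can see that fallback in action: with object dtype, each element is a reference to the original Python object, so the integers stay Python ints and the string stays a str (a small sketch using the same example data):

```python
import pandas as pd

s = pd.Series([0, 1, None, 'test'])
print(s.dtype)       # object: no specialised array type fits this mix
print(type(s[0]))    # <class 'int'>: the original Python int is preserved
print(s[2] is None)  # True: None itself is stored, not NaN
```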
Instead of letting Pandas guess what type you need, you can specify the dtype explicitly. You could use one of the nullable integer types (which use pd.NA instead of NaN); converting these to booleans results in missing values converting to False:
>>> pd.Series([0,1,None], dtype=pd.Int64Dtype()).astype(bool)
0 False
1 True
2 False
dtype: bool
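The nullable integer dtype also has a string alias, "Int64" (note the capital I, as opposed to NumPy's lowercase "int64"), which names the same dtype as pd.Int64Dtype(); a quick sketch:

```python
import pandas as pd

# "Int64" is the registered alias for the nullable integer dtype.
s = pd.Series([0, 1, None], dtype="Int64")
print(s.dtype)        # Int64
print(s[2] is pd.NA)  # True: the missing entry is stored as pd.NA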
Another option is to convert to a nullable boolean type, and so preserve the None / NaN indicators of missing data:
>>> pd.Series([0,1,None]).astype("boolean")
0 False
1 True
2 <NA>
dtype: boolean
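One practical consequence: a nullable boolean that still contains <NA> cannot be used directly as a filter mask; you have to decide what missing means first, e.g. by filling with False. A small sketch (the data series here is hypothetical):

```python
import pandas as pd

mask = pd.Series([0, 1, None]).astype("boolean")  # False, True, <NA>
data = pd.Series(['a', 'b', 'c'])

# Boolean indexing refuses masks containing <NA>; fill them explicitly.
print(data[mask.fillna(False)])  # keeps only 'b'
```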
Also see the Working with missing data section in the user manual, as well as the nullable integer and nullable boolean data type manual pages.
Note that the Pandas notion of the NA value, representing missing data, is still considered experimental, which is why it is not yet the default. But if you want to 'opt in' for dataframes you just created, you can call the DataFrame.convert_dtypes() method right after creating the frame:
>>> df = pd.DataFrame({'prime_member':[0,1,None]}).convert_dtypes()
>>> df.prime_member
0 0
1 1
2 <NA>
Name: prime_member, dtype: Int64
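convert_dtypes() works per column, so a frame mixing numbers and text gets a suitable nullable dtype for each; a sketch with a made-up frame:

```python
import pandas as pd

# Each column is converted independently: the numeric column becomes
# nullable Int64, the text column becomes Pandas' dedicated string dtype.
df = pd.DataFrame({'n': [0, 1, None], 'name': ['a', 'b', None]}).convert_dtypes()
print(df.dtypes)
```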