python - Pandas: Why is default column type for numeric float?

Question


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am using pandas 0.18.1 with Python 2.7.x. I first read an empty dataframe. The columns have dtype object, which is OK. When I assign one row of data, the dtype of the numeric columns changes to float64. I was expecting int or int64. Why does this happen?

Is there a way to set a global option that tells pandas to treat numeric values as int by default unless the data contains a decimal point? For example, with [0, 1.0, 2.], the first column would be int and the other two float64.

For example:

>>> import pandas as pd
>>> df = pd.read_csv('foo.csv', engine='python', keep_default_na=False)
>>> print df.dtypes
bbox_id_seqno    object
type             object
layer            object
ll_x             object
ll_y             object
ur_x             object
ur_y             object
polygon_count    object
dtype: object
>>> df.loc[0] = ['a', 'b', 'c', 1, 2, 3, 4, 5]
>>> print df.dtypes
bbox_id_seqno     object
type              object
layer             object
ll_x             float64
ll_y             float64
ur_x             float64
ur_y             float64
polygon_count    float64
dtype: object


1 Reply

深蓝 · Answer 1 · 2021-10-23T19:22:51+0000

It's not possible for pandas to store NaN values in integer columns.

This makes float the obvious default choice for data storage: otherwise, as soon as a missing value arose, pandas would have to change the data type of the entire column. And missing values arise very often in practice.

As for why this is, it's a restriction inherited from NumPy. Basically, pandas needs to set aside a particular bit pattern to represent NaN. This is straightforward for floating-point numbers, where NaN is defined in the IEEE 754 standard. It's more awkward and less efficient to do this for a fixed-width integer, where every bit pattern already represents a valid number.
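To illustrate the upcast, here is a minimal sketch (not taken from the question's data; the series and its values are made up for the example). It starts from an explicitly int64 series and reindexes it so that one label has no value, which forces pandas to insert NaN and switch the column to float64.

```python
import pandas as pd

# A NumPy-backed integer series with no missing values.
ints = pd.Series([1, 2, 3], dtype="int64")

# Reindexing onto a label that does not exist (3) creates a missing value.
# int64 has no bit pattern for NaN, so the whole series is upcast to float64.
with_gap = ints.reindex([0, 1, 2, 3])

print(ints.dtype)
print(with_gap.dtype)
print(with_gap.isna().tolist())
```

If the column ends up with no missing values after all, it can be converted back explicitly with astype("int64").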

Update

Exciting news in pandas 0.24: IntegerArray is an experimental feature, but it might render my original answer obsolete. So if you're reading this on or after 27 Feb 2019, check out the docs for that feature.
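As a rough sketch of what that feature looks like, assuming pandas 0.24 or later: the nullable integer dtype is requested with the string alias "Int64" (capital I), and it keeps integer values alongside a missing entry instead of upcasting to float64. The values here are made up for illustration.

```python
import pandas as pd

# "Int64" (capital I) selects the nullable integer extension dtype,
# as opposed to NumPy's plain "int64".
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)
print(s.isna().tolist())

# Reductions skip the missing entry by default.
total = s.sum()
print(total)
```

Note that this is opt-in: it has to be requested per column (for example with astype("Int64")), so it is not the global default the question asks about.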
