python - How to keep leading zeros in a column when reading CSV with Pandas?

Question

Welcome To Ask or Share your Answers For Others

python - How to keep leading zeros in a column when reading CSV with Pandas?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to keep leading zeros in a column when reading CSV with Pandas?

I am importing study data into a Pandas data frame using read_csv.

My subject codes are 6 numbers coding, among others, the day of birth. For some of my subjects this results in a code with a leading zero (e.g. "010816").

When I import into Pandas, the leading zero is stripped of and the column is formatted as int64.

Is there a way to import this column unchanged maybe as a string?

I tried using a custom converter for the column, but it does not work - it seems as if the custom conversion takes place before Pandas converts to int.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:30:22+0000

As indicated in this question/answer by Lev Landau, there could be a simple solution to use converters option for a certain column in read_csv function.

converters={'column_name': lambda x: str(x)}

You can refer to more options of read_csv funtion in pandas.io.parsers.read_csv documentation.

Lets say I have csv file projects.csv like below:

project_name,project_id
Some Project,000245
Another Project,000478

As for example below code is triming leading zeros:

import csv
from pandas import read_csv

dataframe = read_csv('projects.csv')
print dataframe

Result:

me@ubuntu:~$ python test_dataframe.py 
      project_name  project_id
0     Some Project         245
1  Another Project         478
me@ubuntu:~$

Solution code example:

import csv
from pandas import read_csv

dataframe = read_csv('projects.csv', converters={'project_id': lambda x: str(x)})
print dataframe

Required result:

me@ubuntu:~$ python test_dataframe.py 
      project_name project_id
0     Some Project     000245
1  Another Project     000478
me@ubuntu:~$

Update as it helps others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype = str)

To have most or selective columns as str, one can do this:

# lst of column names which needs to be string
lst_str_cols = ['prefix', 'serial']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str'  for x in lst_str_cols}
# use dict on dtypes
pd.read_csv('sample.csv', dtype=dict_dtypes)

Categories

python - How to keep leading zeros in a column when reading CSV with Pandas?

python - How to keep leading zeros in a column when reading CSV with Pandas?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags