As indicated in this question/answer by Lev Landau, a simple solution is to use the converters option of the read_csv function for the column in question:
converters={'column_name': lambda x: str(x)}
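Note that the lambda is not strictly needed; str itself can be passed as the converter:
converters={'column_name': str}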
You can find more options of the read_csv function in the pandas.io.parsers.read_csv documentation.
Let's say I have a CSV file projects.csv like the one below:
project_name,project_id
Some Project,000245
Another Project,000478
For example, the code below trims the leading zeros:
from pandas import read_csv
dataframe = read_csv('projects.csv')
print(dataframe)
Result:
me@ubuntu:~$ python test_dataframe.py
project_name project_id
0 Some Project 245
1 Another Project 478
me@ubuntu:~$
Solution code example:
from pandas import read_csv
dataframe = read_csv('projects.csv', converters={'project_id': lambda x: str(x)})
print(dataframe)
Required result:
me@ubuntu:~$ python test_dataframe.py
project_name project_id
0 Some Project 000245
1 Another Project 000478
me@ubuntu:~$
Update, as it may help others:
To read all columns as str, one can do this (from the comments):
pd.read_csv('sample.csv', dtype=str)
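For instance, applied to the projects.csv file from the example above, a minimal sketch (file and column names are the ones used earlier in this answer):

import pandas as pd

dataframe = pd.read_csv('projects.csv', dtype=str)
print(dataframe)         # project_id keeps its leading zeros (000245, 000478)
print(dataframe.dtypes)  # every column is read as object (str)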
To read only selected columns as str, one can do this:
# list of column names that need to stay strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict to dtype
pd.read_csv('sample.csv', dtype=dict_dtypes)
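Applied to the same projects.csv, a minimal sketch assuming only project_id needs to stay a string:

import pandas as pd

lst_str_cols = ['project_id']
dict_dtypes = {x: str for x in lst_str_cols}
dataframe = pd.read_csv('projects.csv', dtype=dict_dtypes)
print(dataframe)  # leading zeros in project_id are preserved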