I am going through the 'Python for Data Analysis' book and having trouble in the 'Example: 2012 Federal Election Commission Database' section reading the data into a DataFrame. The trouble is that one of the columns of data is always set as the index column, even when the index_col argument is set to None.
Here is the link to the data: http://www.fec.gov/disclosurep/PDownload.do.
Here is the loading code (to save time when checking, I set nrows=10):
import pandas as pd
fec = pd.read_csv('P00000001-ALL.csv', nrows=10, index_col=None)
To keep it short I am excluding the data column outputs, but here is my output (please note the Index values):
In [20]: fec
Out[20]:
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, C00410118 to C00410118
Data columns:
...
dtypes: float64(4), int64(3), object(11)
And here is the book's output (again with data columns excluded):
In [13]: fec = read_csv('P00000001-ALL.csv')
In [14]: fec
Out[14]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1001731 entries, 0 to 1001730
...
dtypes: float64(1), int64(1), object(14)
The Index values in my output are actually the first column of data in the file, which shifts all the remaining data one column to the left. Does anyone know how to prevent this column from being used as the index? I would like the index to be plain integers that increase by 1 per row.
I am fairly new to Python and pandas, so I apologize for any inconvenience. Thanks.
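One possible cause of this symptom, sketched below, is that every data row in the file ends with a trailing delimiter, so each row has one more field than the header. This is an assumption and has not been verified against P00000001-ALL.csv. The column names and values in csv_text are made up for illustration. In that situation pandas uses the first column as the index by default, and index_col=False (rather than None) tells it not to.

```python
import io

import pandas as pd

# Hypothetical stand-in for the FEC file: 3 header names, but every data row
# ends with a trailing comma and therefore has 4 fields.
csv_text = "cmte_id,cand_id,amount\nC001,P001,250,\nC002,P002,50,\n"

# With one more data field than header names, pandas uses the first column
# as the index, so the remaining values shift one column to the left.
shifted = pd.read_csv(io.StringIO(csv_text), index_col=None)
print(shifted)

# index_col=False tells pandas not to use the first column as the index.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(fixed)
```

If the real file does have trailing delimiters, the same index_col=False argument can be passed to the read_csv call on 'P00000001-ALL.csv'.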