So I'm reading in a station codes CSV file from NOAA, which looks like this:
"USAF","WBAN","STATION NAME","CTRY","FIPS","STATE","CALL","LAT","LON","ELEV(.1M)","BEGIN","END"
"006852","99999","SENT","SW","SZ","","","+46817","+010350","+14200","",""
"007005","99999","CWOS 07005","","","","","-99999","-999999","-99999","20120127","20120127"
The first two columns contain codes for weather stations, and sometimes they have leading zeros. When pandas imports them without a specified dtype, they turn into integers. It's not really that big of a deal, because I can loop through the dataframe index and replace them with something like "%06d" % i, since the codes have a fixed width (six digits for USAF), but you know... that's the lazy man's way.
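The padding workaround doesn't actually need a loop over the index. Here is a minimal sketch, assuming the columns have already been parsed as integers and that USAF codes are six digits wide and WBAN codes five (as in the sample rows). It uses `str.zfill` in place of `"%06d"` formatting, and the small frame is built by hand to stand in for the parsed file:

```python
import pandas as pd

# Stand-in for the parsed file: pandas inferred integer columns,
# so the leading zeros of the station codes are already gone.
df = pd.DataFrame({"USAF": [6852, 7005], "WBAN": [99999, 99999]})

# Vectorised zero-padding instead of looping over the index:
# convert each value to a string, then left-pad with zeros to a fixed width.
df["USAF"] = df["USAF"].astype(str).str.zfill(6)
df["WBAN"] = df["WBAN"].astype(str).str.zfill(5)

print(df["USAF"].tolist())  # ['006852', '007005']
```

This still treats the codes as numbers first, so it only works while every code in the column is purely numeric.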
The csv is obtained using this code:
import urllib
response = urllib.urlopen(r"ftp://ftp.ncdc.noaa.gov/pub/data/inventories/ISH-HISTORY.CSV")
with open('Station Codes.csv', 'wb') as output:
    output.write(response.read())
response.close()
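On Python 3, `urllib.urlopen` no longer exists; the equivalent function is `urllib.request.urlopen`. A minimal sketch of the same download step follows. It streams the response to disk with `shutil.copyfileobj` instead of reading it all into memory. The helper name `download` is hypothetical, and the URL is simply the one used above:

```python
import shutil
import urllib.request


def download(url, dest):
    """Copy the resource at `url` into the local file `dest` (hypothetical helper)."""
    # urlopen handles ftp://, http(s):// and file:// URLs.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as output:
        # Copy in chunks rather than holding the whole file in memory.
        shutil.copyfileobj(response, output)


# Usage (needs network access):
# download("ftp://ftp.ncdc.noaa.gov/pub/data/inventories/ISH-HISTORY.CSV",
#          "Station Codes.csv")
```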
That part works fine, but when I try to read the file with this:
import numpy as np
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv", dtype={'USAF': np.str, 'WBAN': np.str})
or
import pandas as pd
df = pd.io.parsers.read_csv("Station Codes.csv",dtype={'USAF': str, 'WBAN': str})
I get a nasty error message:
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 216, in _read
    return parser.read()
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 633, in read
    ret = self._engine.read(nrows)
  File "C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\io\parsers.py", line 957, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 654, in pandas._parser.TextReader.read (pandas\src\parser.c:5931)
  File "parser.pyx", line 676, in pandas._parser.TextReader._read_low_memory (pandas\src\parser.c:6148)
  File "parser.pyx", line 752, in pandas._parser.TextReader._read_rows (pandas\src\parser.c:6962)
  File "parser.pyx", line 837, in pandas._parser.TextReader._convert_column_data (pandas\src\parser.c:7898)
  File "parser.pyx", line 887, in pandas._parser.TextReader._convert_tokens (pandas\src\parser.c:8483)
  File "parser.pyx", line 953, in pandas._parser.TextReader._convert_with_dtype (pandas\src\parser.c:9535)
  File "parser.pyx", line 1283, in pandas._parser._to_fw_string (pandas\src\parser.c:14616)
TypeError: data type not understood
It's a fairly large CSV (about 31k rows), so maybe that has something to do with it?
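For comparison, this is the result the `dtype` mapping is meant to produce. Below is a minimal sketch that parses the header and the two data rows from the top of this post from an in-memory string. It assumes a recent pandas release rather than the 0.11 install in the traceback, so it doesn't by itself explain the error above. It also shows the `converters` argument, which passes each raw field of the named columns through a function (here `str`). That is another way to keep the codes as text:

```python
import io

import pandas as pd

# The header and two data rows from the top of the post, held in memory.
SAMPLE = '''"USAF","WBAN","STATION NAME","CTRY","FIPS","STATE","CALL","LAT","LON","ELEV(.1M)","BEGIN","END"
"006852","99999","SENT","SW","SZ","","","+46817","+010350","+14200","",""
"007005","99999","CWOS 07005","","","","","-99999","-999999","-99999","20120127","20120127"
'''

# Ask the parser to keep the two code columns as strings.
by_dtype = pd.read_csv(io.StringIO(SAMPLE), dtype={"USAF": str, "WBAN": str})

# Alternative: run each raw field of those columns through str().
by_converters = pd.read_csv(io.StringIO(SAMPLE), converters={"USAF": str, "WBAN": str})

print(by_dtype["USAF"].tolist())       # ['006852', '007005']
print(by_converters["USAF"].tolist())  # ['006852', '007005']
```

Specify a column in either `dtype` or `converters`, not both; pandas applies the converter and warns when a column appears in both.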