Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Python > Pandas > Reading XLS/CSV Files

Hope you are well, and thank you in advance.

I have a '.xls' file and want to read the data into a dataframe. For other '.xls' files I have successfully used:

df1a = pd.read_excel(sr_file)

For this '.xls' file it fails with an "unsupported format" error. Full traceback:

Traceback (most recent call last):
  File "pyt_AST_Recon.py", line 713, in <module>
    main()
  File "pyt_AST_Recon.py", line 683, in main
    subredFile_loc(sr_file, sr_date)
  File "pyt_AST_Recon.py", line 329, in subredFile_loc
    df1a = pd.read_excel(sr_file)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\util\_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\excel\_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\excel\_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\excel\_base.py", line 359, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\excel\_xlrd.py", line 36, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\xlrd\__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\xlrd\book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\xlrd\book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Trade da'

On further inspection, I believe the cause is that the file is actually a '.csv' (plain text) file with a '.xls' extension: the error shows that it starts with the text 'Trade da' rather than an Excel BOF record. However, when I use read_csv I also get an error.
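One way to check this suspicion before choosing a reader is to look at the first bytes of the file. The sketch below is a minimal illustration, not part of the original script: `classify_header` and `classify_file` are hypothetical helper names, and the check only recognises the OLE2 signature used by typical '.xls' workbooks and the ZIP signature used by '.xlsx'. Anything else that looks like printable ASCII is reported as text.

```python
# Signature at the start of every OLE2 compound file, the usual container for .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx workbooks are ZIP archives, which start with "PK\x03\x04".
ZIP_MAGIC = b"PK\x03\x04"


def classify_header(head: bytes) -> str:
    """Guess the file type from its first few bytes (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Printable ASCII plus tab, LF, CR: most likely a delimited text file.
    if head and all(32 <= b < 127 or b in (9, 10, 13) for b in head):
        return "text"
    return "unknown"


def classify_file(path) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(8))
```

Applied to the bytes quoted in the xlrd error, `classify_header(b'Trade da')` returns `"text"`, which is consistent with the file being delimited text rather than a real workbook.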

When I change the extension from '.xls' to '.csv', I get a badly misaligned CSV file:

| Header One Header Two Header Three Header Four |  |  |  |
| --------                                            | -----------| ------- |------- |
| L1 Data 1    L1 Data 2   | L1 Data 3    |  L1 Data 4      |   |
| L2 Data 1   | L2 Data 2  | L2 Data 3  | L2 Data 4             |
| L3 Data 1   | L3 Data 2  | L3 Data 3  | L3 Data 4             |

When using read_csv I get the following error:

Traceback (most recent call last):
  File "pyt_AST_Recon.py", line 713, in <module>
    main()
  File "pyt_AST_Recon.py", line 683, in main
    subredFile_loc(sr_file, sr_date)
  File "pyt_AST_Recon.py", line 329, in subredFile_loc
    df1a = pd.read_csv(sr_file)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "C:\Users\d.howells\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\io\parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 6

This does make some sense given the odd-looking data I am seeing.
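A field-count mismatch like this often means the file is not comma-delimited at all. Files exported as '.xls' that are really text are frequently tab-separated, but that is only an assumption here, since the real file is not shown. The sketch below uses a made-up `SAMPLE` string as a stand-in for the file and a hypothetical wrapper `read_tab_separated`, and shows that `pd.read_csv` with `sep="\t"` keeps commas inside values instead of splitting on them.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: tab-separated, with commas inside some values.
SAMPLE = (
    "Header One\tHeader Two\tHeader Three\tHeader Four\n"
    "L1 Data 1\tL1, Data 2\tL1 Data 3\tL1 Data 4\n"
    "L2 Data 1\tL2 Data 2\tL2, Data, 3\tL2 Data 4\n"
    "L3 Data 1\tL3 Data 2\tL3 Data 3\tL3 Data 4\n"
)


def read_tab_separated(source):
    """Parse `source` (a path or file-like object) as tab-delimited text."""
    return pd.read_csv(source, sep="\t")


df = read_tab_separated(io.StringIO(SAMPLE))
print(df.shape)
print(df.loc[0, "Header Two"])
```

If the delimiter is unknown, `pd.read_csv(path, sep=None, engine="python")` asks pandas to detect it, which may be worth trying before hard-coding a separator.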

When I pass error_bad_lines=False to skip the lines that fail to parse, I get the following dataframe:

| Header One    Header Two    Header Three   Header Four |  
| -------- | 
|  L1 Data 4      |  

This obviously drops most of the data.
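Rather than skipping the lines that fail, it may help to find out which delimiter splits every line into the same number of fields. The following is a rough diagnostic sketch using only the standard library. The function names and the list of candidate delimiters are illustrative assumptions, not something taken from the original script.

```python
import csv
import io
from collections import Counter

CANDIDATES = (",", "\t", ";", "|")


def field_count_profile(text, delimiters=CANDIDATES):
    """Map each candidate delimiter to a Counter of {fields_per_row: number_of_rows}."""
    profile = {}
    for delim in delimiters:
        rows = csv.reader(io.StringIO(text), delimiter=delim)
        # Blank lines come back as empty lists and are skipped.
        profile[delim] = Counter(len(row) for row in rows if row)
    return profile


def most_consistent_delimiter(text, delimiters=CANDIDATES):
    """Return the delimiter for which the most rows share one field count above 1."""
    best, best_rows = None, 0
    for delim, counts in field_count_profile(text, delimiters).items():
        for n_fields, n_rows in counts.items():
            if n_fields > 1 and n_rows > best_rows:
                best, best_rows = delim, n_rows
    return best
```

Running `most_consistent_delimiter` on the first few lines of the file gives a candidate value for the `sep` argument of `read_csv`. A delimiter that yields a different field count on every line, as the comma apparently does here, is a sign that it is the wrong one.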

This is admittedly a complete mess, and so if anyone can help it would be very much appreciated.



1 Reply

Waiting for an expert to reply.
