The company I work for distributes document assembly software that uses the python-docx library. After each document is generated, the software runs a function that opens it and does a simple search and replace for characters that weren't escaped properly (namely "&amp;" -> "&").
FYI: the actual document assembly uses python-docx-template. However, the error occurs after the document has already been assembled, and it is triggered by the search-and-replace function, which only uses python-docx.
Recently, we've had a few cases where documents fail to generate on client deployments. The error is thrown on the line where the document object is instantiated:
doc = Document(docx=Path(doc_path))
We've seen two errors:
raise BadZipFile("Bad magic number for file header")
and
raise EOFError
The software is widely used and we've never had this issue before. We can't reproduce it in our test environments. The error has only started appearing in the past week but has shown up for several clients after they were updated. The software will fail to generate a particular document some number of times but will succeed after a few tries.
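Because the failure is intermittent, it might help to record the state of the file on disk immediately before the Document(...) call, so that a failing attempt can be compared with a later successful one. Below is a minimal sketch using only the standard library. The helper name `describe_file_state` and the fields it returns are hypothetical; they are not part of our code or of python-docx.

```python
import os
from pathlib import Path

# Signature that a zip-based file such as a .docx normally starts with.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def describe_file_state(doc_path):
    """Collect a few facts about the file so they can be logged before opening it."""
    path = Path(doc_path)
    stat = os.stat(str(path))
    with open(str(path), "rb") as fh:
        head = fh.read(4)  # first four bytes only
    return {
        "size_bytes": stat.st_size,
        "mtime": stat.st_mtime,
        "starts_with_zip_magic": head == ZIP_LOCAL_HEADER_MAGIC,
    }
```

If the logged size or modification time differs between a failing attempt and the later successful one, that might suggest the file was still being written (or was replaced) when it was opened, but that is only a guess at this point.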
We've only seen it happen with one document in particular, even though all documents go through the same search-and-replace function, and, as I said, the error is only intermittent for that problem document.
The code of the search-and-replace function has not changed, and I can't think of any other meaningful change to our document assembly process that would explain this.
I'm having a lot of trouble finding info on what could cause this specifically with the python-docx library. Is this a sign that the generated document is corrupted? If anyone is able to shed some light on possible causes that would be very helpful!
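One way to check whether a generated file is actually corrupted, independently of python-docx, would be to verify the archive with the standard-library zipfile module, which is where both errors in the traces are raised. This is a minimal sketch; the function name `check_docx_integrity` and its return messages are hypothetical.

```python
import zipfile
from pathlib import Path


def check_docx_integrity(doc_path):
    """Return None if the archive reads cleanly, else a short description of the problem."""
    path = str(Path(doc_path))
    # is_zipfile looks for the end-of-central-directory record near the end of the file.
    if not zipfile.is_zipfile(path):
        return "not a readable zip archive (end-of-central-directory record not found)"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip reads every member and returns the name of the first bad one, or None.
            bad_member = zf.testzip()
    except (zipfile.BadZipFile, EOFError) as exc:
        return "{}: {}".format(type(exc).__name__, exc)
    if bad_member is not None:
        return "unreadable member: {}".format(bad_member)
    return None
```

Calling this on the same path right before the Document(...) call, and logging any non-None result, would at least show whether the file on disk is unreadable at that moment or whether the problem is elsewhere.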
Here are the stack traces for both errors:
Bad magic number...
File "/home/user/app/application/document_assembly/core_da.py", line 524, in translate_ampersands
doc = Document(docx=Path(doc_path))
File "/home/user/app-venv/lib/python3.6/site-packages/docx/api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 36, in from_file
phys_reader, pkg_srels, content_types
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 69, in _load_serialized_parts
for partname, blob, reltype, srels in part_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 104, in _walk_phys_parts
part_srels = PackageReader._srels_for(phys_reader, partname)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 83, in _srels_for
rels_xml = phys_reader.rels_xml_for(source_uri)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 129, in rels_xml_for
rels_xml = self.blob_for(source_uri.rels_uri)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 108, in blob_for
return self._zipf.read(pack_uri.membername)
File "/usr/lib/python3.6/zipfile.py", line 1337, in read
with self.open(name, "r", pwd) as fp:
File "/usr/lib/python3.6/zipfile.py", line 1396, in open
raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header
EOFError
File "/home/user/app/application/document_assembly/core_da.py", line 524, in translate_ampersands
doc = Document(docx=Path(doc_path))
File "/home/user/app-venv/lib/python3.6/site-packages/docx/api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 36, in from_file
phys_reader, pkg_srels, content_types
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 69, in _load_serialized_parts
for partname, blob, reltype, srels in part_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 110, in _walk_phys_parts
for partname, blob, reltype, srels in next_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 105, in _walk_phys_parts
blob = phys_reader.blob_for(partname)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 108, in blob_for
return self._zipf.read(pack_uri.membername)
File "/usr/lib/python3.6/zipfile.py", line 1338, in read
return fp.read()
File "/usr/lib/python3.6/zipfile.py", line 858, in read
buf += self._read1(self.MAX_N)
File "/usr/lib/python3.6/zipfile.py", line 940, in _read1
data += self._read2(n - len(data))
File "/usr/lib/python3.6/zipfile.py", line 975, in _read2
raise EOFError
EOFError
question from:
https://stackoverflow.com/questions/65946376/python-docx-error-opening-file-bad-magic-number-for-file-header-eoferror