Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - zipfile.BadZipFile: Bad CRC-32 when extracting a password protected .zip & .zip goes corrupt on extract

I am trying to extract a password-protected .zip that contains a .txt document (say, Congrats.txt). Congrats.txt has text in it, so it is not 0 KB in size. It is placed in a .zip (let's call it zipv1.zip) protected with the password dominique. That password is stored among other words and names in another .txt file (which we'll call file.txt).

If I run the code below with python Program.py -z zipv1.zip -f file.txt (assuming all these files are in the same folder as Program.py), my program reports dominique as the correct password for zipv1.zip among the other words/passwords in file.txt and extracts zipv1.zip, but the extracted Congrats.txt is empty (0 KB).

Now my code is as follows:

import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of file.txt.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of Python to be ran simultaneously.
    with multiprocessing.Pool(8) as pool:
        # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])


if __name__ == '__main__':
    main(args.zip, args.file)

However, if I create another zip (zipv2.zip) the same way as zipv1.zip, the only difference being that Congrats.txt is in a folder and that folder is zipped together with Congrats.txt, I get the same output as for zipv1.zip, but this time Congrats.txt is extracted along with its folder and is intact: both its text and its size are preserved.

To solve this, I read zipfile's documentation, where I found that a password that doesn't match the .zip raises a RuntimeError. So I changed except: in the code to except RuntimeError: and got this error when trying to unzip zipv1.zip:

(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv1.zip -f file.txt
[+] Password for the .zip: dominique

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
    zip_file.extractall(pwd=password)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
    shutil.copyfileobj(source, target)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
    data = self._read1(n)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
    self._update_crc(data)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
    main(args.zip, args.file)
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
    pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'

The result is the same, though: the password was found in file.txt and zipv1.zip was extracted, but Congrats.txt was empty (0 KB). So I ran the program again, this time for zipv2.zip, and got this:

(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv2.zip -f file.txt
[+] Password for the .zip: dominique

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
    zip_file.extractall(pwd=password)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
    shutil.copyfileobj(source, target)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
    data = self._read1(n)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
    self._update_crc(data)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
    main(args.zip, args.file)
  File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
    pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'

Again, the same result as before: the folder was extracted successfully, and Congrats.txt was extracted with its text and size intact.

I did take a look at this similar thread, as well as this thread but they were no help. I also checked zipfile's documentation but it wasn't helpful regarding the issue.

Edit

After switching to with zipfile.ZipFile(zip_filename, 'r') as zip_file:, for some reason the program can process a small word list/password list/dictionary but not a large one(?).

What I mean is this: a .txt document named Congrats.txt, containing the text You have cracked the .zip!, is present in zipv1.zip. The same .txt is present in zipv2.zip as well, but this time placed in a folder named ZIP Contents, which is then zipped and password protected. The password is dominique for both zips.

Note that each .zip was generated in 7zip using the Deflate compression method and ZipCrypto encryption.

The password is on line 35 of John The Ripper Jr.txt (35/52 lines) and on line 1968 of John The Ripper.txt (1968/3106 lines).

If you run python Program.py -z zipv1 -f "John The Ripper Jr.txt" in CMD (or the IDE of your choice), it creates a folder named Extracted and places Congrats.txt in it, containing the sentence set above. The same goes for zipv2, except that Congrats.txt ends up in the ZIP Contents folder inside Extracted. There is no trouble extracting the .zips in this case.

But if you try the same thing with John The Ripper.txt, i.e. python Program.py -z zipv1 -f "John The Ripper.txt", it creates the Extracted folder for both zips, just as with John The Ripper Jr.txt, but this time Congrats.txt is empty for both of them.

My code and all necessary files are as follows:

import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack.", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        with zipfile.ZipFile(zip_filename, 'r') as zip_file:
            zip_file.extractall('Extracted', pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of P


1 Reply


Sorry for the long pause ... It seems you've got yourself into a bit of a pickle.

Recap:

  • Working on a password protected .zip file
  • Brute force (ciobaneste) is attempted, using passwords from a file
  • The correct password is in the (previous step) file, but in spite of that, some files aren't properly extracted

1. Investigation

The scenario is complex (quite far from an MCVE, I'd say), and there are many things that could be blamed for the behavior.

Starting with the zipv1.zip / zipv2.zip mismatch: on a closer look, it appears that zipv2 is messed up as well. The problem is easy to spot for zipv1 (Congrats.txt being the only file), whereas for zipv2 it is "ZIP Contents/Black-Large.png" that ends up 0-sized.
It is reproducible with any file, and more: it applies to the 1st entry (which is not a dir) returned by zf.namelist.

So, things start to get a little bit clearer:

  • The file contents are unpacked, because dominique is present in the password file (what happens up to that point is not known yet)
  • At a later point, the .zip's 1st entry is truncated to 0 bytes
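This symptom can be spotted by comparing what the archive's directory records with what ended up on disk. The helper below is a minimal sketch: the name find_truncated and its arguments are hypothetical (not part of zipfile), and it assumes the archive was extracted into dest_dir with its member paths unchanged. It only reads the central directory, so no password is needed.

```python
import os
import zipfile


def find_truncated(zip_path, dest_dir):
    """Return member names whose extracted copy is smaller than the archive records."""
    truncated = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no data
            # Member names always use "/" as separator inside the archive.
            target = os.path.join(dest_dir, *info.filename.split("/"))
            # A missing file is reported too (treated as size -1).
            on_disk = os.path.getsize(target) if os.path.isfile(target) else -1
            if on_disk < info.file_size:
                truncated.append(info.filename)
    return truncated
```

Calling find_truncated("zipv1.zip", "Extracted") after a run would list Congrats.txt if it was emptied.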

Looking at the exceptions thrown when attempting to extract files using a wrong password, there are 3 types (out of which the last 2 can be grouped together):

  1. RuntimeError: Bad password for file ...
  2. Others:
    • zlib.error: Error -3 while decompressing data ...
    • zipfile.BadZipFile: Bad CRC-32 for file ...
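To tell the two groups apart without writing anything to disk, a password can be tried on a single member in memory. This is only a sketch: classify_password and its return labels are made up for illustration, while ZipFile.read and the three exception types are the ones listed above.

```python
import zipfile
import zlib


def classify_password(zip_path, member, pwd):
    """Try pwd (bytes) on one member, in memory only, and name the outcome."""
    with zipfile.ZipFile(zip_path) as zf:
        try:
            zf.read(member, pwd=pwd)
        except RuntimeError:
            # Group 1: rejected up front ("Bad password for file ...").
            return "rejected"
        except (zlib.error, zipfile.BadZipFile):
            # Group 2: accepted at first, then the data failed to
            # decompress or did not match its CRC-32.
            return "false positive"
        return "ok"
```

Because nothing is written, a password from the second group cannot damage a file that was already extracted.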

I created an archive file of my own. For consistency's sake, I'll be using it from now on, but everything would apply to any other file as well.

  • Content:
    • DummyFile0.txt (10 bytes) - containing: 0123456789
    • DummyFile1.txt (10 bytes) - containing: 0000000000
    • DummyFile2.txt (10 bytes) - containing: AAAAAAAAAA
  • Archived the 3 files with the Total Commander (9.21a) internal zip packer, password protecting the archive with dominique (zip2.0 encryption). The resulting archive (I named it arc0.zip, but the name is not relevant) is 392 bytes long

code.py:

#!/usr/bin/env python3

import sys
import os
import zipfile


def main():
    arc_name = sys.argv[1] if len(sys.argv) > 1 else "./arc0.zip"
    pwds = [
        #b"dominique",
        #b"dickhead",
        b"coco",
    ]
    pwds = [item.strip() for item in open("orig/John The Ripper.txt.orig", "rb").readlines()]
    print("Unpacking (password protected: dominique) {:s},"
          " using a list of predefined passwords ...".format(arc_name))
    if not os.path.isfile(arc_name):
        raise SystemExit("Archive file must exist!\nExiting.")
    faulty_pwds = list()
    good_pwds = list()
    with zipfile.ZipFile(arc_name, "r") as zip_file:
        print("Zip names: {:}\n".format(zip_file.namelist()))
        for idx, pwd in enumerate(pwds):
            try:
                zip_file.extractall("Extracted", pwd=pwd)
            except:
                exc_cls, exc_inst, exc_tb = sys.exc_info()
                if exc_cls != RuntimeError:
                    print("Exception caught when using password ({:d}): [{:}] ".format(idx, pwd))
                    print("    {:}: {:}".format(exc_cls, exc_inst))
                    faulty_pwds.append(pwd)
            else:
                print("Success using password ({:d}): [{:}] ".format(idx, pwd))
                good_pwds.append(pwd)
    print("\nFaulty passwords: {:}\nGood passwords: {:}".format(faulty_pwds, good_pwds))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q054532010]> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py arc0.zip
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32

Unpacking (password protected: dominique) arc0.zip, using a list of predefined passwords ...
Zip names: ['DummyFile0.txt', 'DummyFile1.txt', 'DummyFile2.txt']

Exception caught when using password (1189): [b'mariah']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (1446): [b'zebra']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Exception caught when using password (1477): [b'1977']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Success using password (1967): [b'dominique']
Exception caught when using password (2122): [b'hank']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (2694): [b'solomon']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid distance code
Exception caught when using password (2768): [b'target']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Exception caught when using password (2816): [b'trish']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (2989): [b'coco']
    <class 'zlib.error'>: Error -3 while decompressing data: invalid stored block lengths

Faulty passwords: [b'mariah', b'zebra', b'1977', b'hank', b'solomon', b'target', b'trish', b'coco']
Good passwords: [b'dominique']

Looking at the ZipFile.extractall code, it tries to extract all the members. The 1st one raises an exception, so it starts to be clearer why it behaves the way it does. But why do 2 wrong passwords behave differently when used to extract the items?
As seen in the tracebacks of the 2 different exception types, the answer lies somewhere at the end of ZipFile.open.

After more investigations, it turns out it's because of a

2. Collision determined by zip encryption weakness

According to [UT.CS]: dmitri-report-f15-16.pdf - Password-based encryption in ZIP files ((last) emphasis is mine):

3.1 Traditional PKWARE encryption

The original encryption scheme, commonly referred to as the PKZIP cipher, was designed by Roger Schaffely [1]. In [5] Biham and Kocher showed that the cipher is weak and demonstrated an attack requiring 13 bytes of plaintext. Further attacks have been developed, some of which require no user provided plaintext at all [6]. The PKZIP cipher is essentially a stream cipher, i.e. input is encrypted by generating a pseudo-random key stream and XOR-ing it with the plaintext. The internal state of the cipher consists of three 32-bit words: key0, key1 and key2. These are initialized to 0x12345678, 0x23456789 and 0x34567890, respectively. A core step of the algorithm involves updating the three keys using a single byte of input...

...

Before encrypting a file in the archive, 12 random bytes are first prepended to its compressed contents and the resulting bytestream is then encrypted. Upon decryption, the first 12 bytes need to be discarded. According to the specification, this is done in order to render a plaintext attack on the data ineffective. The specification also states that out of the 12 prepended bytes, only the first 11 are actually random, the last byte is equal to the high order byte of the CRC-32 of the uncompressed contents of the file. This gives the ability to quickly verify whether a given password is correct by comparing the last byte of the decrypted 12 byte header to the high order byte of the actual CRC-32 value that is included in the local file header. This can be done before decrypting the rest of the file.

The algorithm's weakness: because only one byte is compared, the check can take just 256 distinct values. So out of 256 different (and carefully chosen) wrong passwords, there will be one (at least) that generates the same check byte as the correct password; for arbitrary wrong passwords, that is roughly 1 in 256.

The algorithm discards most of the wrong passwords, but there are some that it doesn't.
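A rough estimate shows why a long word list is much more likely to trigger this than a short one. The sketch below assumes that the check byte produced by a wrong password behaves like a uniformly random value out of 256, which is an idealization. The function name is hypothetical; the line counts (52 and 3106) are the ones given in the question.

```python
def false_positive_stats(wrong_passwords, check_values=256):
    """Expected count, and P(at least one), of wrong passwords passing a 1-byte check."""
    p = 1 / check_values
    expected = wrong_passwords * p
    at_least_one = 1 - (1 - p) ** wrong_passwords
    return expected, at_least_one


# Each list contains exactly one correct password, hence "lines - 1".
for name, lines in (("John The Ripper Jr.txt", 52), ("John The Ripper.txt", 3106)):
    expected, at_least_one = false_positive_stats(lines - 1)
    print(f"{name}: ~{expected:.1f} expected, P(>=1) = {at_least_one:.3f}")
```

Under that assumption the short list is unlikely to contain a colliding password, while the long one almost certainly does, on the order of a dozen. The run above found 8 for arc0.zip, which is in the same range.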

Going back: when a file is attempted to be extracted using a password:

  • If the "hash" computed on the file cipher's last byte is different than file CRC's high order byte, an exception is thrown
  • But, if they are equal:
    • A new file stream is opened for writing (emptying the file if it already exists)
    • The decompression is attempted:
      • For wrong passwords (that have passed the above check), the decompression will fail (but the file is already emptied)
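Given that sequence, one way to keep a good extraction from being emptied is to never let an unverified password write into the real output folder. The following is a sketch of that idea, not necessarily the fix intended here: extract_staged is a hypothetical helper that extracts into a scratch directory and copies the result over only when every member came out clean. It assumes Python 3.8+ (for dirs_exist_ok).

```python
import shutil
import tempfile
import zipfile
import zlib


def extract_staged(zip_path, pwd, dest_dir):
    """Extract into a scratch directory first; touch dest_dir only on full success."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(scratch, pwd=pwd)
        except (RuntimeError, zlib.error, zipfile.BadZipFile):
            # Wrong password (or damaged archive): dest_dir is left untouched.
            return False
        # Every member was decrypted, decompressed and passed its CRC-32 check.
        shutil.copytree(scratch, dest_dir, dirs_exist_ok=True)
    return True
```

A wrong password that passes the one-byte check then only truncates a file inside the scratch directory, which is discarded.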

As seen from the output above, for my (.zip) file there are 8 passwords that mess it up. Note that:

  • For each archive file the result differs
  • The member file name and content are relevant (at least for the 1st one). Changing any of those will yield different results (for the "same" archive file)

Here's a test based on data from my .zip file:

>>> import zipfile
>>>
>>> zd_coco = zipfile._ZipDecrypter(b"coco")
>>> zd_dominique = zipfile._ZipDecrypter(b"dominique")
>>> zd_other = zipfile._ZipDecrypter(b"other")
>>> cipher = b'\xd1\x86y ^\xd77gRzZ\xee'  # Member (1st) file cipher: 12 bytes starting from archive offset 44
>>>
>>> crc = 2793719750  # Member (1st) file CRC - archive bytes: 14 - 17
>>> hex(crc)
'0xa684c7c6'
>>> for zd in (zd_coco, zd_dominique, zd_other):
...     print(zd, [hex(zd(c)) for c in cipher])
...
<zipfile._ZipDecrypter object at 0x0000021E8DA2E0F0> ['0x1f', '0x58', '0x89', '0x29', '0x89', '0xe', '0x32', '0xe7', '0x2', '0x31', '0x70', '0xa6']
<zipfile._ZipDecrypter object at 0x0000021E8DA2E160> ['0xa8', '0x3f', '0xa2', '0x56', '0x4c', '0x37', '0xbb', '0x60', '0xd3', '0x5e', '0x84', '0xa6']
<zipfile._ZipDecrypter object at 0x0000021E8DA2E128> ['0xeb', '0x64', '0x36', '0xa3', '0xca', '0x46', '0x17', '0x1a', '0xfb', '0x6d', '0x6c', '0x4e']
>>>  # As seen, the last element of the first 2 arrays (coco and dominique) is 0xA6 (166), which is the same as the high-order byte of the CRC (0xa684c7c6)
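The session above relies on zipfile._ZipDecrypter, a private helper whose calling convention, as far as I know, changed in Python 3.7 (it now returns a function that takes a bytes object rather than one byte at a time). The same check can be sketched in plain Python from the cipher description quoted earlier. The function names below are mine; the key constants come from the quoted text, and the header bytes and CRC are the ones from the session.

```python
def _make_crc_table():
    """Standard CRC-32 lookup table (reflected polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc_table()


def _crc_step(crc, byte):
    """One table-driven CRC-32 step, without the usual pre/post inversion."""
    return (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]


def decrypt(password, data):
    """Decrypt data with the traditional PKZIP stream cipher."""
    k0, k1, k2 = 0x12345678, 0x23456789, 0x34567890

    def update(byte):
        nonlocal k0, k1, k2
        k0 = _crc_step(k0, byte)
        k1 = (k1 + (k0 & 0xFF)) & 0xFFFFFFFF
        k1 = (k1 * 134775813 + 1) & 0xFFFFFFFF
        k2 = _crc_step(k2, k1 >> 24)

    for byte in password:  # the password initializes the key state
        update(byte)
    out = bytearray()
    for c in data:
        k = k2 | 2
        p = c ^ (((k * (k ^ 1)) >> 8) & 0xFF)
        update(p)  # keys are updated with the plaintext byte
        out.append(p)
    return bytes(out)


def passes_check(password, header, crc):
    """True if the 12th decrypted header byte equals the CRC's high-order byte."""
    return decrypt(password, header[:12])[-1] == (crc >> 24) & 0xFF


HEADER = b"\xd1\x86y ^\xd77gRzZ\xee"  # 12 header bytes of the 1st member
CRC = 0xA684C7C6                      # CRC-32 of the 1st member
for pwd in (b"coco", b"dominique", b"other"):
    print(pwd, hex(decrypt(pwd, HEADER)[-1]), passes_check(pwd, HEADER, CRC))
```

If the sketch matches zipfile's behavior, coco and dominique should both pass the check and other should not, as in the session.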

I did some tests with other unpacking eng

