Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Steganographer File Handling Error for non plain-text files

I've built a Python steganographer and am trying to add a GUI to it. This follows up on my previous question about reading all kinds of files in Python. Since the steganographer can only encode bytes in an image, I want to add support for directly encoding a file of any extension and encoding. To do this, I read the file in binary mode and try to encode it. It works fine for files that contain plain UTF-8 text, such as .txt and .py files.

My updated code is:

from PIL import Image

import os

class StringTooLongException(Exception):
    pass

class InvalidBitValueException(Exception):
    pass

def str2bin(message):       
    binary = bin(int.from_bytes(message, 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big')

def hide(filename, message, bits=2):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'

    if (len(binary)) % 8 != 0:
        binary = '0'*(8 - ((len(binary)) % 8)) + binary

    data = list(image.getdata())

    newData = []

    if len(data) * bits < len(binary):
        raise StringTooLongException

    if bits > 8:
        raise InvalidBitValueException

    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= bits
            pixel[0] <<= bits
            pixel[0] += int('0b' + binary[index:index+bits], 2)
            pixel = tuple(pixel)
            index += bits

        newData.append(pixel)

    image.putdata(newData)
    image.save(os.path.dirname(filename) + '/coded-'+os.path.basename(filename), 'PNG')

    return len(binary)

def unhide(filename, bits=2):
    image = Image.open(filename)
    data = image.getdata()

    if bits > 8:
        raise InvalidBitValueException

    binary = ''

    index = 0

    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        value = '00000000' + bin(data[index][0])[2:]
        binary += value[-bits:]
        index += 1

    message = bin2str(binary)
    return message

Now, the problem comes when I try to hide .pdf or .docx files in it. Several things are happening:

1) Microsoft Word or Adobe Acrobat shows that the file is corrupt.

2) The file size is considerably reduced, from 40 KB to 3 KB, which is a clear sign of an error.

I think the reason could be that the file contains a NULL byte, and my program stops reading as soon as it reaches it. Do you have any alternative idea for handling this?

I have an idea to change the terminating byte, but that may give the same result, since a file may contain that byte too.

Thanks, again!

1 Reply

You can use an end-of-stream (EOS) marker when you are certain the marker sequence will not show up in your message stream. When you can't guarantee that, you have two options:

  • Create a more complicated EOS marker, composed of many bytes. Proving that the same problem won't arise as before can be quite a nuisance, or
  • Add a header at the beginning of your message, which encodes how many bits/bytes to read for the complete message extraction.
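
To see why a single zero byte is a fragile marker, here is a minimal sketch that mimics the stop rule in the question's `unhide` (stop at the first byte-aligned run of eight zero bits). The helper names `bytes_to_bits` and `read_until_zero_byte` and the sample payload are made up for illustration, and no image is involved.

```python
def bytes_to_bits(data: bytes) -> str:
    """Render each byte as exactly 8 bits, keeping leading zeros."""
    return ''.join(f'{byte:08b}' for byte in data)


def read_until_zero_byte(bits: str) -> str:
    """Collect whole bytes and stop after the first all-zero byte."""
    collected = ''
    for i in range(0, len(bits), 8):
        byte = bits[i:i + 8]
        collected += byte
        if byte == '00000000':
            break
    return collected


# Hypothetical binary payload with a zero byte in the middle.
payload = b'%PDF\x00rest of file'
stream = bytes_to_bits(payload) + '00000000'  # zero byte appended as EOS marker
recovered = read_until_zero_byte(stream)

print(len(stream) // 8, 'bytes embedded,', len(recovered) // 8, 'bytes read back')
```

The reader stops at the zero byte inside the payload, so everything after it is lost. This matches the shrinking file size described in the question.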

Generally, I'd use a header whenever I have information beforehand that I want to transmit and only rely on EOS markers when I don't know when my byte stream will terminate, e.g., on-the-fly compression.

For embedding, you should aim to:

  • get your binary string
  • measure its length
  • convert that integer to a binary of fixed size, say, 32 bits
  • attach that bitstring in front of your message bitstring
  • embed all of that in your cover medium
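
The embedding steps above can be sketched roughly as follows. This is only an illustration working on bit strings: `bytes_to_bits` and `build_payload` are hypothetical names, and the 32-bit header width is the example value from the list. The resulting string would take the place of `binary` in the question's `hide`.

```python
def bytes_to_bits(data: bytes) -> str:
    """Render each byte as exactly 8 bits, so leading zero bytes survive."""
    return ''.join(f'{byte:08b}' for byte in data)


def build_payload(message: bytes, header_bits: int = 32) -> str:
    """Prefix the message bits with a fixed-width length header."""
    body = bytes_to_bits(message)
    if len(body) >= 2 ** header_bits:
        raise ValueError('message too long for the chosen header width')
    # Zero-padded binary representation of the body length in bits.
    header = format(len(body), f'0{header_bits}b')
    return header + body


# Zero bytes in the message are no longer a problem.
payload = build_payload(b'\x00PDF\x00')
print(payload[:32], len(payload))
```

Converting byte by byte, instead of going through one big integer, keeps leading zero bytes and always yields a multiple of 8 bits.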

And for extraction:

  • extract the first 32 bits
  • convert those to an integer to get your message bitstring length
  • start from index 32 and extract the necessary number of bits
  • convert back to a bytestream and save to a file
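
A matching sketch for the extraction side, again on plain bit strings and under the same assumptions (32-bit header holding the body length in bits). `bits_to_bytes` and `parse_payload` are hypothetical names.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Turn a bit string whose length is a multiple of 8 back into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def parse_payload(stream: str, header_bits: int = 32) -> bytes:
    """Read the length header, then exactly that many message bits."""
    if len(stream) < header_bits:
        raise ValueError('stream is shorter than the header')
    length = int(stream[:header_bits], 2)
    if length % 8 != 0:
        raise ValueError('header does not describe a whole number of bytes')
    body = stream[header_bits:header_bits + length]
    if len(body) < length:
        raise ValueError('stream ends before the announced length')
    return bits_to_bytes(body)


# 24-bit body containing zero bytes, followed by unrelated cover bits.
stream = format(24, '032b') + '00000000' + '01000001' + '00000000' + '1111'
message = parse_payload(stream)
print(message)
```

In the question's `unhide`, the loop would then keep reading pixels until 32 plus the announced number of bits have been collected, instead of testing for a run of zeros.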

As a bonus, you can add all sorts of information to your header, e.g., the name of the original file, as long as it's all encoded in a way that you can extract later. For example:

header = 4 bytes for the length of the message string +
         1 byte for the number of characters in the filename +
         that many bytes for the filename
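
One possible way to pack and unpack such a header uses the standard `struct` module. This is a sketch under assumptions: the function names are invented, the filename is stored as UTF-8 (so the one-byte field counts bytes rather than characters), and the length field counts message bytes.

```python
import struct

HEADER_FORMAT = '>IB'  # big-endian: 4-byte message length, 1-byte filename length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 5 bytes


def pack_with_header(filename: str, message: bytes) -> bytes:
    """Build header + filename + message as one byte string."""
    name = filename.encode('utf-8')
    if len(name) > 255:
        raise ValueError('filename does not fit in a one-byte length field')
    return struct.pack(HEADER_FORMAT, len(message), len(name)) + name + message


def unpack_with_header(blob: bytes):
    """Recover (filename, message) from a blob built by pack_with_header."""
    message_len, name_len = struct.unpack(HEADER_FORMAT, blob[:HEADER_SIZE])
    name_end = HEADER_SIZE + name_len
    filename = blob[HEADER_SIZE:name_end].decode('utf-8')
    message = blob[name_end:name_end + message_len]
    return filename, message


blob = pack_with_header('report.pdf', b'\x00\x01\x00')
print(unpack_with_header(blob))
```

The whole `blob` is what gets converted to bits and embedded. On extraction, the first 5 bytes say how much more to read.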
