Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

compiler construction - Undecompilable Python

It is possible to decompile .pyc files: Decompile Python 2.7 .pyc

Is it possible to `compile` Python files into code that is not human-readable, the way C++ is compiled to an .exe binary, unlike plaintext .py files and very easily recoverable .pyc files? (I don't mind if it can be cracked by brute force.)

Battle the dragon for too long and you become the dragon yourself; gaze into the abyss for too long and the abyss will gaze back…

1 Reply

Python is a highly dynamic language and supports many different levels of introspection. Because of that, obfuscating Python bytecode is a monumental task.

Moreover, your embedded Python interpreter will still need to be able to execute the bytecode you ship with your product. And if the interpreter needs to be able to access the bytecode, then everyone else can too. Encryption won't help, because you still need to decrypt the bytecode yourself, and then everyone else can read the decrypted bytecode from memory. Obfuscation only makes the standard tools harder to use, not impossible.
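To make the point above concrete, here is a minimal sketch using only the standard library marshal and dis modules. The sample source, the check function and the secret_key name are made up for illustration; the point is only that anything the interpreter can load, a few lines of Python can load too.

```python
import dis
import io
import marshal

# Hypothetical source standing in for a proprietary module.
SOURCE = '''
def check(password):
    secret_key = "hunter2"
    return password == secret_key
'''

# Compile and serialize the code object. This is roughly what the body of a
# .pyc file holds (a real .pyc also has a small header before this data).
module_code = compile(SOURCE, "<shipped>", "exec")
blob = marshal.dumps(module_code)

# Anyone holding the blob can load it back and look inside.
recovered = marshal.loads(blob)
func_code = next(c for c in recovered.co_consts if hasattr(c, "co_code"))

print(func_code.co_name)      # the function name
print(func_code.co_varnames)  # argument and local variable names
print(func_code.co_consts)    # constants, including the string literal

# Capture the human-readable disassembly as text.
buffer = io.StringIO()
dis.dis(func_code, file=buffer)
listing = buffer.getvalue()
print(listing)
```

Names, constants and the instruction stream all survive the round trip, which is what decompilers build on.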

With that said, here is what you'd have to do to make it really bloody hard to read your application's Python bytecode:

  • Reassign every Python opcode to a new value. Rewire the whole interpreter to use different byte values for different opcodes.

  • Remove as many introspection features as you can get away with. Your functions still need closures, and code objects still need constants, but to hell with the locals list in the code object, for example. Neuter the sys._getframe() function and slash traceback information.

Both these steps require in-depth knowledge of how the Python interpreter works, and how the Python object model fits together. You will most certainly introduce bugs that will be hard to solve.
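The opcode reassignment from the first step can be sketched as a byte substitution. This is a toy under stated assumptions, not a working protection: it assumes the CPython 3.6+ layout where co_code is a sequence of 2-byte units with the opcode in the first byte, it treats every unit as an instruction (a simplification on newer versions, which also store cache entries there), and the helper names are hypothetical. The real work, patching the interpreter so it understands the new values, is not shown, and a stock CPython cannot execute the scrambled bytes.

```python
import random

def make_opcode_table(seed):
    """Return (scramble, unscramble) 256-entry byte tables for a secret permutation."""
    values = list(range(256))
    random.Random(seed).shuffle(values)
    inverse = [0] * 256
    for original_value, scrambled_value in enumerate(values):
        inverse[scrambled_value] = original_value
    return bytes(values), bytes(inverse)

def remap_opcodes(raw, table):
    """Translate the first byte of each 2-byte unit, leaving argument bytes alone."""
    out = bytearray(raw)
    for offset in range(0, len(out), 2):
        out[offset] = table[out[offset]]
    return bytes(out)

def sample(a, b):
    return a * b + 1

scramble, unscramble = make_opcode_table(seed=1234)
original = sample.__code__.co_code
scrambled = remap_opcodes(original, scramble)   # what would be shipped
restored = remap_opcodes(scrambled, unscramble) # what a patched interpreter would see

print(original.hex())
print(scrambled.hex())
```

Note that the unscramble table has to live somewhere in the shipped interpreter, which is exactly the weakness discussed next.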

In the end, you have to ask yourself if this is worth it. A determined hacker can still analyze your bytecode, do some frequency analysis to reconstruct the opcode table, and/or feed your program different opcodes to see what happens, and so decipher all the obfuscation. Once a translation table is created, decompiling your bytecode is a snap, and reconstructing your code is not far away.
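A rough sketch of why frequency analysis works against a renumbered opcode table: a substitution changes which byte value carries each count, but not the counts themselves. The sample source, the simulated permutation and the helper name are assumptions for illustration, and it again assumes the 2-byte unit layout of CPython 3.6+.

```python
import random
from collections import Counter

def opcode_bytes(code):
    """Yield the first byte of every 2-byte unit, recursing into nested code objects."""
    yield from code.co_code[0::2]
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            yield from opcode_bytes(const)

SOURCE = '''
def total(items):
    result = 0
    for item in items:
        result += item
    return result
'''
code = compile(SOURCE, "<sample>", "exec")

# Histogram an attacker can build from any stock interpreter.
reference = Counter(opcode_bytes(code))

# Simulate a vendor's secret opcode permutation and the histogram
# an attacker would observe in the shipped bytecode.
secret = list(range(256))
random.Random(7).shuffle(secret)
observed = Counter(secret[b] for b in opcode_bytes(code))

# Pair the most frequent observed bytes with the most frequent known opcodes.
# These are first guesses only: ties and small samples make them unreliable.
candidates = [
    (obs, ref)
    for (obs, _), (ref, _) in zip(observed.most_common(3), reference.most_common(3))
]
print(candidates)
```

With a large body of bytecode and knowledge of which opcodes usually follow each other, such guesses can be refined into a full translation table.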

If all you want to do is prevent bytecode files from being altered, embed checksums for your .pyc files, and check those on startup. Refuse to load if they don't match. Someone will patch your binary to remove the checksum check or replace the checksums, but you won't have to put in nearly as much effort to provide at least some token protection from tampering.
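A minimal sketch of such a startup check, assuming SHA-256 digests kept in a manifest that is embedded in the application. The function names and the manifest layout are hypothetical; only hashlib and pathlib from the standard library are used.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def build_manifest(paths):
    """Run at build time: map each shipped file to its digest."""
    return {str(p): sha256_of(p) for p in paths}

def verify_manifest(manifest):
    """Run at startup: return the files that are missing or have been altered."""
    bad = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

At startup the application would call verify_manifest with its embedded manifest and refuse to continue if the returned list is not empty. As noted above, this only raises the bar slightly, since the check itself can be patched out.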



...