Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to tell if a file is gzip compressed?

I have a Python program which is going to take text files as input. However, some of these files may be gzip compressed.

Is there a cross-platform way, usable from Python, to determine whether a file is gzip compressed?

Is the following reliable, or could an ordinary text file 'accidentally' look gzip-like enough to give me false positives?

import gzip

try:
    gzip.GzipFile(filename, 'r')
    # compressed
    # ...
except:
    # not compressed
    # ...
    pass

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply


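The magic number for gzip-compressed files is the two bytes 1f 8b (hexadecimal). Testing for it is not 100% reliable, but it is highly unlikely that "ordinary text files" start with those two bytes; in UTF-8 that sequence is not even legal, because 8b is a continuation byte and cannot follow the single-byte character 1f.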
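A minimal sketch of the magic-number check in Python. The helper name is_gzip_file is hypothetical. It opens the file in binary mode and compares only the first two bytes, so it behaves the same on every platform and never attempts any decompression.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file (RFC 1952)


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper: reads at most two bytes in binary mode and does
    not validate the rest of the gzip header or the compressed data.
    """
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

An empty or one-byte file simply yields False, since read(2) returns fewer than two bytes.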

Usually, though, gzip-compressed files carry the suffix .gz. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably rely on that, but you would still have to handle a possible IOError (which you have to do in any case; in Python 3, IOError is simply an alias of OSError).
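A sketch of the suffix-based approach, assuming Python 3 (gzip.open accepts the text mode "rt" since 3.3). The name open_by_suffix is hypothetical. Nothing about the file's content is checked: a plain file named *.gz fails only when it is read, and a compressed file without the suffix is decoded as text, which typically raises a decoding error or yields garbage.

```python
import gzip


def open_by_suffix(path):
    """Open `path` for reading text, decompressing on the fly when the file
    name ends in ".gz". Hypothetical helper; the decision is made purely
    from the name, never from the file's bytes."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # "rt": decompress, then decode as text
    return open(path, "r")
```

Callers still need a try/except OSError around opening and reading, because a missing or unreadable file raises regardless of which branch is taken.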

One problem with your approach is that gzip.GzipFile() will not raise an exception when you feed it an uncompressed file; only a later read() will. This means you would probably have to implement some of your program logic twice, which is ugly.
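A minimal sketch, assuming Python 3, where reading a non-gzip file through GzipFile raises OSError (since 3.8 specifically gzip.BadGzipFile, an OSError subclass). Both function names are hypothetical. gzipfile_fails_lazily demonstrates the deferred error; open_maybe_gzip is one way to avoid duplicating logic: sniff the magic number up front, then hand back a single text-mode file object that the rest of the program reads the same way in both cases.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # gzip magic number (RFC 1952)


def open_maybe_gzip(path):
    """Return a text-mode file object for `path`, decompressing only if the
    file starts with the gzip magic number (hypothetical helper)."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")


def gzipfile_fails_lazily(path):
    """Return True if GzipFile() accepts `path` but read() raises OSError."""
    g = gzip.GzipFile(path, "r")  # no exception here, even for plain text
    try:
        g.read()  # the gzip header is only parsed at this point
    except OSError:
        return True
    finally:
        g.close()
    return False
```

With open_maybe_gzip, code such as "for line in f:" is written once and works for both kinds of input.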



...