The main problem is that the DeflateStream class decodes a naked FLATE compressed stream (as per RFC 1951), whereas the content of PDF streams with the FlateDecode filter is actually presented in the ZLIB Compressed Data Format (as per RFC 1950), which wraps the FLATE compressed data.
To fix this, it suffices to drop the two-byte ZLIB header.
Another problem became clear in your first example document: that document is encrypted, so its stream contents have to be decrypted before FLATE decoding.
### Drop ZLIB header to get to the FLATE encoded data
The DeflateStream class decodes a naked FLATE compressed stream (as per RFC 1951), whereas the content of PDF streams with the FlateDecode filter is actually presented in the ZLIB Compressed Data Format (as per RFC 1950), which wraps the FLATE compressed data.
Fortunately, it is pretty easy to jump to the FLATE encoded data therein: one simply has to drop the first two bytes. (Strictly speaking, there might be a dictionary identifier between them and the FLATE encoded data, but this appears to be seldom used.)
In the case of your code:
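
    using System.IO;
    using System.IO.Compression;

    var bytes = File.ReadAllBytes("Stream.file");
    var originalFileStream = new MemoryStream(bytes);
    // Skip the two-byte ZLIB header so the stream is positioned at the raw FLATE data.
    originalFileStream.ReadByte();
    originalFileStream.ReadByte();
    using (var decompressedFileStream = new MemoryStream())
    using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
    {
        // DeflateStream now sees a naked RFC 1951 stream and can decode it.
        decompressionStream.CopyTo(decompressedFileStream);
    }
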
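
The header handling described above can be made a bit more defensive. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class `ZlibHelper` and its method names are hypothetical and introduced here only for illustration. It checks the two header bytes the way RFC 1950 describes them (compression method 8, header value a multiple of 31) and reports a six-byte header when the FDICT flag announces a dictionary identifier. It further assumes that DeflateStream cannot be supplied with a preset dictionary, so such streams are rejected instead of decoded.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the original code.
public static class ZlibHelper
{
    // Returns the number of header bytes in front of the raw FLATE data:
    // 2 normally, 6 if a four-byte dictionary identifier follows the header.
    public static int GetZlibHeaderLength(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new InvalidDataException("Too short for a ZLIB header.");

        int cmf = data[0];
        int flg = data[1];

        // The low nibble of CMF is the compression method; 8 means deflate.
        if ((cmf & 0x0F) != 8)
            throw new InvalidDataException("Compression method is not deflate.");

        // CMF and FLG, read as a big-endian 16-bit value, must be a multiple of 31.
        if (((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("ZLIB header check failed.");

        // Bit 5 of FLG (FDICT) announces a four-byte dictionary identifier.
        bool hasDictionary = (flg & 0x20) != 0;
        if (hasDictionary && data.Length < 6)
            throw new InvalidDataException("Truncated dictionary identifier.");

        return hasDictionary ? 6 : 2;
    }

    // Validates the ZLIB header, skips it, and decodes the FLATE data behind it.
    public static byte[] Inflate(byte[] data)
    {
        int offset = GetZlibHeaderLength(data);
        if (offset != 2)
        {
            // Assumption: DeflateStream offers no way to pass a preset dictionary.
            throw new NotSupportedException("Preset dictionaries are not handled in this sketch.");
        }

        using (var input = new MemoryStream(data, offset, data.Length - offset))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

With this in place, `ZlibHelper.Inflate(File.ReadAllBytes("Stream.file"))` would replace the two manual `ReadByte()` calls. On .NET 6 or later, `System.IO.Compression.ZLibStream` reads the RFC 1950 wrapper directly, so skipping the header by hand should not be necessary there.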
### In case of encrypted PDFs, decrypt first
Your first example file pdf-test.pdf is encrypted, as indicated by the presence of an Encrypt entry in the trailer:

    trailer
    <</Size 37/Encrypt 38 0 R>>
    startxref
    116
    %%EOF

Before decompressing stream contents, therefore, you have to decrypt them.
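
To notice this situation early, one can look for the Encrypt entry before attempting any decompression. The following is only a rough sketch: `PdfEncryptionCheck` and `LooksEncrypted` are hypothetical names, and the check is a plain byte scan of the whole file rather than a real parse of the trailer, so it can report false positives (for example if the text `/Encrypt` happens to occur inside uncompressed stream data). It performs no decryption.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; a heuristic, not a PDF parser.
public static class PdfEncryptionCheck
{
    private static readonly byte[] Needle = Encoding.ASCII.GetBytes("/Encrypt");

    // Returns true if the bytes contain the name /Encrypt that is not
    // immediately continued by an ASCII letter (so a longer name such as
    // /EncryptMetadata does not count as a match on its own).
    public static bool LooksEncrypted(byte[] pdfBytes)
    {
        for (int i = 0; i + Needle.Length <= pdfBytes.Length; i++)
        {
            int j = 0;
            while (j < Needle.Length && pdfBytes[i + j] == Needle[j])
            {
                j++;
            }
            if (j < Needle.Length)
            {
                continue; // no match starting at position i
            }

            int next = i + Needle.Length;
            bool nameContinues = next < pdfBytes.Length && IsAsciiLetter(pdfBytes[next]);
            if (!nameContinues)
            {
                return true;
            }
        }
        return false;
    }

    private static bool IsAsciiLetter(byte b)
    {
        return (b >= (byte)'A' && b <= (byte)'Z') || (b >= (byte)'a' && b <= (byte)'z');
    }
}
```

If `PdfEncryptionCheck.LooksEncrypted(File.ReadAllBytes("pdf-test.pdf"))` returns true, the stream contents need to be decrypted before they are handed to the FLATE decoding step.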