OK, enough people are getting this wrong that I'm going to post some code I have to identify TIFFs:
private const int kTiffTagLength = 12;
private const int kHeaderSize = 2;
private const int kMinimumTiffSize = 8;
private const byte kIntelMark = 0x49;
private const byte kMotorolaMark = 0x4d;
private const ushort kTiffMagicNumber = 42;
private bool IsTiff(Stream stm)
{
stm.Seek(0);
if (stm.Length < kMinimumTiffSize)
return false;
byte[] header = new byte[kHeaderSize];
stm.Read(header, 0, header.Length);
if (header[0] != header[1] || (header[0] != kIntelMark && header[0] != kMotorolaMark))
return false;
bool isIntel = header[0] == kIntelMark;
ushort magicNumber = ReadShort(stm, isIntel);
if (magicNumber != kTiffMagicNumber)
return false;
return true;
}
private ushort ReadShort(Stream stm, bool isIntel)
{
byte[] b = new byte[2];
_stm.Read(b, 0, b.Length);
return ToShort(_isIntel, b[0], b[1]);
}
private static ushort ToShort(bool isIntel, byte b0, byte b1)
{
if (isIntel)
{
return (ushort)(((int)b1 << 8) | (int)b0);
}
else
{
return (ushort)(((int)b0 << 8) | (int)b1);
}
}
I hacked apart some much more general code to get this.
For PDF, I have code that looks like this:
public bool IsPdf(Stream stm)
{
stm.Seek(0, SeekOrigin.Begin);
PdfToken token;
while ((token = GetToken(stm)) != null)
{
if (token.TokenType == MLPdfTokenType.Comment)
{
if (token.Text.StartsWith("%PDF-1."))
return true;
}
if (stm.Position > 1024)
break;
}
return false;
}
Now, GetToken() is a call into a scanner that tokenizes a Stream into PDF tokens. This is non-trivial, so I'm not going to paste it here. I'm using the tokenizer instead of looking at substring to avoid a problem like this:
% the following is a PostScript file, NOT a PDF file
% you'll note that in our previous version, it started with %PDF-1.3,
% incorrectly marking it as a PDF
%
clippath stroke showpage
this code is marked as NOT a PDF by the above code snippet, whereas a more simplistic chunk of code will incorrectly mark it as a PDF.
I should also point out that the current ISO spec is devoid of the implementation notes that were in the previous Adobe-owned specification. Most importantly from the PDF Reference, version 1.6:
Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…