Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


debugging - How do you debug PDF files?

Many times I create a PDF programmatically and there is a problem with it, e.g. a specific letter might not show up correctly, or I might have encoding issues.

Is there some way to debug a PDF? E.g. see its detailed structure?



1 Reply


There are a number of free tools that'll let you look at the guts of a PDF, uncompressed and decrypted (given the password).
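If you only want a quick look and have nothing installed, a few lines of Python can inflate the compressed streams for you. This is a rough sketch, not how RUPS or any of the tools mentioned here work. The name `inflate_streams` is made up for illustration, it scans for the stream/endstream keywords instead of honoring /Length, and it assumes an unencrypted file whose streams use at most plain /FlateDecode (no predictors, no filter chains).

```python
import re
import zlib

# Assumption: streams are located by keyword scan, not by their /Length entry.
# (?<!end) keeps the "stream" inside "endstream" from being treated as a start.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data):
    """Yield (offset, payload) per stream, inflated when zlib accepts the bytes."""
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj ignores the end-of-line bytes that trail the payload.
            yield match.start(), zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (uncompressed, encrypted, or another filter): keep as-is.
            yield match.start(), raw.rstrip(b"\r\n")
```

Read the file with `open(path, "rb").read()` and loop over the results. Anything that does not inflate is handed back untouched, so you can still eyeball it.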

RUPS for iText springs to mind (but I'm biased). I don't know that there's an iTextSharp equivalent. It's a GUI with a tree view (something ALL these apps have) of the PDF objects.

Some will let you edit the PDF within that tree, but not many. I believe Windjack's PDF CanOpener will (along with several other spiffy features you'd expect from a commercial Acrobat plugin).

And in a pinch, <insert favorite text editor here> works... but don't try to change anything. PDF is a binary format: byte offsets are important. If your text editor changes the line endings (or tries to interpret the file as UTF-8, or, or, or), your PDF will be Horribly Broken. Don't do that.
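To make the offset problem concrete: the tail of a PDF carries a startxref entry giving the byte offset of the cross-reference data, and an editor that rewrites line endings shifts everything after the first line. Here is a small sanity check, offered as a sketch only. The helper name `check_startxref` is hypothetical, and it looks only at the final startxref, not at the individual offsets inside the table.

```python
import re

def check_startxref(data):
    """Return True if the last startxref offset lands on xref data."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not matches:
        return False
    offset = int(matches[-1])
    target = data[offset:offset + 32]
    # A classic table begins with "xref"; PDF 1.5+ files may instead point
    # at a cross-reference stream, which is an ordinary "N G obj" object.
    return target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None
```

Run it on the bytes before and after a "harmless" save from a text editor. If it flips from True to False, the editor moved things around.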

I end up doing a lot of searching for a given object number to look up indirect references. It's always a pain to look up a single-digit reference because "4 0 obj" also matches the end of every tenth object (14, 24, 34, 1234, etc). A regex search that looked for "beginning of line, 4 0 obj, end of line" would be great, but I generally use Notepad, so that's out (and I'm not much of a regex guy anyway).
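For anyone who does have a regex engine handy, that search can be sketched in Python as below. Object headers in a PDF have the form `4 0 obj` (object number, generation number, keyword). The function name `find_object` is mine, and the sketch assumes the header is not buried inside a compressed object stream. It also cannot tell a real header from the same bytes appearing inside stream data.

```python
import re

def find_object(data, number, generation=0):
    """Return the byte offset of 'number generation obj', or None if absent."""
    # (?<![0-9]) stops "4 0 obj" from matching inside "14 0 obj" or "1234 0 obj".
    pattern = re.compile(rb"(?<![0-9])%d\s+%d\s+obj\b" % (number, generation))
    match = pattern.search(data)
    return match.start() if match else None
```

A lookbehind is used instead of a beginning-of-line anchor because some writers put several objects on one line.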

PS: Even with a spiffy Acrobat plugin (not CanOpener; home-grown from way back), I still need to crack open a text editor from time to time.

Acrobat will make changes at times as it loads a PDF (mostly to fix things), and if you want to know What's Really There, you need to look at that PDF in some other way. And when you're trying to debug a broken PDF, Acrobat being helpful is the last thing you need.

PPS: Acrobat also has a spiffy "pdf syntax check" in its Advanced -> Preflight profiles. It's also got checks for various PDF/* standards (PDF/X, PDF/A-1 [a and b], etc), accessibility, and so forth. They're invaluable when you're trying to Be Compliant. Not quite the debugging tool you were asking about, but Very Handy nonetheless.

PPPS: "diff"ing two PDFs is all but impossible, without writing a custom tool to do it for you. I wrote something that listed all the pages (with sizes) and fields (with types, flags, etc) in a predictable order and dumped it to a text file so I could diff the files... but directly diffing two PDFs is pointless. There are too many ways for "identical" files to differ (object order, dictionary key order, compression levels, etc).
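The general idea (dump something predictable to text, then diff the text) can be sketched without any PDF library. To be clear, this is not the page-and-field lister described above but a much cruder stand-in. The names `normalized_dump` and `diff_pdfs` are invented here. It only sees objects stored uncompressed in the file body, it drops stream payloads entirely, and it smooths over object order and whitespace but not dictionary key order or renumbered objects.

```python
import difflib
import re

OBJ_RE = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)

def normalized_dump(data):
    """One text line per object, sorted by object number, whitespace collapsed."""
    entries = []
    for num, gen, body in OBJ_RE.findall(data):
        # Binary payloads are noise in a text diff, so replace them with a marker.
        body = STREAM_RE.sub(b"<stream omitted>", body)
        text = b" ".join(body.split()).decode("latin-1")
        entries.append((int(num), int(gen), text))
    return ["%d %d obj %s" % entry for entry in sorted(entries)]

def diff_pdfs(a, b):
    """Unified diff of the two normalized dumps, as a list of lines."""
    return list(difflib.unified_diff(normalized_dump(a), normalized_dump(b),
                                     "a.pdf", "b.pdf", lineterm=""))
```

Two files that differ only in object order or spacing come back with an empty diff. Anything else shows up as a changed line keyed by object number, which is at least a place to start looking.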

