Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - Extracting Additional Metadata from a PDF using iTextSharp

I've seen basic metadata (e.g., author, title) extracted using iTextSharp, and it usually looks something like this:

var pdfReader = new PdfReader(pdfData);
// Info keys are case-sensitive and use the standard PDF names ("Author", "Title", ...)
var author = pdfReader.Info["Author"];

However, in my case I'm after something a bit more exotic: the additional "advanced" metadata that the document may contain.

Pardon the paint highlights, but here is a screenshot from within Adobe Acrobat showing the data in question:

[Screenshot: the data in question, as shown in Adobe Acrobat]

In this case, the data doesn't seem to be available through the Info dictionary. A different library (PDFKit by TallComponents) exposes it, but I'm wondering if there is any way to get it using iText.

I'm currently playing with iText 4.1.6 due to licensing restrictions, but I wouldn't be opposed to buying the commercial license for 5.0.6 if that adds required functionality.


1 Reply


I'm not sure if it will get exactly what you need, but to get the XMP metadata, try something like this:

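// Needs: using System.Text; using iTextSharp.text.pdf;
PdfReader reader = new PdfReader(YOUR_PDF);
// Raw XMP metadata stream as bytes; null when the PDF has no XMP metadata
byte[] b = reader.Metadata;
if (b != null) {
  // XMP packets are normally UTF-8 encoded
  string xml = new UTF8Encoding().GetString(b);
}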

Note that you get back an XML string.

IIRC, this code will also work with 4.1.6.
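
As a rough sketch of what you might do with that XML string: XMP is RDF/XML, so the plain .NET `System.Xml.Linq` classes can walk it without any extra iText support. The snippet below assumes the custom fields are stored as simple properties on `rdf:Description` elements, which is where Acrobat's custom properties typically end up (in the `pdfx` namespace). The class name `XmpSketch`, the method `ReadSimpleProperties`, and the sample property names and values (`ProjectCode`, `Reviewer`) are all made up for illustration. They are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmpSketch
{
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hypothetical sample packet; in real use this would be the string
    // decoded from reader.Metadata.
    public const string SampleXmp =
        "<x:xmpmeta xmlns:x='adobe:ns:meta/'>" +
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>" +
        "<rdf:Description rdf:about='' xmlns:pdfx='http://ns.adobe.com/pdfx/1.3/'" +
        " pdfx:ProjectCode='ABC-123'>" +
        "<pdfx:Reviewer>Jane</pdfx:Reviewer>" +
        "</rdf:Description></rdf:RDF></x:xmpmeta>";

    // Collects simple properties from every rdf:Description element, in both
    // attribute form (ns:Name="value") and element form (<ns:Name>value</ns:Name>).
    // Keys are local names only, so same-named properties from different
    // namespaces overwrite each other in this sketch.
    public static Dictionary<string, string> ReadSimpleProperties(string xmpXml)
    {
        var result = new Dictionary<string, string>();
        XDocument doc = XDocument.Parse(xmpXml);

        foreach (XElement desc in doc.Descendants(Rdf + "Description"))
        {
            foreach (XAttribute attr in desc.Attributes())
            {
                // Skip xmlns declarations and rdf:* attributes such as rdf:about.
                if (attr.IsNamespaceDeclaration || attr.Name.Namespace == Rdf) continue;
                result[attr.Name.LocalName] = attr.Value;
            }

            foreach (XElement prop in desc.Elements())
            {
                // Leaf elements only; structured values (rdf:Seq, rdf:Alt, rdf:Bag)
                // are skipped here and would need their own handling.
                if (!prop.HasElements) result[prop.Name.LocalName] = prop.Value.Trim();
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> kv in ReadSimpleProperties(SampleXmp))
        {
            Console.WriteLine(kv.Key + " = " + kv.Value);
        }
    }
}
```

In real use you would pass the `xml` string from the snippet above instead of `SampleXmp`. If the parse fails on a real file, check whether the decoded string starts with a stray byte-order mark and trim it before parsing.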



...