Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c# - Extract Image from a particular page in PDF

I want to extract an image from a PDF file. The following code extracts a JPEG image from the PDF correctly, but it scans the whole document. How can I extract the images from one particular page only, e.g. page 1 or some other page? I don't want to read the whole PDF to search for the image.

Any suggestions?

Code to extract Image:

        private List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
        {
            List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

            iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
            iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
            iTextSharp.text.pdf.PdfObject PDFObj = null;
            iTextSharp.text.pdf.PdfStream PDFStreamObj = null;

            try
            {
                RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
                PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

                // Walks every object in the cross-reference table, i.e. the whole document.
                for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
                {
                    PDFObj = PDFReaderObj.GetPdfObject(i);

                    if ((PDFObj != null) && PDFObj.IsStream())
                    {
                        PDFStreamObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                        iTextSharp.text.pdf.PdfObject subtype = PDFStreamObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                        if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                        {
                            byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStreamObj);

                            if ((bytes != null))
                            {
                                try
                                {
                                    // The stream is left open on purpose: GDI+ needs it
                                    // for the lifetime of the Image created from it.
                                    System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes);
                                    System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS);
                                    ImgList.Add(ImgPDF);
                                    pictureBox1.Image = ImgPDF;
                                }
                                catch (Exception)
                                {
                                    // The raw bytes are not in a format GDI+ can decode
                                    // (e.g. not a JPEG); skip this image.
                                }
                            }
                        }
                    }
                }
            }
            finally
            {
                // Close the reader even if an exception is thrown.
                if (PDFReaderObj != null) { PDFReaderObj.Close(); }
            }

            return ImgList;
        }

1 Reply

I don't have iTextSharp 4.0 available at the moment, so this code targets 5.2, but it should work just fine with the older version too. It is an almost direct lift from this post here, so please see that post and its responses for further questions. As I said in the comments above, your code looks at all of the images from the document's perspective, whereas the code that I linked to goes page by page.

Please read all of the comments in the other post, especially this one, which explains that this ONLY works for JPG images. PDF supports many different types of images, so unless you know that you're only dealing with JPGs you'll need to add a good deal more code. See this post and this post for some hints.

        // Requires: using System; using System.IO; using System.Linq;
        //           using System.Drawing.Imaging; using iTextSharp.text.pdf;
        string testFile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Doc1.pdf");
        string outputPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        int pageNum = 1;

        PdfReader pdf = new PdfReader(testFile);
        try {
            // Only the resources of the requested page are inspected.
            PdfDictionary pg = pdf.GetPageN(pageNum);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            if (res == null) { return; }
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            if (xobj == null) { return; }
            if (!Directory.Exists(outputPath))
                Directory.CreateDirectory(outputPath);

            int imageNum = 0;
            foreach (PdfName name in xobj.Keys) {
                PdfObject obj = xobj.Get(name);
                if (!obj.IsIndirect()) { continue; }
                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                // Written this way round so a missing /Subtype (null) is skipped too.
                if (!PdfName.IMAGE.Equals(type)) { continue; }
                int xrefIndex = ((PRIndirectReference)obj).Number;
                PdfStream pdfStream = (PdfStream)pdf.GetPdfObject(xrefIndex);
                byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStream);
                if (bytes == null) { continue; }
                using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                using (System.Drawing.Image img = System.Drawing.Image.FromStream(memStream)) {
                    // Include an image counter so several images on one page
                    // do not overwrite each other.
                    string path = Path.Combine(outputPath, String.Format(@"{0}_{1}.jpg", pageNum, ++imageNum));
                    System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                    parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
                    var jpegEncoder = ImageCodecInfo.GetImageEncoders().ToList().Find(x => x.FormatID == ImageFormat.Jpeg.Guid);
                    img.Save(path, jpegEncoder, parms);
                }
            }
        } finally {
            pdf.Close();
        }
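
Because the raw stream bytes are only a usable image file when the image is stored as a JPEG, it can help to check them before handing them to Image.FromStream. What follows is a minimal sketch of one way to do that, not something taken from the linked posts: it tests for the JPEG start-of-image marker at the beginning of the byte array. The names JpegSniffer and LooksLikeJpeg are hypothetical and are not part of iTextSharp. It also assumes the stream carries a single JPEG filter; a JPEG wrapped in a second filter would not start with this marker and would be skipped.

```csharp
using System;

// Hypothetical helper for illustration; not part of iTextSharp.
public static class JpegSniffer
{
    // A JPEG file begins with the start-of-image marker FF D8,
    // followed by the FF byte that opens the next marker.
    public static bool LooksLikeJpeg(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 3
            && bytes[0] == 0xFF
            && bytes[1] == 0xD8
            && bytes[2] == 0xFF;
    }
}
```

In the loop above, this could be used right after the bytes are read, for example `if (!JpegSniffer.LooksLikeJpeg(bytes)) { continue; }`, so that non-JPEG images are skipped instead of making Image.FromStream throw.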
