Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
499 views
in Technique[技术] by (71.8m points)

asp.net - Highlighting text ( colors ) of existing PDF using iTextsharp using C#

I would like know whether we can highlight text (colors) of already created PDF using itextsharp?

I see examples like creating a new PDF, while doing so we can apply colors. I am looking for where I can get chunks of text from PDF and apply colors and save it.

Here is the thing I am trying to accomplish, read a PDF file, parse text and highlight text based on business rules.

Any third party dll suggestion also works, as a first step I am looking in to opensource iTextsharp library.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Yes you can highlight text but you will have to work for it unfortunately. What looks like a highlight is a PDF Text Markup Annotation as far as the spec is considered. That part is pretty easy. The hard part is figuring out the coordinates to apply the annotation to.

Here's the simple code for creating a highlight using an existing PdfStamper called stamper:

PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);

Once you have the highlight you can set the color using:

highlight.Color = BaseColor.YELLOW;

And then add it to your stamper on page 1 using:

stamper.AddAnnotation(highlight,1);

Technically the rect parameter doesn't actually get used (as far as I can tell) and instead gets overridden by the quad parameter. The quad parameter is an array of x,y coords that essentially represent the corners of a rectangle (technically quadrilateral). The spec says they start in the bottom left and go counter-clockwise but in reality they appear to go bottom left to bottom right to top left to top right. Calculating the quad is a pain so instead its just easier to create a rectangle and create the quad from it:

iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };

So how do you get the rectangle of existing text in the first place? For that you need to look at TextExtractionStrategy and PdfTextExtractor. There's a lot to go into so I'm going to start by pointing you at this post which has some further posts linked.

Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.2 that shows off the creation of a simple PDF and the highlighting of part of the text using hard-coded coordinates. If you need help calculating these coordinates start with the link above and then ask any questions!

using System;
using System.ComponentModel;
using System.Data;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //Create a simple test file
            string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");

            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter w = PdfWriter.GetInstance(doc, fs))
                    {
                        doc.Open();
                        doc.Add(new Paragraph("This is a test"));
                        doc.Close();
                    }
                }
            }

            //Create a new file from our test file with highlighting
            string highLightFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Highlighted.pdf");

            //Bind a reader and stamper to our test PDF
            PdfReader reader = new PdfReader(outputFile);

            using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (PdfStamper stamper = new PdfStamper(reader, fs))
                {
                    //Create a rectangle for the highlight. NOTE: Technically this isn't used but it helps with the quadpoint calculation
                    iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(60.6755f, 749.172f, 94.0195f, 735.3f);
                    //Create an array of quad points based on that rectangle. NOTE: The order below doesn't appear to match the actual spec but is what Acrobat produces
                    float[] quad = { rect.Left, rect.Bottom, rect.Right, rect.Bottom, rect.Left, rect.Top, rect.Right, rect.Top };

                    //Create our hightlight
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);

                    //Set the color
                    highlight.Color = BaseColor.YELLOW;

                    //Add the annotation
                    stamper.AddAnnotation(highlight,1);
                }
            }

            this.Close();
        }
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...