Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c# - Highlight words in a pdf using itextsharp, not displaying highlighted word in browser

Words highlighted with iTextSharp are displayed in Adobe Reader but not in the browser's PDF viewer.

[Screenshot: the PDF opened in Adobe Reader]

[Screenshot: the same PDF opened in the browser]

CODE

List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
foreach (Rectangle rect in MatchesFound)
{
    float[] quad = { rect.Left - 3.0f, rect.Bottom, rect.Right, rect.Bottom, rect.Left - 3.0f, rect.Top + 1.0f, rect.Right, rect.Top + 1.0f };
    //Create our highlight
    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
    //Set the color
    highlight.Color = BaseColor.YELLOW;

    //Add the annotation
    stamper.AddAnnotation(highlight, pageno);
}

1 Reply

First of all...

Why does the OP's (updated) code not work

There actually are two factors.

The first is an issue in the OP's code: to add a rectangle to a path, he uses

canvas.Rectangle(rect);

Unfortunately this does not do what he expects: the Rectangle class has multiple properties beyond the mere coordinates of a rectangle, foremost information about selected borders, border colors, and an interior color, and PdfContentByte.Rectangle(Rectangle) draws a rectangle according to those properties.

In the case at hand, though, rect is used only to transport the coordinates of a rectangle, so those additional properties all are false or null. Thus, canvas.Rectangle(rect) does nothing!

Instead the OP should use

canvas.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);

here.

Furthermore, @Bruno mentioned in his answer

Note that you won't see the yellow rectangle if you add it under an opaque shape (e.g. under an image).

Unfortunately exactly this is the case here: the document actually is a scanned document, each page being a page-filling image under which the equivalent text is drawn (probably after OCR'ing) to allow textual copy&paste.

Thus, whatever the OP's code may draw on the UnderContent, it will be hidden by that very image.

Thus, let's try something different...

How to make it work

@Bruno in his answer also indicated a solution for such a case:

In that case, you could add a transparent rectangle on top of the existing content.

Following this advice we replace

canvas = stamper.GetUnderContent(pageno);

by

canvas = stamper.GetOverContent(pageno);

PdfGState state = new PdfGState();
state.FillOpacity = .3f;
canvas.SetGState(state);

Selecting the word "support" on the third document page we get:

[Image: using an opacity of .3]

The yellow is quite pale here.

Using an Opacity value of .6 instead we get

[Image: using an opacity of .6]

Now the yellow is more intense but the text starts to pale out.

For tasks like this I actually prefer using the blend mode Darken. This can be done by using

state.BlendMode = new PdfName("Darken");

instead of state.FillOpacity = .3f. This results in

[Image: using the blend mode Darken]

This IMO looks better.
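Putting the pieces of this section together, the drawing step for a single match might look like the sketch below. It is only an illustration: the class HighlightSketch, the method PaintHighlight and the useDarken switch are hypothetical names that do not appear in the OP's code, and the sketch assumes iTextSharp 5.x with rect being one of the rectangles returned by the OP's text extraction strategy.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

static class HighlightSketch
{
    // Hypothetical helper (not part of the OP's code): paints one yellow
    // highlight on top of the existing page content.
    public static void PaintHighlight(PdfStamper stamper, int pageno, Rectangle rect, bool useDarken)
    {
        // Draw over the page content so a page-filling image cannot hide the fill.
        PdfContentByte canvas = stamper.GetOverContent(pageno);

        // Keep the graphics state change local to this highlight.
        canvas.SaveState();

        PdfGState state = new PdfGState();
        if (useDarken)
            state.BlendMode = new PdfName("Darken"); // blend mode variant
        else
            state.FillOpacity = .3f;                 // opacity variant
        canvas.SetGState(state);

        // Set the color first, then build and fill the path from plain coordinates.
        canvas.SetColorFill(BaseColor.YELLOW);
        canvas.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);
        canvas.Fill();

        canvas.RestoreState();
    }
}
```

Inside the OP's foreach loop this would be called as HighlightSketch.PaintHighlight(stamper, pageno, rect, true). The SaveState/RestoreState pair is an addition of this sketch; it keeps the opacity or blend mode from leaking into anything drawn on the same canvas afterwards.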

How the client did it

The OP commented

Client have given a pdf. In that, they highlighted text, the highlighted text is displayed in browser

The client's PDF actually uses annotations, just like the OP in his original code. In contrast to the highlight annotations generated by iText, though, each of the client's annotations contains an appearance stream.

Supplying an appearance is optional and PDF viewers indeed should generate an appearance if none is given. Obviously, though, there are numerous PDF viewers which rely on appearances the PDF brings along.

By the way, the appearances in the client's PDF actually use the blend mode Multiply. For underlying white and black colors, Darken and Multiply have the same result.
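The claim about white and black backdrops can be checked with a little arithmetic. For each color component (scaled to the range 0 to 1), the PDF blend mode Multiply computes backdrop × source, while Darken computes min(backdrop, source). The following self-contained sketch evaluates both for one component; the class name BlendDemo and the sample value 0.4 are made up for the illustration.

```csharp
using System;

static class BlendDemo
{
    // Per-component formulas of the two blend modes (values in 0..1).
    public static double Multiply(double backdrop, double source) => backdrop * source;
    public static double Darken(double backdrop, double source) => Math.Min(backdrop, source);

    static void Main()
    {
        // One component of a hypothetical highlight color.
        double source = 0.4;

        // Black (0.0), mid gray (0.5) and white (1.0) backdrops.
        foreach (double backdrop in new[] { 0.0, 0.5, 1.0 })
        {
            Console.WriteLine("backdrop=" + backdrop
                + " Multiply=" + Multiply(backdrop, source)
                + " Darken=" + Darken(backdrop, source));
        }
    }
}
```

For the backdrops 0 and 1 both functions return the same value; only for backdrops in between (gray or colored page content) do the two modes differ, with Multiply giving the darker result.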

Making it work with annotations

In a comment the OP wondered

Please one more doubt, if the user wrongly highlighted then how to remove yellow color(or change yellow to white)? i changed yellow to white but it's not working. canvas.SetColorFill(BaseColor.WHITE);

Undoing a change to the page content generally is more difficult than undoing the addition of an annotation. Thus, let's make the OP's original code also work, i.e. adding an appearance stream to the highlight annotations.

As the OP reported in another comment, his first attempt to add an appearance stream failed:

PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
appearance.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height);
appearance.SetColorFill(BaseColor.WHITE);
appearance.Fill();
highlight.SetAppearance( PdfAnnotation.APPEARANCE_NORMAL, appearance );
stamper.AddAnnotation(highlight, pageno);

but it's not working.

The problems in his attempt are:

  • The origin of the appearance template is in the lower left corner of the annotation area, not of the page. To color the area in question, therefore, the rectangle must have its lower left at (0, 0).
  • Strictly speaking the color must be set before starting the path building.
  • A different color than white should be used for highlighting.
  • Transparency or an appropriate blend mode should be used to allow the original, marked text to shine through.

Thus, the following code shows how to do it.

private void highlightPDFAnnotation(string outputFile, string highLightFile, int pageno, string[] splitText)
{
    PdfReader reader = new PdfReader(outputFile);
    using (FileStream fs = new FileStream(highLightFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
            strategy.UndercontentHorizontalScaling = 100;

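            // Run the extraction so the strategy collects the text positions;
            // the returned text itself is not used below.
            string currentText = PdfTextExtractor.GetTextFromPage(reader, pageno, strategy);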
            for (int i = 0; i < splitText.Length; i++)
            {
                List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(splitText[i].Trim(), StringComparison.CurrentCultureIgnoreCase);
                foreach (Rectangle rect in MatchesFound)
                {
                    float[] quad = { rect.Left - 3.0f, rect.Bottom, rect.Right, rect.Bottom, rect.Left - 3.0f, rect.Top + 1.0f, rect.Right, rect.Top + 1.0f };
                    //Create our highlight
                    PdfAnnotation highlight = PdfAnnotation.CreateMarkup(stamper.Writer, rect, null, PdfAnnotation.MARKUP_HIGHLIGHT, quad);
                    //Set the color
                    highlight.Color = BaseColor.YELLOW;

                    PdfAppearance appearance = PdfAppearance.CreateAppearance(stamper.Writer, rect.Width, rect.Height);
                    PdfGState state = new PdfGState();
                    state.BlendMode = new PdfName("Multiply");
                    appearance.SetGState(state);
                    appearance.Rectangle(0, 0, rect.Width, rect.Height);
                    appearance.SetColorFill(BaseColor.YELLOW);
                    appearance.Fill();

                    highlight.SetAppearance(PdfAnnotation.APPEARANCE_NORMAL, appearance);

                    //Add the annotation
                    stamper.AddAnnotation(highlight, pageno);
                }
            }
        }
    }
    reader.Close();
}

These annotations are displayed by Chrome, too, and as annotations they can easily be removed.
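To illustrate the removal, here is a minimal sketch, assuming iTextSharp 5.x. The class HighlightRemovalSketch and the method RemoveHighlights are hypothetical names, not part of the OP's code. Note that the sketch drops every highlight annotation on the given page, including any that were not created by the code above.

```csharp
using System.IO;
using iTextSharp.text.pdf;

static class HighlightRemovalSketch
{
    // Hypothetical helper: writes a copy of inputFile to outputFile in which
    // the highlight annotations of the given page have been removed.
    public static void RemoveHighlights(string inputFile, string outputFile, int pageno)
    {
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            PdfDictionary page = reader.GetPageN(pageno);
            PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
            if (annots != null)
            {
                PdfName highlightSubtype = new PdfName("Highlight");
                // Iterate backwards so that removals do not shift unvisited entries.
                for (int i = annots.Size - 1; i >= 0; i--)
                {
                    PdfDictionary annot = annots.GetAsDict(i);
                    if (annot != null && highlightSubtype.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                        annots.Remove(i);
                }
            }

            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            using (PdfStamper stamper = new PdfStamper(reader, fs))
            {
                // Nothing to add here; closing the stamper writes the modified document.
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call like HighlightRemovalSketch.RemoveHighlights(highLightFile, cleanedFile, pageno) would then undo the highlighting of one page, where cleanedFile is a placeholder for a new output path.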

