Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
337 views
in Technique[技术] by (71.8m points)

python - Find text position in PDF file

I have a PDF file and I am trying to find a specific text in the PDF and highlight it using Python. I found PyPDF2, which can highlight part of a PDF when we give the coordinates of the wanted highlight position in the file.

I am trying to find a tool which can give me the position of a given text in the PDF.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

PyMuPDF can find text by coordinates. You can use this in conjunction with the PyPDF2 highlighting method to accomplish what you're describing. Or you can just use PyMuPDF to highlight the text.

Here is sample code for finding text and highlighting with PyMuPDF:

import fitz

### READ IN PDF
doc = fitz.open("input.pdf")

for page in doc:
    ### SEARCH
    text = "Sample text"
    text_instances = page.searchFor(text)

    ### HIGHLIGHT
    for inst in text_instances:
        highlight = page.addHighlightAnnot(inst)
        highlight.update()


### OUTPUT
doc.save("output.pdf", garbage=4, deflate=True, clean=True)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...