Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
293 views
in Technique[技术] by (71.8m points)

python - Extract a page from a pdf as a jpeg

In python code, how to efficiently save a certain page in a pdf as a jpeg file? (Use case: I've a python flask web server where pdf-s will be uploaded and jpeg-s corresponding to each page is stores.)

This solution is close, but the problem is that it does not convert the entire page to jpeg.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The pdf2image library can be used.

You can install it simply using,

pip install pdf2image

Once installed you can use following code to get images.

from pdf2image import convert_from_path
pages = convert_from_path('pdf_file', 500)

Saving pages in jpeg format

for page in pages:
    page.save('out.jpg', 'JPEG')

Edit: the Github repo pdf2image also mentions that it uses pdftoppm and that it requires other installations:

pdftoppm is the piece of software that does the actual magic. It is distributed as part of a greater package called poppler. Windows users will have to install poppler for Windows. Mac users will have to install poppler for Mac. Linux users will have pdftoppm pre-installed with the distro (Tested on Ubuntu and Archlinux) if it's not, run sudo apt install poppler-utils.

You can install the latest version under Windows using anaconda by doing:

conda install -c conda-forge poppler

note: Windows versions upto 0.67 are available at http://blog.alivate.com.au/poppler-windows/ but note that 0.68 was released in Aug 2018 so you'll not be getting the latest features or bug fixes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...