I want to convert a PDF file to JSON format, so I am using the PyPDF2 module to read the PDF. However, I am unable to read it: it gives me some " " characters but no text. The PDF I am using can be retrieved from here: pdf_to_json.pdf
The code I am using is:
```python
import PyPDF2

file = open("pdf_to_json.pdf", "rb")
pdf = PyPDF2.PdfFileReader(file)
page_one = pdf.getPage(0)
page_one.extractText()
```
It's returning something like this:
' '
DISCLAIMER: The PDF is in Spanish.
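An empty or whitespace-only result often means the page has no extractable text layer. A scanned document is the usual example, and no text extractor can recover anything from it without OCR. Trying the maintained `pypdf` package, the successor to PyPDF2, may also be worthwhile, since `PdfFileReader`, `getPage` and `extractText` are the older PyPDF2 names. Below is a minimal sketch of the PDF-to-JSON step. It assumes `pypdf` is installed, and the JSON layout (`file`, `pages`, `page`, `text`, `has_text`) plus both helper names are hypothetical choices for illustration. Pages that come back blank are flagged rather than silently dropped.

```python
import json
import sys


def pages_to_json(source: str, texts: list[str]) -> str:
    """Serialize per-page text into a JSON document (hypothetical layout).

    A page whose text is empty or whitespace-only is marked has_text=False:
    that usually means the page has no text layer (e.g. a scanned image),
    so OCR would be needed to get its content.
    """
    pages = [
        {"page": number, "text": text, "has_text": bool(text.strip())}
        for number, text in enumerate(texts, start=1)
    ]
    # ensure_ascii=False keeps Spanish characters (á, é, ñ, ...) readable.
    return json.dumps({"file": source, "pages": pages}, ensure_ascii=False, indent=2)


def extract_page_texts(path: str) -> list[str]:
    """Return the extracted text of every page (assumes pypdf is installed)."""
    # Imported here so pages_to_json() stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() is pypdf's counterpart to PyPDF2's older extractText().
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "pdf_to_json.pdf"
    print(pages_to_json(path, extract_page_texts(path)))
```

Keeping extraction and serialization separate makes it easy to swap in an OCR step for pages where `has_text` is false, without changing the JSON format.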