When performing OCR, it is important to prepossess the image so that the desired foreground text is in black with the background in white. To do this, we can use OpenCV to Otsu's threshold the image and obtain a binary image. We then apply a slight Gaussian blur to smooth the image before throwing it into Pytesseract. We use --psm 6
config to treat the image as a single uniform block of text. See here for more configuration options.
Here's the preprocessed image and the result from Pytesseract
PRACTICE ACCOUNT
$9,047.26~ i
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:Program FilesTesseract-OCResseract.exe"
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…