Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - How to increase OCR accuracy in Node JS and Tesseract.js?

I use tesseract.js to detect numbers in Node.js. For example, here is my image:

[image not available]

When I run my script, it detects something like this:

289 ,0

Because of noise in the image, the output also contains spaces and other characters such as commas.

Is there any way to restrict the output to digits only, without spaces, commas, or other characters?

Here is my code:

const tesseract = require('tesseract.js');

tesseract.recognize(
    __dirname + '/Captcha.png',
    'eng',
    { logger: m => console.log(m) } // log recognition progress
).then(({ data: { text } }) => {
    console.log(text);
});

1 Reply

I don't know the JS Tesseract API, but there seems to be a simple workaround: filter the recognized text afterward so that only digits remain:

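const tesseract = require('tesseract.js');

tesseract.recognize(
    __dirname + '/Captcha.png',
    'eng',
    { logger: m => console.log(m) }
).then(({ data: { text } }) => {
    // Keep only the digit characters (\d), dropping spaces, commas, etc.
    const filteredText = Array.from(text.matchAll(/\d/g), m => m[0]).join("");
    console.log(filteredText);
});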
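The same filter can also be written with `String.prototype.replace` and the negated class `\D` (any character that is not a digit), which avoids building an array of match objects. A minimal sketch; `keepDigits` is a hypothetical helper name, not part of tesseract.js:

```javascript
// Hypothetical helper: remove every character that is not an ASCII digit 0-9.
// In JavaScript regular expressions, \D matches anything \d does not match.
function keepDigits(text) {
  return text.replace(/\D/g, "");
}

console.log(keepDigits("289 ,0\n")); // prints 2890
```

Note that this also drops decimal points and minus signs, so it only suits inputs that are expected to be pure digit strings, such as the captcha here.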

Here is a test of the filtering step on its own:

if (Array.from("209, 1".matchAll(/\d/g), m => m[0]).join("") !== "2091") {
  throw new Error("Not working");
}
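Filtering afterward works regardless of the OCR engine, but Tesseract itself also has a `tessedit_char_whitelist` parameter that limits which characters it will output. A minimal sketch, assuming tesseract.js v5 or later, where `createWorker("eng")` returns a worker already initialized for English and workers expose `setParameters` (older releases used a different setup sequence, so check the documentation for your installed version). `recognizeDigits` is a hypothetical helper name. Because spaces and line breaks between detected words may still appear in the output, the sketch keeps the digit filter as a final step:

```javascript
// Sketch assuming tesseract.js v5+ (npm install tesseract.js).
// recognizeDigits is a hypothetical helper, not part of the library.
async function recognizeDigits(imagePath) {
  // Required inside the function so that merely defining the helper
  // does not need the package to be installed.
  const { createWorker } = require("tesseract.js");
  const worker = await createWorker("eng");
  try {
    // Ask Tesseract to only output the characters 0-9.
    await worker.setParameters({ tessedit_char_whitelist: "0123456789" });
    const { data: { text } } = await worker.recognize(imagePath);
    // Word separators (spaces, newlines) may still be present,
    // so strip anything that is not a digit.
    return text.replace(/\D/g, "");
  } finally {
    // Release the worker even if recognition fails.
    await worker.terminate();
  }
}
```

Usage: `recognizeDigits(__dirname + '/Captcha.png').then(console.log);`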

...