UiPath - How to extract a table from a PDF

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

Hi, I have found some videos and articles on how to do this, but they don't help with this task. I know how to get a single value, but not how to extract a whole table.

I want the table exported into a database if possible, or otherwise into Excel, but I can't figure out how. I have even tried changing the "Change reading option" setting.

I tried "Data Scraping", but the program just says "This control does not support data extraction", even though it couldn't be more of a table than this.

I have heard that it can't be done because the PDF is badly structured. Still, aren't there other ways of doing this?

Question from: https://stackoverflow.com/questions/65832528/uipath-how-to-extract-a-table-from-a-pdf

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:36:45+0000

Unfortunately, UiPath has no activity that reads tables directly from PDFs (at the time of writing). That's the bad news. The good news is that you can still get at the contents of the PDF: either you read the data directly as flat text with UiPath.PDF.Activities.ReadPDFText, or you have to use OCR. @kwoxer provided a wonderful link explaining this topic. I have already managed to extract data from tables contained in a PDF document. In that case I was lucky: ReadPDFText extracted everything, the table cells were separated by tab characters ("\t"), and the table header contained a word that did not appear anywhere else in the document.

As an idea, here is how I proceeded:

1. Extract the text from the PDF document with UiPath.PDF.Activities.ReadPDFText.
2. Split the text into an array whose elements are the lines of the document (split on Environment.NewLine with the option StringSplitOptions.RemoveEmptyEntries).
3. Loop through the lines (For Each) until you find the table header (using StartsWith, Contains, etc.).
4. Each following line belongs to the table as long as it contains a tab; the first line without a tab marks the end of the table.
5. Split the current row on the tab character and store the result in an array: the elements of the array are the individual cells of the row.
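The steps above can be sketched outside UiPath as follows. This is a minimal Python illustration of the same line-splitting logic, not UiPath code. It assumes the text has already been extracted (e.g. by ReadPDFText), that cells are separated by tab characters, and that the header line contains a keyword found nowhere else in the document. The function name `extract_table_rows` and the sample text are hypothetical. In an actual workflow the same logic would live in VB.NET or C# expressions inside a For Each loop.

```python
from typing import List


def extract_table_rows(pdf_text: str, header_keyword: str) -> List[List[str]]:
    """Return the tab-separated rows that follow the header line.

    Hypothetical helper mirroring the steps above; `pdf_text` stands in for
    the flat text that UiPath's ReadPDFText activity would return (step 1).
    """
    # Step 2: split into lines and drop empty entries
    # (the equivalent of StringSplitOptions.RemoveEmptyEntries).
    lines = [line for line in pdf_text.splitlines() if line]

    rows: List[List[str]] = []
    header_found = False
    for line in lines:
        if not header_found:
            # Step 3: skip lines until the one containing the header keyword;
            # the header line itself is only used as a marker.
            header_found = header_keyword in line
            continue
        # Step 4: a line without a tab marks the end of the table.
        if "\t" not in line:
            break
        # Step 5: each tab-separated element is one cell of the row.
        rows.append(line.split("\t"))
    return rows


# Hypothetical sample of extracted PDF text (Windows line endings).
SAMPLE_TEXT = (
    "Invoice 123\r\n"
    "\r\n"
    "Item\tQty\tPrice\r\n"
    "Apple\t2\t1.50\r\n"
    "Pear\t1\t0.80\r\n"
    "Total: 3.10\r\n"
)

if __name__ == "__main__":
    # prints ['Apple', '2', '1.50'] and then ['Pear', '1', '0.80']
    for row in extract_table_rows(SAMPLE_TEXT, "Qty"):
        print(row)
```

If the header keyword is never found, the function returns an empty list, which is an easy signal that the extracted text did not have the expected layout (for example, because the PDF needs OCR instead).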

I hope this idea helps.