Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

UiPath - How to extract a table from a PDF

Hi, I have found some videos and articles on how to do this, but they don't help with this task. I know how to get a single value, but not how to extract a table.

I want to export this into a database if possible, or into Excel, but I can't figure it out. I have even tried changing the "Change reading option".

I tried "data scraping", but the program just says "This controller does not support data extraction". And it can't be more of a table than this.

I have heard that this can be because the structure of the PDF is bad. Still, aren't there other ways of doing this?

question from:https://stackoverflow.com/questions/65832528/uipath-how-to-extract-a-table-from-a-pdf


1 Reply

Unfortunately, as of today there is no activity in UiPath that reads tables directly from PDFs. That is the bad news. The good news is that you can still get at the contents of the PDF: either you get the data (as flat text) directly with UiPath.PDF.Activities.ReadPDFText, or you have to use OCR. @kwoxer provided a wonderful link with explanations on this topic.

I have already been able to extract data from tables contained in a PDF document. At that time I was lucky: ReadPDFText extracted everything, the table cells were separated by tab characters, and the table header contained a word that did not appear anywhere else in the document.

Just as an idea, I proceeded like this:

  1. Extract text from the PDF document with UiPath.PDF.Activities.ReadPDFText.
  2. Create an array, where the elements are the lines in the document. (Split using Environment.NewLine and option StringSplitOptions.RemoveEmptyEntries)
  3. Go through lines in a loop (ForEach) until the table header is found. (StartsWith or Contains etc.)
  4. The next row belongs to the table as long as it contains a tab. (Otherwise the table is over.)
  5. Split current row by tab and store it in an array: The elements of the array are the individual cells of the row.
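
As a rough illustration of the five steps above, here is a minimal Python sketch of the parsing logic only. It is not a UiPath workflow: it assumes the text returned by ReadPDFText is already available as a string, that the cells are tab-separated, and that the header line counts as the first table row. The function name extract_table, the header_marker parameter, and the sample text are made up for this example.

```python
def extract_table(text, header_marker):
    """Return the table rows (lists of cells) that start at the header line.

    Assumes tab-separated cells and a header line containing a word
    (header_marker) that appears nowhere else in the document.
    """
    # Step 2: split the text into lines and drop empty ones.
    lines = [line for line in text.splitlines() if line.strip()]

    rows = []
    in_table = False
    for line in lines:
        if not in_table:
            # Step 3: skip lines until the table header is found.
            if header_marker in line:
                in_table = True
                rows.append(line.split("\t"))
            continue
        # Step 4: the table ends at the first line without a tab.
        if "\t" not in line:
            break
        # Step 5: split the row into its individual cells.
        rows.append(line.split("\t"))
    return rows


# Hypothetical flat text, standing in for the output of step 1.
sample_text = (
    "Invoice 123\n"
    "\n"
    "Item\tQty\tPrice\n"
    "Apple\t2\t1.50\n"
    "Pear\t1\t0.80\n"
    "Total due: 3.80\n"
)

rows = extract_table(sample_text, "Item")
for row in rows:
    print(row)
```

In a UiPath workflow the same logic would be built from a string Split, a ForEach loop, and Contains/StartsWith checks, as described in the steps. Each resulting row could then be appended to a DataTable or written to Excel.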

I hope this idea helps.

