Unfortunately, UiPath has no activity that reads tables directly from PDFs (as of today). That's the bad news. The good news is that you can still get at the contents of the PDF: either you extract the data as plain text directly with UiPath.PDF.Activities.ReadPDFText, or you have to use OCR.
@kwoxer provided a wonderful link for explanations on this topic.
I have previously extracted data from tables in a PDF document. That time I was lucky: ReadPDFText extracted everything. The table cells were separated by tab characters (vbTab), and the table header contained a word that appeared nowhere else in the document.
Just as an idea, I proceeded like this:
- Extract text from the PDF document with UiPath.PDF.Activities.ReadPDFText.
- Create an array whose elements are the lines of the document (split on Environment.NewLine with the StringSplitOptions.RemoveEmptyEntries option).
- Loop through the lines (For Each) until you find the table header (using StartsWith, Contains, etc.).
- Each following line belongs to the table as long as it contains a tab; the first line without one marks the end of the table.
- Split each table row on the tab character and store the result in an array: its elements are the individual cells of that row.
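The steps above can be sketched outside UiPath to make the logic concrete. This is a minimal Python sketch, not UiPath code: it assumes the PDF text has already been extracted (e.g. with ReadPDFText) and is passed in as a string, and the function name `extract_table` and the header keyword are hypothetical. In a workflow you would express the same logic with VB.NET expressions in Assign / For Each activities.

```python
def extract_table(pdf_text: str, header_keyword: str) -> list[list[str]]:
    """Return the data rows of the tab-separated table whose header line
    contains header_keyword (hypothetical helper, for illustration only)."""
    # Split into lines and drop empty ones, similar to splitting on
    # Environment.NewLine with StringSplitOptions.RemoveEmptyEntries.
    # Note: splitlines() also accepts "\n" alone, not just "\r\n".
    lines = [line for line in pdf_text.splitlines() if line]

    rows: list[list[str]] = []
    in_table = False
    for line in lines:
        if not in_table:
            # Look for the header line (the "Contains" check).
            if header_keyword in line:
                in_table = True
            continue
        # A line belongs to the table only while it contains a tab;
        # the first line without a tab ends the table.
        if "\t" not in line:
            break
        # The tab-separated parts are the cells of this row.
        rows.append(line.split("\t"))
    return rows


sample = "Invoice 42\nItem\tQty\tPrice\nApple\t3\t1.20\nPear\t5\t0.80\nTotal: 6.10\n"
print(extract_table(sample, "Item"))
# [['Apple', '3', '1.20'], ['Pear', '5', '0.80']]
```

As in the steps above, the header line itself is skipped and only the rows after it are returned; this relies on the header keyword appearing nowhere earlier in the document.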
I hope this idea helps.
Whoever fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…