You can open the PDF file and extract its contents using the Adobe library (which I believe you can download from Adobe as part of the SDK, but it comes with certain versions of Acrobat as well)
Make sure to add the Library to your references too (On my machine it is the Adobe Acrobat 10.0 Type Library, but not sure if that is the newest version)
Even with the Adobe library it is not trivial (you'll need to add your own error-trapping etc):
Function getTextFromPDF(ByVal strFilename As String) As String
Dim objAVDoc As New AcroAVDoc
Dim objPDDoc As New AcroPDDoc
Dim objPage As AcroPDPage
Dim objSelection As AcroPDTextSelect
Dim objHighlight As AcroHiliteList
Dim pageNum As Long
Dim strText As String
strText = ""
If (objAvDoc.Open(strFilename, "") Then
Set objPDDoc = objAVDoc.GetPDDoc
For pageNum = 0 To objPDDoc.GetNumPages() - 1
Set objPage = objPDDoc.AcquirePage(pageNum)
Set objHighlight = New AcroHiliteList
objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
Set objSelection = objPage.CreatePageHilite(objHighlight)
If Not objSelection Is Nothing Then
For tCount = 0 To objSelection.GetNumText - 1
strText = strText & objSelection.GetText(tCount)
Next tCount
End If
Next pageNum
objAVDoc.Close 1
End If
getTextFromPDF = strText
End Function
What this does is essentially the same thing you are trying to do - only using Adobe's own library. It's going through the PDF one page at a time, highlighting all of the text on the page, then dropping it (one text element at a time) into a string.
Keep in mind what you get from this could be full of all kinds of non-printing characters (line feeds, newlines, etc) that could even end up in the middle of what look like contiguous blocks of text, so you may need additional code to clean it up before you can use it.
Hope that helps!
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…