brawnystaff
Board Regular
- Joined
- Aug 9, 2012
- Messages
- 109
- Office Version
- 365
I have Excel 2016 and Acrobat XI Pro
I have a multi-page PDF with various non-sequential serial numbers (Bates numbers) at the bottom of each page. The syntax for each serial number is ABC-XXX with the XXX being three numbers. I am trying to extract all the serial numbers to a column in Excel. I have some VBA code below that can extract the first serial number to a Msgbox, but need it tweaked to return all serial numbers in all pages and return to a worksheet rather than a message box. Any ideas? Thanks
I have a multi-page PDF with various non-sequential serial numbers (Bates numbers) at the bottom of each page. The syntax for each serial number is ABC-XXX with the XXX being three numbers. I am trying to extract all the serial numbers to a column in Excel. I have some VBA code below that can extract the first serial number to a Msgbox, but need it tweaked to return all serial numbers in all pages and return to a worksheet rather than a message box. Any ideas? Thanks
Code:
'http://www.eileenslounge.com/viewtopic.php?f=30&t=5907
Sub Demo()
Dim strPDF As String, strTmp As String, i As Integer
' The next ten lines and the last line in this sub can help if
' you get "ActiveX component can't create object" errors even
' though a Reference to Acrobat is set in Tools|References.
Dim bTask As Boolean
bTask = True
'If Tasks.Exists(Name:="Adobe Acrobat Professional") = False Then
bTask = False
Dim AdobePath As String, WshShell As Object
Set WshShell = CreateObject("Wscript.shell")
AdobePath = WshShell.RegRead("HKEY_CLASSES_ROOT\acrobat\shell\open\command\")
AdobePath = Trim(Left(AdobePath, InStr(AdobePath, "/") - 1))
Shell AdobePath, vbHide
'End If
'Replace FilePath & Filename with the correct FilePath & Filename for the pdf file to be read.
strPDF = ReadAcrobatDocument("D:\Test\NPPES.pdf") '***Use your path.
searchstring = "ABC-"
'ActiveDocument.Range.InsertAfter strPDF
Debug.Print strPDF
searchbegin = InStr(1, strPDF, searchstring)
searchend = Mid(strPDF, searchbegin, 7)
MsgBox searchend
End Sub
Public Function ReadAcrobatDocument(strFileName As String) As String
'Note: A Reference to the Adobe Library must be set in Tools|References!
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, i, j
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
' The following While-Wend loop shouldn't be necessary but timing issues may occur.
While AcroAVDoc Is Nothing
Set AcroAVDoc = AcroApp.GetActiveDoc
Wend
Set AcroPDDoc = AcroAVDoc.GetPDDoc
For i = 0 To AcroPDDoc.GetNumPages - 1
Set PageNumber = AcroPDDoc.AcquirePage(i)
Set PageContent = CreateObject("AcroExch.HiliteList")
If PageContent.Add(0, 9000) <> True Then Exit Function
Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
' The next line is needed to avoid errors with protected PDFs that can't be read
On Error Resume Next
For j = 0 To AcroTextSelect.GetNumText - 1
Content = Content & AcroTextSelect.GetText(j)
Next j
Next i
ReadAcrobatDocument = Content
AcroAVDoc.Close True
AcroApp.Exit
Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
Last edited: