VBA - Extract data from Acrobat PDF using Excel

brawnystaff

Board Regular
Joined
Aug 9, 2012
Messages
109
Office Version
  1. 365
I have Excel 2016 and Acrobat XI Pro

I have a multi-page PDF with various non-sequential serial numbers (Bates numbers) at the bottom of each page. The syntax for each serial number is ABC-XXX with the XXX being three numbers. I am trying to extract all the serial numbers to a column in Excel. I have some VBA code below that can extract the first serial number to a Msgbox, but need it tweaked to return all serial numbers in all pages and return to a worksheet rather than a message box. Any ideas? Thanks

Code:
'http://www.eileenslounge.com/viewtopic.php?f=30&t=5907
Sub Demo()

  Dim strPDF As String, strTmp As String, i As Integer
  ' The next ten lines and the last line in this sub can help if
  ' you get "ActiveX component can't create object" errors even
  ' though a Reference to Acrobat is set in Tools|References.
  Dim bTask As Boolean
    bTask = True
  'If Tasks.Exists(Name:="Adobe Acrobat Professional") = False Then
    bTask = False
    Dim AdobePath As String, WshShell As Object
    Set WshShell = CreateObject("Wscript.shell")
    AdobePath = WshShell.RegRead("HKEY_CLASSES_ROOT\acrobat\shell\open\command\")
    AdobePath = Trim(Left(AdobePath, InStr(AdobePath, "/") - 1))
    Shell AdobePath, vbHide
  'End If
  'Replace FilePath & Filename with the correct FilePath & Filename for the pdf file to be read.
  
  strPDF = ReadAcrobatDocument("D:\Test\NPPES.pdf") '***Use your path.
  
searchstring = "ABC-"
  
  'ActiveDocument.Range.InsertAfter strPDF
  Debug.Print strPDF
  searchbegin = InStr(1, strPDF, searchstring)
  searchend = Mid(strPDF, searchbegin, 7)
  MsgBox searchend
End Sub




Public Function ReadAcrobatDocument(strFileName As String) As String
  'Note: A Reference to the Adobe Library must be set in Tools|References!
  Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
  Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
  Dim PageNumber, PageContent, Content, i, j
  Set AcroApp = CreateObject("AcroExch.App")
  Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
  If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
  ' The following While-Wend loop shouldn't be necessary but timing issues may occur.
  While AcroAVDoc Is Nothing
    Set AcroAVDoc = AcroApp.GetActiveDoc
  Wend
  Set AcroPDDoc = AcroAVDoc.GetPDDoc
  For i = 0 To AcroPDDoc.GetNumPages - 1
    Set PageNumber = AcroPDDoc.AcquirePage(i)
    Set PageContent = CreateObject("AcroExch.HiliteList")
    If PageContent.Add(0, 9000) <> True Then Exit Function
    Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
    ' The next line is needed to avoid errors with protected PDFs that can't be read
    On Error Resume Next
    For j = 0 To AcroTextSelect.GetNumText - 1
      Content = Content & AcroTextSelect.GetText(j)
    Next j
  Next i
  ReadAcrobatDocument = Content
  AcroAVDoc.Close True
  AcroApp.Exit
  Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
 
Last edited:

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.

Forum statistics

Threads
1,223,911
Messages
6,175,323
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top