Search For String In PDF Using VBA

wsnyder

Board Regular
Joined
Sep 23, 2018
Messages
224
Office Version
  1. 365
Platform
  1. Windows
Hi all,



Using Adobe Acrobat Pro DC v 2020.012.20048

I am trying to search pdf files for a string and log the results so I can review the files that return True. The files are on a \\Server\Share\

I tested the code below on a local folder with 3 pdf files and everything worked as expected.
I then changed the path to the \\server\share\ . I did get results for 3 files, then I received an error:
Run-time error '-2147023170(800706be)':
Automation error
The remote procedure call failed
Here:
VBA Code:
blnSearch = AVDocObj.findtext(szText:=searchString, _
bCaseSensitive:=False, _
bWholeWordsOnly:=True, _
bReset:=2)

Any ideas on what else I can try, or do I have to download all files to local first (64K files)
thanks,
-w

complete code:
Code:
Sub Search_PDF_For_String()
'https://software-solutions-online.com/how-to-search-a-pdf-for-a-string-using-vba-for-excel/

'Objects
Dim wb As Workbook
Dim ws As Worksheet
Dim appObj As Object
Dim AVDocObj As Object
Dim fso As Object
Dim f As Object

'Variables
Dim searchString As String
Dim PDF_path As String
Dim blnSearch As Boolean
Dim r_output As Long

'Initialize objects
Set wb = ThisWorkbook
Set ws = wb.Worksheets("PDF_Search")
Set fso = CreateObject("Scripting.FileSystemObject")

'Excel environment - speed things up
With Application
.ScreenUpdating = False
.DisplayAlerts = False
.EnableEvents = False
.Calculation = xlCalculationManual
End With

'Initailize variables
r_output = 5

'Get folder path and search phrase
With ws
searchString = .Cells(2, 1).Value
PDF_path = .Cells(1, 1).Value
End With

'Search each file for the phrase
Set fsoFolder = fso.GetFolder(PDF_path)
For Each f In fsoFolder.Files
Set appObj = CreateObject("AcroExch.App")
Set AVDocObj = CreateObject("AcroExch.AVDoc")
'Open the PDF file and check if the open was successful.
If AVDocObj.Open(f.path, "") = True Then
AVDocObj.BringToFront
blnSearch = AVDocObj.findtext(szText:=searchString, _
bCaseSensitive:=False, _
bWholeWordsOnly:=True, _
bReset:=2)
AVDocObj.Close True
appObj.Exit

'Release the objects.
Set AVDocObj = Nothing
Set appObj = Nothing

'log results
With ws
.Cells(r_output, 1).Value = f.path
.Cells(r_output, 2).Value = f.Name
.Cells(r_output, 3).Value = blnSearch
End With
End If
r_output = r_output + 1
Next f

'Add headers
With ws
.Cells(4, 1).Value = "Path"
.Cells(4, 2).Value = "File"
.Cells(4, 3).Value = "Found_T_F"
End With


'Tidy up
'Destroy objects
Set ws = Nothing
Set wb = Nothing

'Excel environment - restore
With Application
.ScreenUpdating = True
.DisplayAlerts = True
.EnableEvents = True
.Calculation = xlCalculationAutomatic
End With

End Sub
 

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Some changes you can try:

1. Delete lines referring to appObj, because you don't use it anywhere.
2. Move Set AVDocObj = CreateObject("AcroExch.AVDoc") above the loop, as the object only needs to be created once, and move Set AVDocObj = Nothing below the loop.
3. Delete AVDocObj.BringToFront as the PDF doesn't need to be visible to search it.
 
Upvote 0
Thanks John,

I made the changes you suggested.
I also went into Edit >> Preferences >> Signatures >> Verification | and removed the Tick Mark: Verify signatures when the document is opened.

I now get 61 results of expected 50.6K.

I receive the same error message:
Run-time error '-2147023170(800706be)':
Automation error
The remote procedure call failed

Thanks,
-w
 
Upvote 0
I can't reproduce that error. Does the same error occur on the same PDF file if it is the only file in a local folder?

Your macro uses the "AcroExch.AVDoc" object and I got errors with its FindText method with some PDFs. The following macro uses "AcroExch.PDDoc" and "AcroExch.AvDoc" and seems to work better because no errors occurred. For the difference between AvDoc and PDDoc and how they work together see this page:


Other changes:
  1. Removed the 'speed up Excel' lines as these have a negligible effect in such a macro, where most of the processing is done by the Acrobat API. Leaving ScreenUpdating on allows you to see the worksheet being updated as each PDF is searched.
  2. Only process the PDF if its file name matches "*.pdf" (case-insensitive).
VBA Code:
Public Sub Search_PDFs_For_String()
    
    Dim ws As Worksheet
    Dim AcroPDDoc As Object
    Dim AcroAVDoc As Object
    Dim FSO As Object, FSOfolder As Object, FSOfile As Object
    Dim searchString As String
    Dim PDF_path As String
    Dim blnSearch As Boolean
    Dim lr As Long, r_output As Long
    
    Set ws = ThisWorkbook.Worksheets("PDF_Search")
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set AcroPDDoc = CreateObject("AcroExch.PDDoc")
    
    'Excel environment - speed things up
'    With Application
'        .ScreenUpdating = False
'        .DisplayAlerts = False
'        .EnableEvents = False
'        .Calculation = xlCalculationManual
'    End With
    
    'Get folder path and search phrase
    With ws
        PDF_path = .Range("A1").Value
        searchString = .Range("A2").Value
        .Range("A4:C4").Value = Split("Path,File,Found?", ",")
        lr = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lr >= 5 Then .Range("A5:C5", .Cells(lr, "A")).ClearContents
        r_output = 5
    End With
    
    'Search each file for the phrase
    Set FSOfolder = FSO.GetFolder(PDF_path)
    
    For Each FSOfile In FSOfolder.Files
        
        If LCase(FSOfile.Name) Like "*.pdf" Then
        
            'Open the PDF file and check if the open was successful.
            
            If AcroPDDoc.Open(FSOfile.Path) Then
            
                Set AcroAVDoc = AcroPDDoc.OpenAVDoc("")               
                blnSearch = AcroAVDoc.FindText(szText:=searchString, _
                                              bCaseSensitive:=False, _
                                              bWholeWordsOnly:=True, _
                                              bReset:=2)
                AcroAVDoc.Close bNoSave:=True
            
                'log results
                With ws
                    .Cells(r_output, 1).Value = FSOfile.Path
                    .Cells(r_output, 2).Value = FSOfile.Name
                    .Cells(r_output, 3).Value = blnSearch
                End With
                r_output = r_output + 1
                DoEvents
                
            End If
        
        End If
        
    Next
   
    'Destroy objects
    Set AcroAVDoc = Nothing
    Set AcroPDDoc = Nothing
    
    'Excel environment - restore
'    With Application
'        .ScreenUpdating = True
'        .DisplayAlerts = True
'        .EnableEvents = True
'        .Calculation = xlCalculationAutomatic
'    End With

End Sub
 
Upvote 0
Thanks John,

I appreciate your good help!

I tested your revisions on a local folder with 40 pdf files.
Only the first file results are being logged to the worksheet.

I added debug.print "File: " & FSOfile.name inside the FSO For Next Loop
It is definitely looping through all files in the folder

Is the loop executing too fast for Acrobat to keep up?
No error messages.

thanks
-w
 
Upvote 0
Hi John,

I moved the debug statement :

Code:
If AcroPDDoc.Open(FSOfile.path) Then
            Debug.Print "File: " & FSOfile.Name

Now only the first filename is being printed to the Immediate Window
Any thoughts why Acrobat is not opening each file?
They are all pdf files in the folder.

thanks,
-w
 
Upvote 0
Sorry, I don't know why.

Maybe you need the Set appObj = CreateObject("AcroExch.App") in your original code after all, but put that line before the loop and appObj.Exit after the loop.
 
Upvote 0

Forum statistics

Threads
1,223,889
Messages
6,175,223
Members
452,620
Latest member
dsubash

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top