Search closed word document and excel file without opening and check for match

bradyboyy88

Well-known Member
Joined
Feb 25, 2015
Messages
562
Hi,

I am wondering if there is a way to use the find keyword function in a word document and excel file without actually opening it using vba?
 
This is my test folder structure

Main Folder ---The Cat Man.docx & Subfolder
Subfolder --- The Cat Man 2.docx

To debug I tested out the File.Name at each part of the loop and the first iteration the File.Name returns the value The Cat Man.docx. Then the second iteration which is what is triggering the error shows ~$e Cat Man.docx. So this must be the error but its hard to tell what could be causing this problem. The word file has nothing wrong with it as I just created a new one for testing too and still getting the same problem.
 
Upvote 0

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
Thanks for that. I figured the problem truly isnt that line. It has to do with File.Name so possibly FileSystemObject. Its a really strange occurance actually. As it cycles through files it seems to include files that are no longer in the folder and those files names become ~$ Then whatever the name is minums the first 2 letters since those two take their places. So I have put a if then statement to not try to open any files with Left(File.Name,2)="~$" and then everything runs fine. It is just really wierd to see that happening.
 
Upvote 0
Hello bradyboyy88,

I did a search online for error 5792. Here is link to the Microsoft Support Article You receive error messages when you start Word

This may help you identify the problem. Let me know what you discover.

With that error fixed I am able to loop through hundreds of word document files but then I always seem to hit a point where my for each file i loop starts to draw a mismatch error at
Rich (BB code):
 Set WordDoc = WordApp.Documents.Open(Filename:=File.Path, ReadOnly:=True)


So i set up error handling to say okay fine then skip that file and move on to the next file. But every single one starts to draw this mismatch error after one does it. Its strange because say I were to run the macro then 90 files would open great and the next 400 were mismatch errors. But I then run it again then 100 might be matches and 390 were mismatches. Could it be a memory leak? I do declare my objects outside of the routine as public so I do not have to keep re declaring them since this program is recursive. I can post my updated code if it helps.
 
Last edited:
Upvote 0
I put in a line number 10 for the one subroutine and that is where my error handler erl function returns. I have spent so much time trying to figure it out I would have been better going through all 8000 files by hand haha.

Here we go:

Code:
Option Explicit
Public Word As Variant
Public WordsToFind As New Collection
Public WordsFound As New Collection
Public PrintRow As Integer
Public SubFolderCount As Integer
Public FileCount As Integer
Public FileSystem As Object
Public FolderName As String
Public WordApp As Object
Public WordDoc As Object
Public strDate As String
Public ErrorCount As Integer


Sub Search()
    
    Dim HostFolder As String
    Dim WordsFound(1 To 7)
    Dim SearchWord As Integer
    
    'Clear any collection in memory
    Set WordsToFind = Nothing
    Set FileSystem = Nothing
    Set WordApp = Nothing
    Set WordDoc = Nothing
    
    'Create Word object
    Set WordApp = CreateObject("Word.Application")
    WordApp.Visible = False
    WordApp.DisplayAlerts = False
    
    'In case rerun clear public variable
    FileCount = 0
    SubFolderCount = 0
    ErrorCount = 0
    
    'Clear Current Results Page
    Worksheets("Results").Rows("2:" & Application.Max(Worksheets("Results").UsedRange.Rows.Count, 2)).Delete
    Worksheets("Results").OLEObjects.Delete
    
    'Setting the outputrow for printing results
    PrintRow = 2
    
    'Take sheet values used in program and assign to variables
    With Worksheets("Search Interface")
        
        HostFolder = .Range("D2").Value
        
        'Location of Search Words on Sheet . Code is also dynamic so just increase range if you want to add words
        For SearchWord = 8 To 12
            
            If Not IsEmpty(.Range("C" & SearchWord)) Then
                
                WordsToFind.Add .Range("C" & SearchWord)
            
            End If
            
        Next
        
    End With
    
    'If export spot is not empty then create folder which will then be zipped
    If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
        
        With Worksheets("Search Interface").Range("D4")
        
            FolderName = "SearchFile-" & Format(Now(), "mmddyyyyhhmmss")
        
            strDate = Format(Now, " dd-mmm-yy h-mm-ss")
            FileNameZip = .Value & "\" & "SearchFileZip " & strDate & ".zip"
        
            'Make new Folder for zip
            MkDir .Value & "\" & FolderName
            
        End With
    End If
    
    
    'Run recursive folder search algorithm
    Set FileSystem = CreateObject("Scripting.FileSystemObject")
    RecursiveFolderSearch FileSystem.GetFolder(HostFolder)
  
    'If export spot is not empty then create folder which will then be zipped then delete original
    If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
          
        With Worksheets("Search Interface").Range("D4")
            
            Zip_All_Files_in_Folder .Value & "\" & FolderName, .Value
            FileSystem.deletefolder .Value & "\" & FolderName
            
        End With
    End If
    
    'Clear Memory leaks
    WordApp.Quit
    Set WordsToFind = Nothing
    Set FileSystem = Nothing
    Set WordApp = Nothing
    Set WordDoc = Nothing
    
End Sub


Sub RecursiveFolderSearch(Folder)
    
    Dim SubFolder As Object
    Dim File As Object
    Dim PathName As String
    
    For Each SubFolder In Folder.SubFolders
          
        'This if then will help avoid shortcuts
        If Folder.Attributes <> 64 Then
            SubFolderCount = SubFolderCount + 1
            RecursiveFolderSearch SubFolder
        End If
    Next
    
    For Each File In Folder.Files
    
    'This is to get around MS stupid 255 character string limit
    If Len(File.Path) > 255 Then
    
        PathName = GetShortFileName(File.Path)
    
    Else
    
        PathName = File.Path
    
    End If
    
    'Just some summary statistics about how many folders and files were actually reviewed in total
    With Worksheets("Results")
        .Range("E1").Value = "Subfolder Count: " & SubFolderCount
        .Range("F1").Value = "Files Reviewed: " & FileCount
        .Range("G1").Value = "Match Count: " & PrintRow - ErrorCount - 2
        .Range("H1").Value = "Error Count: " & ErrorCount
    End With
    
    On Error GoTo ErrorHandler
        
        'Not sure what is causing this error. Maybe old files saved in memory?
        If Not Left(File.Name, 2) = "~$" Then
        
        FileCount = FileCount + 1
        Set WordsFound = Nothing
        
        Select Case LCase(Right(File.Name, 4))
            
            Case Is = "docx", ".doc", "docm", "dotm"  'Check if word document
                
10              Set WordDoc = WordApp.Documents.Open(Filename:=PathName, ReadOnly:=True)
                WordDoc.ActiveWindow.View.ReadingLayout = False
                                       
                'Select all and Check if word exists
                WordApp.Selection.WholeStory
                WordApp.Selection.Find.ClearFormatting
                    
                For Each Word In WordsToFind


                    With WordApp.Selection.Find
                        
                        .Text = Word
                        .Forward = True
                        .Wrap = 1
                        .Format = False
                        .MatchCase = False
                        .MatchWholeWord = True
                        .MatchWildcards = False
                        .MatchSoundsLike = False
                        .MatchAllWordForms = False
                        
                        If .Execute Then
                            
                            WordsFound.Add Word
                            
                        End If
                        
                    End With
                    
                Next
                
                If WordsFound.Count > 0 Then
                    
                'outputting the results
                    With Worksheets("Results")
                    
                        .Range("A" & PrintRow).Value = File.Name
                        
                        'This will merge the words found to one cell
                        For Each Word In WordsFound
                            
                            .Range("B" & PrintRow).Value = Word & "/" & .Range("B" & PrintRow).Value
                            
                        Next
                    
                        .Hyperlinks.Add Anchor:=.Range("C" & PrintRow), Address:=PathName, TextToDisplay:="Link - Click Here"
                        
                        'Check if interface wants you to embed file
                        If Worksheets("Search Interface").Shapes("EmbedCheckBox").ControlFormat.Value = 1 Then
            
                            .OLEObjects.Add Filename:=PathName, Link:=False, DisplayAsIcon:=True, Left:=.Range("D" & PrintRow).Left + 30, Top:=.Range("D" & PrintRow).Top + 3, Width:=50, Height:=10
            
                        End If
                        
                        'Send to export folder if directory isnt blank
                        If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
                            
                            FileSystem.CopyFile PathName, Worksheets("Search Interface").Range("D4").Value & "\" & FolderName & "\" & "Row Result - " & (PrintRow - 1) & ".docx", True
                                                        
                        End If
                        
                    End With
                    
                PrintRow = PrintRow + 1
                    
                End If
                
                WordDoc.Close SaveChanges:=False
                Set WordDoc = Nothing
                
        End Select
        
        End If
        
ContinueLoop:


    Next
    
Exit Sub


ErrorHandler:


        With Worksheets("Results")
                        
            .Range("A" & PrintRow).Value = File.Name
            
            If IsEmpty(.Range("B" & PrintRow).Value) Then
                .Range("B" & PrintRow).Value = "File triggered error -" & Err.Number & " " & Err.Description & " - VBA Line:" & Erl
                ErrorCount = ErrorCount + 1 'I want to only count errors if the matching process was not able to go through
            End If
            
            .Hyperlinks.Add Anchor:=.Range("C" & PrintRow), Address:=PathName, TextToDisplay:="Link - Click Here"
            
        End With
        
        PrintRow = PrintRow + 1
        
        If Not WordDoc Is Nothing Then
            WordDoc.Close SaveChanges:=False
        End If
        
        Set WordDoc = Nothing
        
Resume ContinueLoop


End Sub
 
Upvote 0
Those files that begin with tilde "~" are temporary files created by Word while editing a file. They will be created and destroyed while your program is running. You can't open them and it will cause an error. You need to validate that all the files you are opening are actually word files before you open them.

Looks like you have an error handler that should identify all the problems you run across.
 
Last edited:
Upvote 0
Yea hackslash the if statement of excluding trying to open those seemed to fix that 5792 error but now I am on to a stranger error type 13 mismatch error that only happens after hundreds of records are opened and closed and just continues into the future for the next 1000 records. Its also not consistent because the amount of successful opens changes every run despite the files are staying the same.

Say i run one of the subfolders by itself as the hostfolder then it will complete those 100 files fine. Now say I do the parent folder that has hundreds of other files too. Well when it gets to that one folder I just ran it might hit that mismatch error for those files unlike before since its not running 1000s of files. It makes no sense since the process is remaining the same.


here is better commenting of the code:

I put in a line number 10 for the one subroutine and that is where my error handler erl function returns. I have spent so much time trying to figure it out I would have been better going through all 8000 files by hand haha.

Here we go:

Code:
Option Explicit
Public Word As Variant
Public WordsToFind As New Collection
Public WordsFound As New Collection
Public PrintRow As Integer
Public SubFolderCount As Integer
Public FileCount As Integer
Public FileSystem As Object
Public FolderName As String
Public WordApp As Object
Public WordDoc As Object
Public strDate As String
Public ErrorCount As Integer


Sub Search()
    
    Dim HostFolder As String
    Dim WordsFound(1 To 7)
    Dim SearchWord As Integer
    
    'Clear any collection in memory
    Set WordsToFind = Nothing
    Set FileSystem = Nothing
    Set WordApp = Nothing
    Set WordDoc = Nothing
    
    'Create a new Word object and isntance
    Set WordApp = CreateObject("Word.Application")
    WordApp.Visible = False
    WordApp.DisplayAlerts = False
    
    'In case rerun clear public variable
    FileCount = 0
    SubFolderCount = 0
    ErrorCount = 0
    
    'Clear Current Results Page
    Worksheets("Results").Rows("2:" & Application.Max(Worksheets("Results").UsedRange.Rows.Count, 2)).Delete
    Worksheets("Results").OLEObjects.Delete
    
    'Setting the outputrow for printing results
    PrintRow = 2
    
    'Take sheet values used in program and assign to variables
    With Worksheets("Search Interface")
        
        HostFolder = .Range("D2").Value
        
        'Location of Search Words on Sheet . Code is also dynamic so just increase range if you want to add words
        For SearchWord = 8 To 12
            
            If Not IsEmpty(.Range("C" & SearchWord)) Then
                
                WordsToFind.Add .Range("C" & SearchWord)
            
            End If
            
        Next
        
    End With
    
    'If export spot directory on sheet is not empty then create folder names/variables
    If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
        
        With Worksheets("Search Interface").Range("D4")
        
            FolderName = "SearchFile-" & Format(Now(), "mmddyyyyhhmmss")
        
            strDate = Format(Now, " dd-mmm-yy h-mm-ss")
            FileNameZip = .Value & "\" & "SearchFileZip " & strDate & ".zip" ' This variable is used in another subroutine made by ron bruin to zip files and is in another module
        
            'Make new Folder for zip
            MkDir .Value & "\" & FolderName
            
        End With
    End If
    
    
    'Run recursive folder search algorithm
    Set FileSystem = CreateObject("Scripting.FileSystemObject")
    RecursiveFolderSearch FileSystem.GetFolder(HostFolder)
  
    'If export spot as stated above is not empty then create the folder which will then be zipped then delete original
    If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
          
        With Worksheets("Search Interface").Range("D4")
            
            Zip_All_Files_in_Folder .Value & "\" & FolderName, .Value
            FileSystem.deletefolder .Value & "\" & FolderName
            
        End With
    End If
    
    'Clear Memory leaks just in case
    WordApp.Quit
    Set WordsToFind = Nothing
    Set FileSystem = Nothing
    Set WordApp = Nothing
    Set WordDoc = Nothing
    
End Sub


Sub RecursiveFolderSearch(Folder)
    
    Dim SubFolder As Object
    Dim File As Object
    Dim PathName As String
    
    For Each SubFolder In Folder.SubFolders
          
        'This if then will help avoid shortcuts based on my research
        If Folder.Attributes <> 64 Then
            SubFolderCount = SubFolderCount + 1
            RecursiveFolderSearch SubFolder
        End If
    Next
    
    For Each File In Folder.Files
    
    'This is to get around MS stupid 255 character string limit
    If Len(File.Path) > 255 Then
    
        PathName = GetShortFileName(File.Path)
    
    Else
    
        PathName = File.Path
    
    End If
    
    'Just some summary statistics about how many folders and files were actually reviewed in total
    With Worksheets("Results")
        .Range("E1").Value = "Subfolder Count: " & SubFolderCount
        .Range("F1").Value = "Files Reviewed: " & FileCount
        .Range("G1").Value = "Match Count: " & PrintRow - ErrorCount - 2
        .Range("H1").Value = "Error Count: " & ErrorCount
    End With
    
    On Error GoTo ErrorHandler
        
        'Not sure what is causing this error. Maybe old files saved in memory because I dont see them in folder and they were deleted a while back? Either way need to not try and open files with these beginning two letters
        If Not Left(File.Name, 2) = "~$" Then
        
        FileCount = FileCount + 1

        'just to clear memory of this variable before assigning new one by opening worddoc
        Set WordsFound = Nothing
        
        Select Case LCase(Right(File.Name, 4))
            
            Case Is = "docx", ".doc", "docm", "dotm"  'Check if word document
                
'this line keeps tripping error mismatch around 200 files in and continues to do it. I say this line because erl returns 10 and all things accomplished seem to point to this.
10              Set WordDoc = WordApp.Documents.Open(Filename:=PathName, ReadOnly:=True)
                WordDoc.ActiveWindow.View.ReadingLayout = False
                                       
                'Select all and Check if word exists
                WordApp.Selection.WholeStory
                WordApp.Selection.Find.ClearFormatting
                  
               'loops through all possible search words in the keywords to search for and tries to find them in the word document
                For Each Word In WordsToFind

                    With WordApp.Selection.Find
                        
                        .Text = Word
                        .Forward = True
                        .Wrap = 1
                        .Format = False
                        .MatchCase = False
                        .MatchWholeWord = True
                        .MatchWildcards = False
                        .MatchSoundsLike = False
                        .MatchAllWordForms = False
                        

                        If .Execute Then

                         'Adds the word to the collection of words found which will later be printed on sheet  
                            WordsFound.Add Word
                            
                        End If
                        
                    End With
                    
                Next
                
                'if we have atleast one keyword found in the word document then it prints out stuff to excel sheet and transfers to zip folder if that directory was set in the sheet
                If WordsFound.Count > 0 Then
                    
                'outputting the results
                    With Worksheets("Results")
                    
                        .Range("A" & PrintRow).Value = File.Name
                        
                        'This will merge the words found to one cell
                        For Each Word In WordsFound
                            
                            .Range("B" & PrintRow).Value = Word & "/" & .Range("B" & PrintRow).Value
                            
                        Next
                    
                        .Hyperlinks.Add Anchor:=.Range("C" & PrintRow), Address:=PathName, TextToDisplay:="Link - Click Here"
                        
                        'Check if interface wants you to embed file.. Problem is this doesnt seem to allow saving because excel file will always say corrupt and repair but never works so never gets to save
                        If Worksheets("Search Interface").Shapes("EmbedCheckBox").ControlFormat.Value = 1 Then
            
                            .OLEObjects.Add Filename:=PathName, Link:=False, DisplayAsIcon:=True, Left:=.Range("D" & PrintRow).Left + 30, Top:=.Range("D" & PrintRow).Top + 3, Width:=50, Height:=10
            
                        End If
                        
                        'Send to export folder if directory isnt blank
                        If Not IsEmpty(Worksheets("Search Interface").Range("D4").Value) Then
                            
                            FileSystem.CopyFile PathName, Worksheets("Search Interface").Range("D4").Value & "\" & FolderName & "\" & "Row Result - " & (PrintRow - 1) & ".docx", True
                                                        
                        End If
                        
                    End With
                    
                PrintRow = PrintRow + 1
                    
                End If
                
                WordDoc.Close SaveChanges:=False
                Set WordDoc = Nothing
                
        End Select
        
        End If
        
ContinueLoop:


    Next
    
Exit Sub


ErrorHandler:

        'since there was an error it prints out the info to tthe excel sheet what file name was and error info where the keyword stuff should have gone . If keyword stuff was filled out and error tripped after that then it wont delete that info and should keep keywords found instead.  

            With Worksheets("Results")
                        
            .Range("A" & PrintRow).Value = File.Name
            
            If IsEmpty(.Range("B" & PrintRow).Value) Then
                .Range("B" & PrintRow).Value = "File triggered error -" & Err.Number & " " & Err.Description & " - VBA Line:" & Erl
                ErrorCount = ErrorCount + 1 'I want to only count errors if the matching process was not able to go through
            End If
            
            .Hyperlinks.Add Anchor:=.Range("C" & PrintRow), Address:=PathName, TextToDisplay:="Link - Click Here"
            
        End With
        
        PrintRow = PrintRow + 1
        
        If Not WordDoc Is Nothing Then
            WordDoc.Close SaveChanges:=False
        End If
        
        Set WordDoc = Nothing

'GO to the next file like nothing ever happened!        
Resume ContinueLoop


End Sub
 
Last edited:
Upvote 0
You might be hitting a limitation of VBA error trapping. You might want to limit the error trap to that one line. Then you can deal with other bugs separately. This will ensure that you only are trapping word open errors with this error trap. You should handle other errors in another way.

Code:
On Error GoTo ErrorHandler
10              Set WordDoc = WordApp.Documents.Open(Filename:=PathName, ReadOnly:=True)
On Error GoTo 0

That clears the errorhandler

You should also add this to the end of your error handler, before it returns:

Code:
err.clear
 
Upvote 0
Thats a good point. Now rerunning puts me at mismatch error and points me to

Code:
.text=word

When I hover over the yellow highlighted debug line of word it shows its value as "Test" which was one of my keywords which is a string so I dont see why that would be a mismatch.
 
Upvote 0

Forum statistics

Threads
1,225,699
Messages
6,186,523
Members
453,362
Latest member
zermrodrigues

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top