Search string in First Page Header of Word docs in Folder (sub folders)

Sleeplol

Board Regular
Joined
Apr 10, 2019
Messages
194
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
Hi,
I'm hoping for help modifying the below (or possibly scrap) script.

Routine:
Return all MS Word File Names from a folder (sub folders would be even better) that contain a string in the First Page Header.

For instance the user would like all files that pertain to "Pellet Tech". So, she adds the folder path to cell Worksheets("Data").Range("I1") and adds the string ("Pellet Tech") to Worksheets("Data").Range("J1")
The Front Header is the below which notes "Pellet Tech" in the Apply to:
1681841881180.png


This script only returns all the file names from a particular folder regardless of extension:
VBA Code:
Sub LoopThroughFiles()

Dim oFSO As Object
Dim oFolder As Object
Dim oFile As Object
Dim i As Integer

Set oFSO = CreateObject("Scripting.FileSystemObject")

Set oFolder = oFSO.getfolder(Worksheets("Data").Range("I1"))

For Each oFile In oFolder.Files

    Worksheets("Data").Cells(i + 1, 1) = oFile.Name

    i = i + 1

Next oFile

End Sub

Thanks for any help with this
 

Attachments

  • 1681841874953.png
    1681841874953.png
    12.5 KB · Views: 27
Had a bit more time to contemplate this. I really don't like the repetitively creating FileSystemObjects in the LoopThroughSubFolders sub. It's probably better to remove the oFSO declarations from both subs and declare it at the top of the code module (along with the WordApp declaration). The LoopThroughFiles2 sub remains unchanged (except for the removal of the oFSO declaration). The resulting change to the LoopThroughSubFolders sub is as follows. I'm off to the cabin for the wkend so I won't be able to follow this up until next week. Dave
Code:
Sub LoopThroughSubFolders(sFolder As String)
Dim oHF As Object, LastRow As Integer
Dim oSubFolder As Object, oFile As Object, TFolder As Object

Set oSubFolder = oFSO.GetFolder(sFolder)

'search through all files in the current subfolder
For Each oFile In oSubFolder.Files
If oFile.Name Like "*" & ".doc" & "*" Then
    On Error Resume Next
    WordApp.Documents.Open oFile.Path
    If Err.Number = 0 Then
        Set oHF = WordApp.ActiveDocument.Sections(1).Headers(1).Range
        If InStr(1, oHF.Text, CStr(Worksheets("Data").Range("J1")), vbTextCompare) > 0 Then
            LastRow = Sheets("Data").Range("A" & Sheets("Data").Rows.Count).End(xlUp).Row
            Worksheets("Data").Cells(LastRow + 1, 1) = oFile.Name
        End If
        WordApp.ActiveDocument.Close SaveChanges:=False
    Else
    On Error GoTo 0
    MsgBox "Corrupt File " & oFile.Name
    End If
End If
Next oFile

'recursively call the function to search through all subfolders of the current subfolder
For Each TFolder In oSubFolder.SubFolders
    LoopThroughSubFolders TFolder.Path
Next TFolder

If Err.Number <> 0 Then
On Error GoTo 0
MsgBox "Error 2"
End If
Set oSubFolder = Nothing
Set TFolder = Nothing
Set oHF = Nothing
Set oFile = Nothing
End Sub
 
Upvote 0
Solution

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
Hey Dave,
Hope your Cabin weekend was great!

The below is your full most recent code which gave me some ideas for speed and 'safety' which I posted in the second "VBA window".

VBA Code:
Option Explicit
Dim WordApp As Object
Dim oFSO As Object 'declare oFSO at the module level

Sub LoopThroughFilesDave()
'Search sht Data "I1" Folder(s)for Word doc headers for string in sht Data "J1"
Dim oHF As Object, oFile As Object, oFolder As Object, TFolder As Object
Dim LastRow As Integer

'create Word app
On Error Resume Next
Set WordApp = GetObject(, "Word.Application")
If Err.Number <> 0 Then
    Set WordApp = CreateObject("Word.Application")
End If
WordApp.Visible = False

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFolder = oFSO.GetFolder(Worksheets("Data").Range("I1"))

'search through all files in the root folder
For Each oFile In oFolder.Files
    If oFile.Name Like "*" & ".doc" & "*" Then
        On Error Resume Next
        WordApp.Documents.Open oFile.path
        If Err.Number = 0 Then
            Set oHF = WordApp.ActiveDocument.Sections(1).Headers(1).Range
            If InStr(1, oHF.Text, CStr(Worksheets("Data").Range("J1")), vbTextCompare) > 0 Then
                LastRow = Sheets("Data").Range("A" & Sheets("Data").Rows.Count).End(xlUp).Row
                Worksheets("Data").Cells(LastRow + 1, 1) = oFile.Name
            End If
            WordApp.ActiveDocument.Close SaveChanges:=False
        Else
        On Error GoTo 0
        MsgBox "Corrupt File " & oFile.Name
        End If
    End If
Next oFile

'recursively search through all subfolders
For Each TFolder In oFSO.GetFolder(oFolder).SubFolders
    LoopThroughSubFolders TFolder.path
Next TFolder

If Err.Number <> 0 Then
On Error GoTo 0
MsgBox "Error 1"
End If
WordApp.Quit
Set WordApp = Nothing
Set TFolder = Nothing
Set oHF = Nothing
Set oFile = Nothing
Set oFolder = Nothing
Set oFSO = Nothing
End Sub

Sub LoopThroughSubFolders(sFolder As String)
Dim oHF As Object, oFile As Object, TFolder As Object
Dim oSubFolder As Object, LastRow As Integer

Set oSubFolder = oFSO.GetFolder(sFolder)

'search through all files in the current subfolder
For Each oFile In oSubFolder.Files
If oFile.Name Like "*" & ".doc" & "*" Then
    On Error Resume Next
    WordApp.Documents.Open oFile.path
    If Err.Number = 0 Then
        Set oHF = WordApp.ActiveDocument.Sections(1).Headers(1).Range
        If InStr(1, oHF.Text, CStr(Worksheets("Data").Range("J1")), vbTextCompare) > 0 Then
            LastRow = Sheets("Data").Range("A" & Sheets("Data").Rows.Count).End(xlUp).Row
            Worksheets("Data").Cells(LastRow + 1, 1) = oFile.Name
        End If
        WordApp.ActiveDocument.Close SaveChanges:=False
    Else
    On Error GoTo 0
    MsgBox "Corrupt File " & oFile.Name
    End If
End If
Next oFile

'recursively call the function to search through all subfolders of the current subfolder
For Each TFolder In oSubFolder.SubFolders
    LoopThroughSubFolders TFolder.path
Next TFolder

If Err.Number <> 0 Then
On Error GoTo 0
MsgBox "Error 2"
End If
Set oSubFolder = Nothing
Set TFolder = Nothing
Set oHF = Nothing
Set oFile = Nothing
End Sub




Basically added:
1) Open Word docs as 'Read Only'
2) Incorporate Recursive in Main procedure
3) Switched out Integer for Long

2 & 3 seem to help with speed. Sample folder has 60 Word Docs; this procedure takes your code from 1min 15sec [ish] down to 30-40 sec

VBA Code:
Option Explicit
Dim WordApp As Object
Dim oFSO As Object 'declare oFSO at the module level

Sub LoopThroughFilesSpeedy()
'Search sht Data "I1" Folder(s)for Word doc headers for string in sht Data "J1"
Dim oHF As Object, oFile As Object, oFolder As Object, TFolder As Object
Dim LastRow As Long

'create Word app
On Error Resume Next
Set WordApp = GetObject(, "Word.Application")
If Err.Number <> 0 Then
    Set WordApp = CreateObject("Word.Application")
End If
WordApp.Visible = False

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFolder = oFSO.GetFolder(Worksheets("Data").Range("I1"))

'search through all files in the root folder
For Each oFile In oFolder.Files
    If oFile.Name Like "*" & ".doc" & "*" Then
        On Error Resume Next
        WordApp.Documents.Open oFile.path, ReadOnly:=True
        If Err.Number = 0 Then
            Set oHF = WordApp.ActiveDocument.Sections(1).Headers(1).Range
            If InStr(1, oHF.Text, CStr(Worksheets("Data").Range("J1")), vbTextCompare) > 0 Then
                LastRow = Sheets("Data").Range("A" & Sheets("Data").Rows.Count).End(xlUp).Row
                Worksheets("Data").Cells(LastRow + 1, 1) = oFile.Name
            End If
            WordApp.ActiveDocument.Close SaveChanges:=False
        Else
            On Error GoTo 0
            MsgBox "Corrupt File " & oFile.Name
        End If
    End If
Next oFile

'recursively search through all subfolders
For Each TFolder In oFSO.GetFolder(oFolder).SubFolders
    LoopThroughSubFolders TFolder.path
Next TFolder

If Err.Number <> 0 Then
    On Error GoTo 0
    MsgBox "Error 1"
End If
WordApp.Quit
Set WordApp = Nothing
Set TFolder = Nothing
Set oHF = Nothing
Set oFile = Nothing
Set oFolder = Nothing
Set oFSO = Nothing
End Sub

Sub LoopThroughSubFolders(sFolder As String)
Dim oHF As Object, oFile As Object, TFolder As Object
Dim oSubFolder As Object, LastRow As Long

Set oSubFolder = oFSO.GetFolder(sFolder)

'search through all files in the current subfolder
For Each oFile In oSubFolder.Files
    If oFile.Name Like "*" & ".doc" & "*" Then
        On Error Resume Next
        WordApp.Documents.Open oFile.path, ReadOnly:=True
        If Err.Number = 0 Then
            Set oHF = WordApp.ActiveDocument.Sections(1).Headers(1).Range
            If InStr(1, oHF.Text, CStr(Worksheets("Data").Range("J1")), vbTextCompare) > 0 Then
                LastRow = Sheets("Data").Range("A" & Sheets("Data").Rows.Count).End(xlUp).Row
                Worksheets("Data").Cells(LastRow + 1, 1) = oFile.Name
            End If
            WordApp.ActiveDocument.Close SaveChanges:=False
        Else
            On Error GoTo 0
            MsgBox "Corrupt File " & oFile.Name
    End If
End If
Next oFile

'recursively call the function to search through all subfolders of the current subfolder
For Each TFolder In oSubFolder.SubFolders
    LoopThroughSubFolders TFolder.path
Next TFolder

If Err.Number <> 0 Then
On Error GoTo 0
MsgBox "Error 2"
End If
Set oSubFolder = Nothing
Set TFolder = Nothing
Set oHF = Nothing
Set oFile = Nothing
End Sub

Really appreciate you giving me some code I could tinker with and ultimately use in my application.
Thanks Dave.
Hopefully you and/or other cane use this in future projects.

Also, if you have any other breakthroughs, fell free to post. I am always updating my toolbox.
 
Upvote 0
Also, I really need to figure out why so many of these Word docs are being noted as Corrupt. This in on our end and has nothing to do with the code you supplied
 
Upvote 0
You are welcome. Thanks for posting your outcome. 60 docs in 30-40 secs is pretty good... 2 docs a sec is great considering the recursion. I'm not certain how your additions sped things up? I can't really see any change to the recursion call from the main procedure? The integer vs long shouldn't matter unless you have thousands of rows. No idea if read only is any quicker or safer. Maybe you have saved the corrupt docs either under error or with more than 1 instance of Word open. Also, I thought afterwards that maybe using MsgBox "Corrupt File " & oFile.FullName might be more useful. Anyways, thanks for posting your outcome. Have a nice day. Dave
edit: Turning off your screen updating at the start of the main procedure and turning it back on at the end may speed things up.
 
Last edited:
Upvote 0
The speed difference seems to mainly be from "Read Only" vs Open for Edit.
I do like your use of the MsgBox, because I like knowing what's what. I just wanted to see how fast this could run. I'll likely keep the messages, depending on what the rest of the team prefers.
Have a nice rest of the day as well. I'm off to the dentist with my son lol
 
Upvote 0

Forum statistics

Threads
1,223,884
Messages
6,175,171
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top