VBA to count occurrences of the keywords in each file in a folder and sub folders

JakeP

New Member
Joined
Apr 23, 2023
Messages
14
Office Version
  1. 2019
Platform
  1. Windows
Hello,
I have searched this here and have found some solutions that were quite close but would be very helpful and SUPER appreciative to achieve the following.
I have a folder with many .txt files
I am looking for VBA code to
Prompt for a file location,
Prompt for keyword(s)
Scan each file in the file location and sub folders
Return the count (number of hits) of the keyword(s) or full strings in each file along with the associated info for each as Excel as shown below.

The ultimate goal is to determine how many an error occurs in each file, time of day to eventually create a Pareto chart of incidents to help precisely target efforts for troubleshooting/reducing these errors.

Hopefully I have explained this well. :)

Many, Many thanks in advance!!!

1690129427774.png
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
How are these text files formatted?

Can you use XL2BB to show us some examples of the data?
 
Upvote 0
Hello Herakles, thanks for the quick reply.
Unfortunately, at this time, I do not have any files nor can I get them as they are securely stored and can only be accessed locally.
Might be able to get some samples tomorrow.
However, they are typically tab delimited.

I am thinking would it be possible to scan each row in each file in each sub folder to accomplish this?
I think if there are multiple occurrences in a single row, returning either ALL occurrences or just the single occurrence would be ok.
Would awesome if it could be optional prior to running, but I if too much trouble, that's ok.
Trying to figure out which files have the highest hit rates for the keywords.

Thanks again, JP
 
Upvote 0
Give this a go.

Paste the code into a main code module.

Execute the subMain procedure.

It does not check the following but just in each entire file.
I think if there are multiple occurrences in a single row, returning either ALL occurrences or just the single occurrence would be ok.
Would awesome if it could be optional prior to running, but I if too much trouble, that's ok.

If checking each line would be beneficial just let me know.

Test it on a copy of your files.

Can you please state the version of Excel that you are using in your profile.

VBA Code:
Dim dictExtensions As Scripting.Dictionary
Dim WsFiles As Worksheet

Private Sub subMain()
Dim diaFolderPicker As FileDialog
Dim strFolder As String
Dim strKeywords  As String
Dim s As String
Dim arr() As Variant
Dim intCount As Integer
Dim intOccurrences As Integer
    
    On Error GoTo Err_Handler
    
    ActiveWorkbook.Save
    
    On Error Resume Next
    Application.DisplayAlerts = False
    Worksheets("Files").Delete
    On Error GoTo 0
    Application.DisplayAlerts = True
    Worksheets.Add after:=Sheets(Sheets.Count)
    ActiveSheet.Name = "Files"
    Set WsFiles = Worksheets("Files")
    WsFiles.Cells.ClearContents
        
    WsFiles.Range("A1").Resize(1, 9).Value = Array("Location", "Name", "File Created Date", _
        "File Created Time", "Size", "Type", "Keyword Number", "Keyword", "Keyword Hits")
      
    Set dictExtensions = New Scripting.Dictionary
    dictExtensions.Add key:="txt", Item:=dictExtensions.Count + 1

      ' Prompt for folder.
    Set diaFolderPicker = Application.FileDialog(msoFileDialogFolderPicker)

    With diaFolderPicker
        .Title = "Select A Base Folder"
        .AllowMultiSelect = False
        If .Show <> -1 Then
            Exit Sub
        End If
        strFolder = .SelectedItems(1)
    End With
    
    ' Prompt for keywords.
    strKeywords = InputBox("Enter keywords seperated by a comma.")

    If strKeywords = "" Then
        Exit Sub
    End If

    Call subLoopThroughFiles(strFolder, strKeywords)
            
    WsFiles.Range("C:C").NumberFormat = "dd/mm/yyyy"
    WsFiles.Range("D:D").NumberFormat = "hh:mm:ss"
    WsFiles.Range("E:E").NumberFormat = "0"
    
    With WsFiles.Range("A1").CurrentRegion
        
        .Font.Size = 16
        .RowHeight = 30
        .VerticalAlignment = xlCenter
        .HorizontalAlignment = xlLeft
        .IndentLevel = 1
        
        With .Rows(1)
            .Interior.Color = RGB(213, 213, 213)
            .Font.Bold = True
        End With
        
        With .Borders
            .LineStyle = xlContinuous
            .Weight = xlThin
            .ColorIndex = vbBlack
        End With
        
        .Cells.EntireColumn.AutoFit
        
    End With
                
    With WsFiles
        With .Range("A2", .Cells(Rows.Count, "B").End(xlUp))
            arr = .Worksheet.Evaluate("UNIQUE(" & .Address & ")")
            intCount = UBound(arr)
        End With
        intOccurrences = WorksheetFunction.Sum(.Range("I2", .Cells(Rows.Count, "I").End(xlUp)))
    End With
    
    MsgBox intCount & " files contain " & intOccurrences & " occurrences of the keywords " & vbCrLf & vbCrLf & _
        strKeywords & ".", vbOKCancel, "Confirmation"
    
Exit_Handler:

    Exit Sub

Err_Handler:
        
    MsgBox "There has been an error. " & vbCrLf & vbCrLf & Err.Number & "  " & Err.Description
    
    Resume Exit_Handler
    
End Sub

Private Function fncCount_String(ByVal strSearch As String, _
    ByVal strMatch As String, _
    Optional blnCaseSensitive As Boolean = False, _
    Optional blnAddWhiteSpace As Boolean = False) As Long
    
    Dim i As Long
    
    If blnAddWhiteSpace And Len(strMatch) > 1 Then
        For i = Len(strMatch) - 1 To 1 Step -1
            strMatch = Left$(strMatch, i) & "\s*" & Mid$(strMatch, i + 1)
        Next i
    End If
    
    With CreateObject("vbscript.regexp")
        .ignorecase = Not blnCaseSensitive
        .Global = True
        .Pattern = strMatch
        fncCount_String = .Execute(strSearch).Count
    End With
    
End Function

Private Sub subLoopThroughFiles(strFolder As String, strKeywords As String)
Dim objFSO As FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objFile As File
Dim objFolder As folder
Dim objSubFolder As folder
Dim i As Integer
Dim a As Integer
Dim arrKeywords() As String
Dim intCount As Integer
Dim lngCountThisFile As Long
Dim strFileText As String

On Error GoTo Err_Handler

    If InStr(1, strKeywords, ",", vbTextCompare) = 0 Then
        strKeywords = strKeywords & ","
    End If
    
    arrKeywords = Split(strKeywords, ",")
    
    Set objFolder = objFSO.GetFolder(strFolder)
    
    For Each objSubFolder In objFolder.subfolders
       
        Call subLoopThroughFiles(strFolder & "\" & objSubFolder.Name, strKeywords)
        
        For Each objFile In objSubFolder.Files
        
            With objFile
            
                If dictExtensions.Exists(objFSO.GetExtensionName(.Path)) Then
                                    
                    For i = LBound(arrKeywords) To UBound(arrKeywords)
                        
                        strFileText = CreateObject("scripting.filesystemobject").OpenTextFile(.Path).ReadAll

                        intCount = fncCount_String(strFileText, arrKeywords(i), False, True)
              
                        If intCount > 0 Then
                            WsFiles.Range("A" & Rows.Count).End(xlUp).Offset(1).Resize(1, 9).Value = Array(.ParentFolder, .Name, _
                                Format(.DateCreated, "dd/mm/yyyy"), Format(.DateCreated, "hh:mm:ss"), .Size, .Type, i + 1, arrKeywords(i), intCount)
                        End If
                        
                    Next i  '
 
                End If   ' Extension in the list.
                 
            End With    ' objFile
                        
        Next objFile
        
    Next objSubFolder

Exit_Handler:

    Exit Sub

Err_Handler:
        
    MsgBox "There has been an error. " & vbCrLf & vbCrLf & Err.Number & "  " & Err.Description
    
    Resume Exit_Handler
    
End Sub
 
Upvote 0
Give this a go.

Paste the code into a main code module.

Execute the subMain procedure.

It does not check the following but just in each entire file.
I think if there are multiple occurrences in a single row, returning either ALL occurrences or just the single occurrence would be ok.
Would awesome if it could be optional prior to running, but I if too much trouble, that's ok.

If checking each line would be beneficial just let me know.

Test it on a copy of your files.

Can you please state the version of Excel that you are using in your profile.

VBA Code:
Dim dictExtensions As Scripting.Dictionary
Dim WsFiles As Worksheet

Private Sub subMain()
Dim diaFolderPicker As FileDialog
Dim strFolder As String
Dim strKeywords  As String
Dim s As String
Dim arr() As Variant
Dim intCount As Integer
Dim intOccurrences As Integer
   
    On Error GoTo Err_Handler
   
    ActiveWorkbook.Save
   
    On Error Resume Next
    Application.DisplayAlerts = False
    Worksheets("Files").Delete
    On Error GoTo 0
    Application.DisplayAlerts = True
    Worksheets.Add after:=Sheets(Sheets.Count)
    ActiveSheet.Name = "Files"
    Set WsFiles = Worksheets("Files")
    WsFiles.Cells.ClearContents
       
    WsFiles.Range("A1").Resize(1, 9).Value = Array("Location", "Name", "File Created Date", _
        "File Created Time", "Size", "Type", "Keyword Number", "Keyword", "Keyword Hits")
     
    Set dictExtensions = New Scripting.Dictionary
    dictExtensions.Add key:="txt", Item:=dictExtensions.Count + 1

      ' Prompt for folder.
    Set diaFolderPicker = Application.FileDialog(msoFileDialogFolderPicker)

    With diaFolderPicker
        .Title = "Select A Base Folder"
        .AllowMultiSelect = False
        If .Show <> -1 Then
            Exit Sub
        End If
        strFolder = .SelectedItems(1)
    End With
   
    ' Prompt for keywords.
    strKeywords = InputBox("Enter keywords seperated by a comma.")

    If strKeywords = "" Then
        Exit Sub
    End If

    Call subLoopThroughFiles(strFolder, strKeywords)
           
    WsFiles.Range("C:C").NumberFormat = "dd/mm/yyyy"
    WsFiles.Range("D:D").NumberFormat = "hh:mm:ss"
    WsFiles.Range("E:E").NumberFormat = "0"
   
    With WsFiles.Range("A1").CurrentRegion
       
        .Font.Size = 16
        .RowHeight = 30
        .VerticalAlignment = xlCenter
        .HorizontalAlignment = xlLeft
        .IndentLevel = 1
       
        With .Rows(1)
            .Interior.Color = RGB(213, 213, 213)
            .Font.Bold = True
        End With
       
        With .Borders
            .LineStyle = xlContinuous
            .Weight = xlThin
            .ColorIndex = vbBlack
        End With
       
        .Cells.EntireColumn.AutoFit
       
    End With
               
    With WsFiles
        With .Range("A2", .Cells(Rows.Count, "B").End(xlUp))
            arr = .Worksheet.Evaluate("UNIQUE(" & .Address & ")")
            intCount = UBound(arr)
        End With
        intOccurrences = WorksheetFunction.Sum(.Range("I2", .Cells(Rows.Count, "I").End(xlUp)))
    End With
   
    MsgBox intCount & " files contain " & intOccurrences & " occurrences of the keywords " & vbCrLf & vbCrLf & _
        strKeywords & ".", vbOKCancel, "Confirmation"
   
Exit_Handler:

    Exit Sub

Err_Handler:
       
    MsgBox "There has been an error. " & vbCrLf & vbCrLf & Err.Number & "  " & Err.Description
   
    Resume Exit_Handler
   
End Sub

Private Function fncCount_String(ByVal strSearch As String, _
    ByVal strMatch As String, _
    Optional blnCaseSensitive As Boolean = False, _
    Optional blnAddWhiteSpace As Boolean = False) As Long
   
    Dim i As Long
   
    If blnAddWhiteSpace And Len(strMatch) > 1 Then
        For i = Len(strMatch) - 1 To 1 Step -1
            strMatch = Left$(strMatch, i) & "\s*" & Mid$(strMatch, i + 1)
        Next i
    End If
   
    With CreateObject("vbscript.regexp")
        .ignorecase = Not blnCaseSensitive
        .Global = True
        .Pattern = strMatch
        fncCount_String = .Execute(strSearch).Count
    End With
   
End Function

Private Sub subLoopThroughFiles(strFolder As String, strKeywords As String)
Dim objFSO As FileSystemObject
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objFile As File
Dim objFolder As folder
Dim objSubFolder As folder
Dim i As Integer
Dim a As Integer
Dim arrKeywords() As String
Dim intCount As Integer
Dim lngCountThisFile As Long
Dim strFileText As String

On Error GoTo Err_Handler

    If InStr(1, strKeywords, ",", vbTextCompare) = 0 Then
        strKeywords = strKeywords & ","
    End If
   
    arrKeywords = Split(strKeywords, ",")
   
    Set objFolder = objFSO.GetFolder(strFolder)
   
    For Each objSubFolder In objFolder.subfolders
      
        Call subLoopThroughFiles(strFolder & "\" & objSubFolder.Name, strKeywords)
       
        For Each objFile In objSubFolder.Files
       
            With objFile
           
                If dictExtensions.Exists(objFSO.GetExtensionName(.Path)) Then
                                   
                    For i = LBound(arrKeywords) To UBound(arrKeywords)
                       
                        strFileText = CreateObject("scripting.filesystemobject").OpenTextFile(.Path).ReadAll

                        intCount = fncCount_String(strFileText, arrKeywords(i), False, True)
             
                        If intCount > 0 Then
                            WsFiles.Range("A" & Rows.Count).End(xlUp).Offset(1).Resize(1, 9).Value = Array(.ParentFolder, .Name, _
                                Format(.DateCreated, "dd/mm/yyyy"), Format(.DateCreated, "hh:mm:ss"), .Size, .Type, i + 1, arrKeywords(i), intCount)
                        End If
                       
                    Next i  '
 
                End If   ' Extension in the list.
                
            End With    ' objFile
                       
        Next objFile
       
    Next objSubFolder

Exit_Handler:

    Exit Sub

Err_Handler:
       
    MsgBox "There has been an error. " & vbCrLf & vbCrLf & Err.Number & "  " & Err.Description
   
    Resume Exit_Handler
   
End Sub
You will need to make reference to the Microsoft Scripting Runtime library.
 
Upvote 0
Hello Herakles,
Apologies for the delayed feedback.
I did try your code, however, I was not able to get it to function.
I realize that I was only able to provide you with limited details and not really fair to ask you for further assistance considering it turn out to be more complex than I expected.
I was able to find a peer employee (programmer) to provide a solution and respectfully didn't want to waste any more of time.

I sincerely appreciate your time and effort that you did spend on this.

KR, JP
 
Upvote 0

Forum statistics

Threads
1,224,505
Messages
6,179,153
Members
452,891
Latest member
JUSTOUTOFMYREACH

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top