Count number of occurrences of all words across sheet

Chris78

New Member
Joined
Aug 11, 2017
Messages
17
Office Version
  1. 365
Platform
  1. Windows
Hi Folks,
I'm pretty new to VBA but confident this is the right place to find some help...
I'm trying extract the list of all distinct words from a worksheet, followed by the number of occurrences the same word is found in the complete source sheet.
Only words having more than 3 characters should be considered and punctuation ignored.

Si basically, this is an example of the source and expected result
Source:
| A |
1|Humpty Dumpty sat on a wall
2|Humpty had a great fall
3|Humpty Dumpty again

Expected Result:
| A | B |
1|HUMPTY | 3 |
2|DUMPTY | 2 |
3|WALL | 1 |
4|GREAT | 1 |
5|FALL | 1 |
6|AGAIN | 1 |

I got it that far my macro generates the list of words but am struggling to:
a - list *distinct* words
b - count the number of occurrences

Actual VBA code

Sub MakeWordList()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long
Dim txt As String
Dim wordCnt As Long
Dim MinLen As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable

Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1") = "All Words"
WordListSheet.Range("A1").Font.Bold = True
InputSheet.Activate
wordCnt = 2
MinLen = 3
PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
"$", "%", "&", "(", ")", " - ", "_", "--", "+", _
"=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
r = 1


' Loop until blank cell is encountered
Do While Cells(r, 1) <> ""
' covert to UPPERCASE
txt = UCase(Cells(r, 1))
' Remove punctuation
For i = 0 To UBound(PuncChars)
txt = Replace(txt, PuncChars(i), "")
Next i
' Remove excess spaces
txt = WorksheetFunction.Trim(txt)
' Extract the words
x = Split(txt)
For i = 0 To UBound(x)
If Len(x(i)) > MinLen Then
WordListSheet.Cells(wordCnt, 1) = x(i)
'WordListSheet.Cells(wordCnt, 2) = ?here would come the occurrence-count?
wordCnt = wordCnt + 1

End If
Next i
r = r + 1
Loop
End Sub
 

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
Give this macro a try...
Code:
[table="width: 500"]
[tr]
	[td]Sub WordCounts()
  Dim R As Long, X As Long, Data As Variant, PuncChars As Variant, Words() As String, Arr() As String
  PuncChars = Array(".", ",", ";", ":", "'", "!", "#", "$", "%", "&", "(", ")", " - ", "_", "--", "+", "=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
  Data = Range("A1", Cells(Rows.Count, "A").End(xlUp))
  With CreateObject("Scripting.Dictionary")
    For R = 1 To UBound(Data, 1)
      For X = LBound(PuncChars) To UBound(PuncChars)
        Data(R, 1) = Replace(UCase(Data(R, 1)), PuncChars(X), " ")
      Next
      Words = Split(Application.Trim(Data(R, 1)))
      For X = 0 To UBound(Words)
        If Len(Words(X)) > 3 Then .Item(Words(X)) = .Item(Words(X)) + 1
      Next
    Next
    Arr = Split(Join(.Keys))
    Range("C1").Resize(UBound(Arr) + 1) = Application.Transpose(Arr)
    Arr = Split(Join(.Items))
    Range("D1").Resize(UBound(Arr) + 1) = Application.Transpose(Arr)
  End With
End Sub[/td]
[/tr]
[/table]
 
Upvote 0
Give this macro a try...
Code:
[TABLE="width: 500"]
<tbody>[TR]
[TD]Sub WordCounts()
  Dim R As Long, X As Long, Data As Variant, PuncChars As Variant, Words() As String, Arr() As String
  PuncChars = Array(".", ",", ";", ":", "'", "!", "#", "$", "%", "&", "(", ")", " - ", "_", "--", "+", "=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
  Data = Range("A1", Cells(Rows.Count, "A").End(xlUp))
  With CreateObject("Scripting.Dictionary")
    For R = 1 To UBound(Data, 1)
      For X = LBound(PuncChars) To UBound(PuncChars)
        Data(R, 1) = Replace(UCase(Data(R, 1)), PuncChars(X), " ")
      Next
      Words = Split(Application.Trim(Data(R, 1)))
      For X = 0 To UBound(Words)
        If Len(Words(X)) > 3 Then .Item(Words(X)) = .Item(Words(X)) + 1
      Next
    Next
    Arr = Split(Join(.Keys))
    Range("C1").Resize(UBound(Arr) + 1) = Application.Transpose(Arr)
    Arr = Split(Join(.Items))
    Range("D1").Resize(UBound(Arr) + 1) = Application.Transpose(Arr)
  End With
End Sub[/TD]
[/TR]
</tbody>[/TABLE]
Thanks a billion - looks like I need more practice to understand all of it but works like a charm :-)
 
Upvote 0
Hi Rick

Maybe you could write both .Keys and .Items in 1 go:

Code:
    Range("C1").Resize(.Count, 2).Value = Application.Transpose(Array(.Keys, .Items))
 
Upvote 0
Hi Rick

Maybe you could write both .Keys and .Items in 1 go:

Code:
    Range("C1").Resize(.Count, 2).Value = Application.Transpose(Array(.Keys, .Items))
@pgc01,

Hey, that is a good idea! Thanks! I literally finished writing that code seconds before I posted it and then immediately went to sleep (6am local time is when I go to sleep for the "night"), so there was not a lot of time for refinement. Normally, if I had time to think about it, I would have written two direct Transpose assignment statements without the intermediary helper array that I used, but putting .Keys and .Items together inside an Array function call (before applying the Transpose) just would not have occurred to me... so thanks for the idea (which I plan to use in the future when applicable).


@Chris78,

Here is my code streamlined using the idea pgc01 posted (it is a tad more efficient)...
Code:
[table="width: 500"]
[tr]
	[td]Sub WordCounts()
  Dim R As Long, X As Long, Data As Variant, PuncChars As Variant, Words() As String
  PuncChars = Array(".", ",", ";", ":", "'", "!", "#", "$", "%", "&", "(", ")", " - ", "_", "--", "+", "=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
  Data = Range("A1", Cells(Rows.Count, "A").End(xlUp))
  With CreateObject("Scripting.Dictionary")
    For R = 1 To UBound(Data, 1)
      For X = LBound(PuncChars) To UBound(PuncChars)
        Data(R, 1) = Replace(UCase(Data(R, 1)), PuncChars(X), " ")
      Next
      Words = Split(Application.Trim(Data(R, 1)))
      For X = 0 To UBound(Words)
        If Len(Words(X)) > 3 Then .Item(Words(X)) = .Item(Words(X)) + 1
      Next
    Next
    Range("C1").Resize(.Count, 2) = Application.Transpose(Array(.Keys, .Items))
  End With
End Sub[/td]
[/tr]
[/table]
 
Last edited:
Upvote 0
Thanks pgc01 & Rick - Much appreciated to have the working V2 now :-)
 
Upvote 0
@Chris78

Here's a little more compact and efficient version with the advantage of using the Regular Expression object. This version will also be more accurate by omitting other non-alphanumeric characters (such as the | or ~ characters) but still allow for a single '-' character that I noticed you wanted to include in your array of punctuation characters. I also made the pattern exclude dashes at the beginning or the end of words so "-cat" or "dog-" wouldn't be counted, but "Humpty-" would be counted as "Humpty". Finally, this version will also handle words between any sort of white space character (tabs, line breaks, other invisible characters you might copy/paste from a webpage) instead of just space characters.

Code:
Sub WordCounts()
  Dim Dict As Object, Data As Variant, Regex As Object, Match As Object
  Set Regex = CreateObject("VBScript.Regexp")
  Regex.Pattern = "[A-Z0-9](?:[A-Z0-9]|-(?=[A-Z0-9])){2,}[A-Z0-9]"
  Regex.Global = True
  Set Dict = CreateObject("Scripting.Dictionary")
  Data = UCase(Join(Application.Transpose(Range("A1", Cells(Rows.Count, "A").End(xlUp)))))
  For Each Match In Regex.Execute(Data)
    Dict.Item(Match.Value) = Dict.Item(Match.Value) + 1
  Next Match
  [C1].Resize(Dict.Count, 2) = Application.Transpose(Array(Dict.Keys, Dict.Items))
End Sub
 
Upvote 0
@LockeGarmin
Brilliant - I will dig into the details and obviously will take advantage of regex as detailed above.
@pgc01 + @Rick Rothstein
I know worked with your enhanced code above.. and besides including the regexps anytime soon
I also tried (but failed) to to extend the exclusions of the words to list. Indeed the above excludes all words of 3 chars or less but that seems poor on real usage. I'm trying to have a range of words to exclude; which I'm happy to maintain manually - and get those excluded from the word-list and associated occurrences... any guidance you can give me?
Sub WordCounts() Dim R As Long, X As Long, Data As Variant, PuncChars As Variant, Words() As String PuncChars = Array(".", ",", ";", ":", "'", "!", "#", "$", "%", "&", "(", ")", " - ", "_", "--", "+", "=", "~", "/", "", "{", "}", "[", "]", """", "?", "*") Data = Range("A1", Cells(Rows.Count, "A").End(xlUp)) With CreateObject("Scripting.Dictionary") For R = 1 To UBound(Data, 1) For X = LBound(PuncChars) To UBound(PuncChars) Data(R, 1) = Replace(UCase(Data(R, 1)), PuncChars(X), " ") Next Words = Split(Application.Trim(Data(R, 1))) For X = 0 To UBound(Words) If Len(Words(X)) > 3 Then .Item(Words(X)) = .Item(Words(X)) + 1 Next Next Range("C1").Resize(.Count, 2) = Application.Transpose(Array(.Keys, .Items)) End WithEnd Sub
 
Last edited:
Upvote 0
Looks like I have found my way out.... adding a second condition with "application.match" against a pre-defined array with my noise words.
Cheers
extract:
If Len(Words(x)) > 3 And IsError(Application.Match(Words(x), Noise, 0)) Then .Item(Words(x)) = .Item(Words(x)) + 1
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top