Hi Folks,
I'm pretty new to VBA but confident this is the right place to find some help...
I'm trying extract the list of all distinct words from a worksheet, followed by the number of occurrences the same word is found in the complete source sheet.
Only words having more than 3 characters should be considered and punctuation ignored.
Si basically, this is an example of the source and expected result
Source:
| A |
1|Humpty Dumpty sat on a wall
2|Humpty had a great fall
3|Humpty Dumpty again
Expected Result:
| A | B |
1|HUMPTY | 3 |
2|DUMPTY | 2 |
3|WALL | 1 |
4|GREAT | 1 |
5|FALL | 1 |
6|AGAIN | 1 |
I got it that far my macro generates the list of words but am struggling to:
a - list *distinct* words
b - count the number of occurrences
Actual VBA code
Sub MakeWordList()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long
Dim txt As String
Dim wordCnt As Long
Dim MinLen As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable
Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1") = "All Words"
WordListSheet.Range("A1").Font.Bold = True
InputSheet.Activate
wordCnt = 2
MinLen = 3
PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
"$", "%", "&", "(", ")", " - ", "_", "--", "+", _
"=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
r = 1
' Loop until blank cell is encountered
Do While Cells(r, 1) <> ""
' covert to UPPERCASE
txt = UCase(Cells(r, 1))
' Remove punctuation
For i = 0 To UBound(PuncChars)
txt = Replace(txt, PuncChars(i), "")
Next i
' Remove excess spaces
txt = WorksheetFunction.Trim(txt)
' Extract the words
x = Split(txt)
For i = 0 To UBound(x)
If Len(x(i)) > MinLen Then
WordListSheet.Cells(wordCnt, 1) = x(i)
'WordListSheet.Cells(wordCnt, 2) = ?here would come the occurrence-count?
wordCnt = wordCnt + 1
End If
Next i
r = r + 1
Loop
End Sub
I'm pretty new to VBA but confident this is the right place to find some help...
I'm trying extract the list of all distinct words from a worksheet, followed by the number of occurrences the same word is found in the complete source sheet.
Only words having more than 3 characters should be considered and punctuation ignored.
Si basically, this is an example of the source and expected result
Source:
| A |
1|Humpty Dumpty sat on a wall
2|Humpty had a great fall
3|Humpty Dumpty again
Expected Result:
| A | B |
1|HUMPTY | 3 |
2|DUMPTY | 2 |
3|WALL | 1 |
4|GREAT | 1 |
5|FALL | 1 |
6|AGAIN | 1 |
I got it that far my macro generates the list of words but am struggling to:
a - list *distinct* words
b - count the number of occurrences
Actual VBA code
Sub MakeWordList()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long
Dim txt As String
Dim wordCnt As Long
Dim MinLen As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable
Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1") = "All Words"
WordListSheet.Range("A1").Font.Bold = True
InputSheet.Activate
wordCnt = 2
MinLen = 3
PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
"$", "%", "&", "(", ")", " - ", "_", "--", "+", _
"=", "~", "/", "", "{", "}", "[", "]", """", "?", "*")
r = 1
' Loop until blank cell is encountered
Do While Cells(r, 1) <> ""
' covert to UPPERCASE
txt = UCase(Cells(r, 1))
' Remove punctuation
For i = 0 To UBound(PuncChars)
txt = Replace(txt, PuncChars(i), "")
Next i
' Remove excess spaces
txt = WorksheetFunction.Trim(txt)
' Extract the words
x = Split(txt)
For i = 0 To UBound(x)
If Len(x(i)) > MinLen Then
WordListSheet.Cells(wordCnt, 1) = x(i)
'WordListSheet.Cells(wordCnt, 2) = ?here would come the occurrence-count?
wordCnt = wordCnt + 1
End If
Next i
r = r + 1
Loop
End Sub