cpmurray1985
New Member
- Joined
- Mar 10, 2022
- Messages
- 15
- Office Version
- 365
- Platform
- Windows
Hello,
I've looked and attempted to modify the code, but as I am completely new to VBA, I feel I am doing it wrong. I'm not getting errors, but it is not producing the desired results.
I am looking to have a word/phrase counter that count the word frequency, and in addition, displays two and three word phrases from a column. The code provided already does so, however, I need to modify it a little as it will not count words that are hyphenated as one word. I still would like for the macro to ignore special characters, such as ., -, !, etc as it is currently doing, if it is by itself or after a word.
Expected Result:
What I get instead:
Also, is it possible to keep the code in a certain range without clearing the rest? I know the macro contains "Range("C:Z").ClearContents", which is what I want to, but just to clear from C to L. However, when I adjust it to, for example "Range("C:L").ClearContents", and if there is content in column M and N, it will directly start in column P, even if column C:L is empty.
The code is from this post, but just in case it can't be viewed. Thank you.
I've looked and attempted to modify the code, but as I am completely new to VBA, I feel I am doing it wrong. I'm not getting errors, but it is not producing the desired results.
I am looking to have a word/phrase counter that count the word frequency, and in addition, displays two and three word phrases from a column. The code provided already does so, however, I need to modify it a little as it will not count words that are hyphenated as one word. I still would like for the macro to ignore special characters, such as ., -, !, etc as it is currently doing, if it is by itself or after a word.
Expected Result:
Word/Sentence | 1 Word | Frequency | 2 Word | Frequency | 3 Word | Frequency | ||||
---|---|---|---|---|---|---|---|---|---|---|
Take pre-caution today | Pre-caution | 2 | Take Pre-caution | 1 | Take pre-caution today | 1 | ||||
Pre-caution - Warning signs | take | 1 | Pre-caution today | 1 | ||||||
today | 1 | warning signs | 1 | |||||||
warning | 1 | |||||||||
signs | 1 | |||||||||
What I get instead:
Word/Sentence | 1 Word | Frequency | 2 Word | Frequency | 3 Word | Frequency | ||||
---|---|---|---|---|---|---|---|---|---|---|
Take pre-caution today | caution | 2 | caution today | 1 | ||||||
Pre-caution - Warning signs | pre | 2 | take pre | 1 | ||||||
signs | 1 | warning signs | 1 | |||||||
Take | 1 | |||||||||
signs | 1 | |||||||||
today | 1 | |||||||||
warning | 1 |
Also, is it possible to keep the code in a certain range without clearing the rest? I know the macro contains "Range("C:Z").ClearContents", which is what I want to, but just to clear from C to L. However, when I adjust it to, for example "Range("C:L").ClearContents", and if there is content in column M and N, it will directly start in column P, even if column C:L is empty.
The code is from this post, but just in case it can't be viewed. Thank you.
VBA Code:
Sub regexPhraseFrequency1()
'The code will generate word/phrase frequency
'Data must be in column A, start at A1
'Data can't be more than 65536 rows, because it's using Application.Transpose.
'sNumber = "1,2,3" means it generates 3 frequency list: single word & 2 word phrase & 3 word phrase, you can change that to suit.
'Word with apostrophe such as "you're" is counted as one word.
'Word with underscore such as "aa_bb" is counted as one.
'Tested on text (from a novel) with 16.500 rows, contains 161K words (12600 unique words) with sNumber = "1,2,3", it took 10.5 seconds
Dim i As Long
Dim sNumber As String, txa As String
Dim z, T
Dim obj As New DataObject
T = Timer
Application.ScreenUpdating = False
Range("C:Z").ClearContents
txa = Join(Application.Transpose(Range("A1", Cells(Rows.Count, "A").End(xlUp))), vbLf)
'--- CHANGE sNumber VALUE TO SUIT -----------------------------------
sNumber = "1,2,3" 'list each number of words in a phrase, change to suit
'sNumber = "1" will generate 1 frequency list of single word
'sNumber = "1,2" will generate 2 frequency list: single word & 2 word phrase
z = Split(sNumber, ",")
For i = LBound(z) To UBound(z)
Call toProcess(CLng(z(i)), txa)
Next
Range("C:Z").Columns.AutoFit
Application.ScreenUpdating = True
Debug.Print Timer - T
End Sub
Sub toProcess(n As Long, ByVal tx As String)
'phrase frequency
Dim regEx As Object, matches As Object, x As Object, d As Object
Dim i As Long, rc As Long
Dim va, q
tx = Replace(tx, "'", "___") 'replace apostrophe with "___", so it will match pattern "\w+"
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
End With
If n > 1 Then
regEx.Pattern = "( ){2,}"
If regEx.test(tx) Then
tx = regEx.Replace(tx, " ") 'remove excessive space
End If
tx = Trim(tx)
tx = Replace(tx, " ", "_____") 'replace space with "_____", , so it won't match pattern "\W+"
regEx.Pattern = "\W+" 'non words character, Matches any character that is not
'a word character (alphanumeric & underscore). Equivalent to [^A-Za-z0-9_]
If regEx.test(tx) Then
tx = regEx.Replace(tx, vbLf) 'replace non words character with new line char
End If
tx = Replace(tx, "_____", " ") ' change it back to space
tx = Replace(tx, vbLf & " ", vbLf & "") 'remove space in the beginning of every line
End If
Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare
regEx.Pattern = Trim(WorksheetFunction.Rept("\w+ ", n)) 'match n words (the phrase) separated by a space
Set matches = regEx.Execute(tx)
For Each x In matches
d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
Next
For i = 1 To n - 1
regEx.Pattern = "^\w+ "
If regEx.test(tx) Then
tx = regEx.Replace(tx, "") 'remove first word in each line to get different combination of n words (phrase)
regEx.Pattern = Trim(WorksheetFunction.Rept("\w+ ", n))
Set matches = regEx.Execute(tx)
For Each x In matches
d(CStr(x)) = d(CStr(x)) + 1 'get phrase frequency
Next
End If
Next
If d.Count = 0 Then MsgBox "Nothing with " & n & " word phrase found": Exit Sub
rc = Cells(1, Columns.Count).End(xlToLeft).Column
'put the result in col D:E
With Cells(2, rc + 2).Resize(d.Count, 2)
If d.Count < 65536 Then 'Transpose function has a limit of 65536 item to process
.Value = Application.Transpose(Array(d.Keys, d.items))
Else
ReDim va(1 To d.Count, 1 To 2)
i = 0
For Each q In d.Keys
i = i + 1
va(i, 1) = q: va(i, 2) = d(q)
Next
.Value = va
End If
'get the apostrophe back
.Replace What:="___", Replacement:="'", lookAt:=xlPart, SearchFormat:=False, ReplaceFormat:=False
.Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
End With
Cells(1, rc + 2) = n & " WORD"
Cells(1, rc + 3) = "FREQUENCY"
End Sub