Count Word and Random Phrase Frequency

Sleeplol

Board Regular
Joined
Apr 10, 2019
Messages
194
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
Hello Everyone,
I'm attempting to count the word frequency as well as two and three word phrases from a column. I'm stuck and would really appreciate some help.

Getting error:
Run-time error 1004.png


On this line:
ErrorLastLine.png


Setup:
A macro populates column A with values from AP101:AP50000 (formulas combining additional row cells into a single cell).
(Problem) Macro populates the count of B,E,H

Runs fine though throttles for about 2 min for 30000ish rows; then bugs out after populating the results.
I've tried referencing the sheet with Worksheets("DashTest").Range on all three, but it still errors.

Below is a quick shot of what it looks like, as well as the entire script.
Thanks for any help

WordCountDisplay.png



VBA Code:
Sub test()
Dim a, e, s, Ignore As String, temp, x
Dim dic As Object, dic2 As Object, dic3 As Object
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = 1
Set dic2 = CreateObject("Scripting.Dictionary")
dic2.CompareMode = 1
Set dic3 = CreateObject("Scripting.Dictionary")
dic3.CompareMode = 1
With Range("m1").CurrentRegion.Offset(1)
Ignore = Join(Application.Transpose(.Resize(.Rows.Count - 1).Value), Chr(2))
End With
a = Range("a1", Range("a" & Rows.Count).End(xlUp)).Value
With CreateObject("VBScript.RegExp")
.Global = True: .IgnoreCase = True
.Pattern = "([\$\(\)\-\^\|\\\[\]\*\+\?\.])"
Ignore = "[^\w ]|\b(" & Replace(.Replace(Ignore, "\$1"), Chr(2), "|") & ")\b"
For Each e In a
If e <> "" Then
x = GetSentense(e, 2)
If IsArray(x) Then
For Each s In x
dic2(s) = dic2(s) + 1
Next
End If
x = GetSentense(e, 3)
If IsArray(x) Then
For Each s In x
If s <> "" Then dic3(s) = dic3(s) + 1
Next
End If
.Pattern = Ignore
temp = Application.Trim(.Replace(e, ""))
For Each s In Split(temp)
If s <> "" Then dic(s) = dic(s) + 1
Next
End If
Next
End With
Range("b2").Resize(dic.Count, 2).Value = _
Application.Transpose(Array(dic.keys, dic.items))
Range("e2").Resize(dic2.Count, 2).Value = _
Application.Transpose(Array(dic2.keys, dic2.items))
Range("h2").Resize(dic3.Count, 2).Value = _
Application.Transpose(Array(dic3.keys, dic3.items))
End Sub
Function GetSentense(ByVal txt As String, myStep)
Dim i As Long, ii As Long, temp, x
On Error Resume Next
x = Split(txt): ReDim temp(UBound(x) - myStep)
If Err Then GetSentense = Empty: Exit Function
On Error GoTo 0
For i = 0 To UBound(x) - myStep - 1
For ii = 1 To myStep
temp(i) = Trim$(temp(i) & " " & x(i + ii - 1))
Next
Next
GetSentense = temp
End Function
 
So I’m possibly looking to search column D (list of common words) against column A and identify if any rows are made entirely of words in column D. I will then want to extract this as a separate column/list.
Ok, try this:
The result are marked with "1" in col B.
VBA Code:
Sub a1122204a()
'https://www.mrexcel.com/board/threads/count-word-and-random-phrase-frequency.1122204/page-3
Dim i As Long
Dim va, vb, vc, x, z
Dim d As Object
Dim flag As Boolean

Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare
va = Range("D2", Cells(Rows.Count, "D").End(xlUp))
    
    For i = 1 To UBound(va, 1)
        d(Trim(va(i, 1))) = Empty
    Next

vb = Range("A1", Cells(Rows.Count, "A").End(xlUp))
ReDim vc(1 To UBound(vb, 1), 1 To 1)

For i = 2 To UBound(vb)

    z = WorksheetFunction.Trim(removeNonWord(CStr(vb(i, 1))))
        flag = False
        
        For Each x In Split(z)
            If Not d.Exists(x) Then flag = True: Exit For
        Next
        
        If flag = False Then vc(i, 1) = 1
Next

Range("B1").Resize(UBound(vc, 1), 1) = vc

End Sub

Function removeNonWord(tx As String) As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        .Pattern = "[^A-Z0-9]"
        removeNonWord = .Replace(tx, " ")
    End With
End Function

Book1
ABCD
1NameCommon Words
2The Abbott, CommunityThe
3Community Health SystemsRed
4The Blue Sky1Blue
5Kimberly-Clark CoSky
6The Red-Sky Co1Community
7The Red, Community1Co
8Systems: Community1Systems
Sheet1
 
Upvote 0

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
Ok, try this:
The result are marked with "1" in col B.
VBA Code:
Sub a1122204a()
'https://www.mrexcel.com/board/threads/count-word-and-random-phrase-frequency.1122204/page-3
Dim i As Long
Dim va, vb, vc, x, z
Dim d As Object
Dim flag As Boolean

Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare
va = Range("D2", Cells(Rows.Count, "D").End(xlUp))
   
    For i = 1 To UBound(va, 1)
        d(Trim(va(i, 1))) = Empty
    Next

vb = Range("A1", Cells(Rows.Count, "A").End(xlUp))
ReDim vc(1 To UBound(vb, 1), 1 To 1)

For i = 2 To UBound(vb)

    z = WorksheetFunction.Trim(removeNonWord(CStr(vb(i, 1))))
        flag = False
       
        For Each x In Split(z)
            If Not d.Exists(x) Then flag = True: Exit For
        Next
       
        If flag = False Then vc(i, 1) = 1
Next

Range("B1").Resize(UBound(vc, 1), 1) = vc

End Sub

Function removeNonWord(tx As String) As String
    With CreateObject("vbscript.regexp")
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        .Pattern = "[^A-Z0-9]"
        removeNonWord = .Replace(tx, " ")
    End With
End Function

Book1
ABCD
1NameCommon Words
2The Abbott, CommunityThe
3Community Health SystemsRed
4The Blue Sky1Blue
5Kimberly-Clark CoSky
6The Red-Sky Co1Community
7The Red, Community1Co
8Systems: Community1Systems
Sheet1
Hi Akuini, sorry for my delay in replying but just wanted to say thank you so much for the above. Worked as expected and is being used to deploy some new algorithms tomorrow, where we want to mitigate false positive results whilst omitting common words. Such a great help, much appreciated.
 
Upvote 0
You're welcome, glad to help, & thanks for the feedback.:)
 
Upvote 0
@Akuini Hi! Newbie here. Is there a way to have the counts populate in an entirely different sheet?

I have a sheet that currently uses A-H, with text strings living in column D. E and F are for manually entered Category (E) and Theme (E). This sheet is used to populate a bunch of different pivot tables throughout the workbook, so I don't want to mess with it - if possible. I'm looking to take the results from your script and create a list of 'these words are usually Categorized as XYZ' for use in some sort of magic code that will auto-assign the categorization. But drilling down to the specifics of step 1, it would make my life easier if I could run this every so often and the results would populate in a separate sheet (in the same workbook). Does that make sense? I know I can just copy/paste, but I thought I'd throw it out there...
 
Upvote 0
Nvm - I see someone figured out how to do this. I'll try harder :) I do have one other question though! If in the '1' count "I" is counted 1400 times, and in the "2" column "but I" is counted 51 times...are those separate instances? Or does the one word count of "I" encompass any 2 or 3 word counts that have I in it?
 
Upvote 0
If in the '1' count "I" is counted 1400 times, and in the "2" column "but I" is counted 51 times...are those separate instances?
No, they aren't separate instances.
 
Upvote 0
Ok, thanks. Is there anyway to tag that? For example if "cost" is counted as 1-word, and then in two words there is "affordable cost" and "expensive cost" I'd like to determine where there is (or is not) overlap, given how the sentiment can greatly change with the addition of a second word.
 
Upvote 0
@Akuini -

#1 - you're awesome.

#2 - for the first code segment you provided in post #13 (corrected code for reply #6), I pasted it into a new module in a workbook called WordFrequencyTemplate.xlsm. Pasted the values in, added the notes you have in the code onto a notes tab, then named the other tab Data. I pasted about 5k rows of text starting at cell A1 (just the column of text, nothing else). When I ran the code I copied from you, I got a compile error: User-defined type not defined, and when I went into debug, it was this line:

Dim obj As New DataObject

Did I miss something else in this thread? I'm assuming this is PEBKAC, but not sure. Since you're somehow answering like everyone on this board at once, I'm not sure I'm following all of the right ones to get this working (otherwise this is going to be a long slog through customer comments over 13 months...). Thanks!
 
Upvote 0
Hi, @auggie24
I just checked the code & realized that we don't need that line, it's from earlier code I wrote. So just delete the line:
Dim obj As New DataObject

Note:
That line works if you add the reference “Microsoft Forms 2.0 Object Library” (to work with clipboard). But we don't need that in this code.
 
Upvote 0
Hi, @auggie24
I just checked the code & realized that we don't need that line, it's from earlier code I wrote. So just delete the line:
Dim obj As New DataObject

Note:
That line works if you add the reference “Microsoft Forms 2.0 Object Library” (to work with clipboard). But we don't need that in this code.
Hey @Akuini , ran out of bandwidth to follow up, and just found some time to swing back at this. Again, I feel like this is a PEBKAC error, but it keeps throwing errors at me. First one is Run-Time Error '429': Active-X component can't create object. In debug mode, brings me to line 49, set regEx = CreateObject ("VBScript.Regexp"). Not sure what I am doing wrong. Is there a place to put my .xlsm to review to see what it is, since this board doesn't allow uploads?
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top