Finding most common words in column WITHOUT specifying what words

echrist

New Member
Joined
Mar 5, 2013
Messages
4
Hey everyone. We're doing some restructuring of our programs at my organization, and I'm trying to make a list of what is considered most important to our clients based on their feedback. We enter their questionnaire responses into Excel regularly, but there are thousands of responses. I'm trying to make a list of what words appear most frequently without going through one by one and making a list of keywords.
Without keywords, I haven't been able to find anything on the subject. As an additional hurdle, the responses are written in sentence form (there will be ~10-15 words in each response).
For example:

I want to eat more pizza.
Make your program longer.
I wish we could eat more meals.

Obviously these are made up, but the hope is that "I," "eat," etc. are returned at the top of a list of most used words.
Is this possible?
Thanks in advance!
 
Actually, it isn't clear to me, do you want to count the frequency in each row (so each row has its own result)?
If yes, then try this:

VBA Code:
Sub a726216_WordFrequency_1()
'word frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim obj As New DataObject
Dim tx As String, z As String
Dim t, q, va, vb, vc, ary, p
Dim i As Long, n As Long, j As Long, k As Long


Application.ScreenUpdating = False
n = Range("A" & Rows.Count).End(xlUp).Row
ReDim vb(1 To n, 1 To 3)
Range("F2:F" & n).Value = "~~"
Range("A2:F" & Cells(Rows.Count, "A").End(xlUp).Row).Copy
obj.GetFromClipboard
tx = obj.GetText
Application.CutCopyMode = False

tx = Replace(tx, "'", "___")
    
    
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .pattern = "\w+"
        End With

    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare
        
ary = Split(tx, "~~")

For Each p In ary
            Set matches = regEx.Execute(p)
            
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1
            Next
                
        If d.Count = 0 Then j = j + 1: GoTo skip:
        
        'put the result in col M:N
        Range("M:N").ClearContents
        With Range("M1").Resize(d.Count, 2)
            If d.Count < 65536 Then 'Transpose function has a limit of 65536 item to process
                
                .Value = Application.Transpose(Array(d.Keys, d.items))
                
            Else
                
                ReDim va(1 To d.Count, 1 To 2)
                i = 0
                    For Each q In d.Keys
                        i = i + 1
                        va(i, 1) = q: va(i, 2) = d(q)
                    Next
                .Value = va
                
            End If
            .Replace What:="___", Replacement:="'", LookAt:=xlPart, SearchFormat:=False, ReplaceFormat:=False
            .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
            
        End With

        vc = Range("M1:M3")
        j = j + 1
        For k = 1 To 3
            vb(j, k) = vc(k, 1)
        Next
        d.RemoveAll
        Range("M:N").ClearContents
skip:

Next


Range("G2").Resize(UBound(vb, 1), 3) = vb
Range("F:F").ClearContents

Application.ScreenUpdating = True
End Sub

If the result isn't as expected then please give me an example with 2 or 3 rows & also the result as expected.
 
Upvote 0

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.
1574197445014.png
1574197445014.png
1574197445014.png
 
Upvote 0
A2:E2 ---> G2:I2
A3:E3 ---> G3:I3
A4:E4 ---> G4:I4

And likewise.
I will be inserting data in A to E columns from 1-10,000 lets say. so A2:E2 to A1000:E1000. The frequency of word count of top 3 words should be for each row seperately as mentioned above not for all rows combined. I want result from G2:I2 to G1000:I1000, for A1000:E1000, for each row to be processed for their specific keywords respectively
 
Upvote 0
First:
Can you post your data as a table not as an image? so I can copy the table to my sheet.
You can post your data as a table using XL2BB add-in, you can find it here:

Second:
Using the code in post #21 on your example above, what is the result?
I think the code should do what you want.
 
Upvote 0
I didnt check the post 21, because i dont want to count the frequency. I just want the top three highest frequency "Keywords" in the three columns besides it
 
Upvote 0
Actually the code in post #21 is the modified code to deal with your problem.
And using the code, this is the result :

Book1
ABCDEFGHI
1TEXT STRINGS
2Reusable Gallon Freezer Bags - 1 Gallon Ziplock Bags 5 PACK, LEAKPROOF Gallon Stora...Tutti. 100% Silicone Reusable Storage Food Bags, 2 Large(50oz) 6 Cups & 2 Medium(30oz) 4 Cups + 6 Silicone Stretch Lids, FDA Approved-BPA Free | Eco-Friendly Snack Storage Bags[10 Pack Silicone Set.]Silicone Bags Reusable Silicone Food Bag Reusable Sandwich Bags Reusable Ziplock Bags Silicone Storage Bags Silicon Containers Plastic Conteiner Freezer Gallon Size Zip Snack Lunch Sous VideEco-Aforcli Reusable Silicone Gallon Food Storage Bags | Silicone Gallon Freezer Bags | Perfect for Storing Fruit, Meat, Sous Vide, Sandwich, Snack | 2 Extra Large Bags | Safe in Freezer, DishwasherReusable Storage Bags 6-Pack, JONYJ Leakproof Reusable Lunch Sandwich Bags, FDA Grade PEVA Kids Snack Bags, Extra Thick Ziplock Bags for Food Snacks, Make-up, Stationery, Travel Home OrganizationBagsReusableSilicone
3Knife Sharpening Stone Kit, Finew Professional Whetstone Sharpener Stone Set, Premium 4 Side Grit 400/1000 3000/8000 Water Stone, Whetstone Cutlery 20-10960 Knife Sharpening Stone-Dual Sided 400/1000 Grit Water Stone-Sharpener and Polishing BearMoo Whetstone Premium 2-IN-1 Sharpening Stone 3000/8000 Grit Waterstone Kit - Knife Sharpener Stone Safe Knife Sharpening Stone Set Premium Japanese Whetstone 400/1000 3000/8000 Double Side Grit Water Stone Best Knife Sharpening Stone,Meterk 2-Sided Whetstone Set 1000/6000 Grits Knife Stone Sharpener Kit with Non-slip Bamboo Base, Angle Guide, Flattening Stone for Home & KitchenStoneKnifeSharpening
4Yalis Push Pins 600 Count, Standard Clear Thumb Tacks Steel Point and Clear Plastic HeadFred STUCK ON EWE Sheep Push Pin HolderU Brands Push Pins, Clear Plastic Head, Steel Point, 200-CountBetyBedy Push Pins, 400PCS Multi-Color Map Thumb Tacks, Plastic Marking Pins with Sharp Point for Bulletin Board, Fabric Grtard Push Pins Rose Gold Thumb Tacks 200-Count Standard Pins Gold Steel Point and Transparent Plastic HeadPinsPushPlastic
Sheet2


The result is a bit different for the last row, can you check again your result manually?
My count is:

Book1
DE
2Pins6
3Push5
4Plastic4
Sheet4
 
Upvote 0
its working perfectly. Thankyou so much. can i also get a count code pls if you have already worked on. The count number can be output on the cell besides the word
 
Upvote 0
can i also get a count code pls if you have already worked on. The count number can be output on the cell besides the word
Replace this part:
vc = Range("M1:M3") j = j + 1 For k = 1 To 3 vb(j, k) = vc(k, 1) Next

with this:

vc = Range("M1:N3") j = j + 1 For k = 1 To 3 vb(j, k) = vc(k, 1) & " : " & vc(k, 2) Next

Book1
GHI
2Bags : 15Reusable : 8Silicone : 8
3Stone : 12Knife : 6Sharpening : 5
4Pins : 6Push : 5Plastic : 4
Sheet5
 
Upvote 0
First of all, because of the the new forum style, my code in post #12 become messy. So this is the code in post #12 in case anyone needs it:

VBA Code:
Sub regexWordFrequency()
'word frequency
'Put the data in col A, run the code, the result is in col D:E.
'tested on a text (from a novel) with 161K words, it took 1.2 seconds

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim obj As New DataObject
Dim tx As String, z As String
Dim t, q, va
Dim i As Long

t = Timer
Range("A1", Cells(Rows.Count, "A").End(xlUp)).Copy
obj.GetFromClipboard
tx = obj.GetText
Application.CutCopyMode = False
tx = Replace(tx, "'", "___")
   
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .pattern = "\w+"
        End With

    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare
       
            Set matches = regEx.Execute(tx)
           
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1
            Next
               
If d.Count = 0 Then MsgBox "Nothing found": Exit Sub

'put the result in col D:E
Range("D:E").ClearContents
With Range("D2").Resize(d.Count, 2)
    If d.Count < 65536 Then 'Transpose function has a limit of 65536 item to process
       
        .Value = Application.Transpose(Array(d.Keys, d.items))
       
    Else
       
        ReDim va(1 To d.Count, 1 To 2)
        i = 0
            For Each q In d.Keys
                i = i + 1
                va(i, 1) = q: va(i, 2) = d(q)
            Next
        .Value = va
       
    End If
    .Replace What:="___", Replacement:="'", LookAt:=xlPart, SearchFormat:=False, ReplaceFormat:=False
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
   
End With

Range("D1") = "WORD"
Range("E1") = "FREQUENCY"
Range("D:E").Columns.AutoFit

Debug.Print Timer - t
End Sub


Ok, try this:

VBA Code:
Sub a726216_WordFrequency()
'word frequency

Dim regEx As Object, matches As Object, x As Object, d As Object
Dim obj As New DataObject
Dim tx As String, z As String
Dim t, q, va
Dim i As Long

Range("A2:E" & Cells(Rows.Count, "A").End(xlUp).Row).Copy
obj.GetFromClipboard
tx = obj.GetText
Application.CutCopyMode = False
tx = Replace(tx, "'", "___")
   
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .pattern = "\w+"
        End With

    Set d = CreateObject("scripting.dictionary")
    d.CompareMode = vbTextCompare
       
            Set matches = regEx.Execute(tx)
           
            For Each x In matches
                d(CStr(x)) = d(CStr(x)) + 1
            Next
               
If d.Count = 0 Then MsgBox "Nothing found": Exit Sub

'put the result in col M:N
Range("M:N").ClearContents
With Range("M1").Resize(d.Count, 2)
    If d.Count < 65536 Then 'Transpose function has a limit of 65536 item to process
       
        .Value = Application.Transpose(Array(d.Keys, d.items))
       
    Else
       
        ReDim va(1 To d.Count, 1 To 2)
        i = 0
            For Each q In d.Keys
                i = i + 1
                va(i, 1) = q: va(i, 2) = d(q)
            Next
        .Value = va
       
    End If
    .Replace What:="___", Replacement:="'", LookAt:=xlPart, SearchFormat:=False, ReplaceFormat:=False
    .Sort Key1:=.Cells(1, 2), Order1:=xlDescending, Key2:=.Cells(1, 1), Order2:=xlAscending, Header:=xlNo
   
End With

Range("G2:I2") = Application.Transpose(Range("M1:M3"))
Range("M:N").ClearContents

End Sub

Note:
1. I assumed data is in col A:E, starting at row 2.
2. The result is in "G2:I2"
3. Column M:N are temporary helper column.


Hi Akuini - I absolutely love this feature. Is there a way to edit the code so that rather than searching for single words, it looks for common strings of 2 or 3 (or more?) words? This way you could quickly analyze to see if there are common sentences, or part sentences, cropping up, rather than just single words. Thanks!

Example: searching for strings of 2 words in the sentence "I like to fish because to fish is nice", it would show:

I like = 1
like to = 1
to fish = 2
fish because = 1
because to = 1
fish is = 1
is nice = 1
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,337
Members
452,637
Latest member
Ezio2866

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top