Find top 10 most common words in a column of text strings

duncanfactary

New Member
Joined
Oct 15, 2009
Messages
14
Hi all,

I've been racking my brains trying to find a way of doing this. I have a list (column A in Excel) of over 50,000 organisations and I'd like to know what the most common words used in the names are. Ideally it would great if I could produce a top 10 list of the most common words at the top e.g. Ltd, School or Church with a count in the next column of how times that word it appears

Thanks in advance!
 

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
Approach I'd take is separate all the words out into individual cells, then perform an advanced filter to remove duplicates and then use COUNTIF to count the number of times each word occurs to generate a top 10
 
Upvote 0
i might have a macro at home that does something similar will have a look later, cant promise for definite
 
Upvote 0
ok found it, i got this from the tinternet years ago so I claim no kudos for it, on sheet1 have all your data in column A, it will create a new sheet with the results and a pivot table

Code:
Sub MakeWordList()
    Dim InputSheet As Worksheet
    Dim WordListSheet As Worksheet
    Dim PuncChars As Variant, x As Variant
    Dim i As Long, r As Long
    Dim txt As String
    Dim wordCnt As Long
    Dim AllWords As Range
    Dim PC As PivotCache
    Dim PT As PivotTable
    
    Application.ScreenUpdating = False
    Set InputSheet = ActiveSheet
    Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
    WordListSheet.Range("A1") = "All Words"
    WordListSheet.Range("A1").Font.Bold = True
    InputSheet.Activate
    wordCnt = 2
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
        "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
        "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    r = 1

'   Loop until blank cell is encountered
    Do While Cells(r, 1) <> ""
'       covert to UPPERCASE
        txt = UCase(Cells(r, 1))
'       Remove punctuation
        For i = 0 To UBound(PuncChars)
            txt = Replace(txt, PuncChars(i), "")
        Next i
'       Remove excess spaces
        txt = WorksheetFunction.Trim(txt)
'       Extract the words
        x = Split(txt)
        For i = 0 To UBound(x)
            WordListSheet.Cells(wordCnt, 1) = x(i)
            wordCnt = wordCnt + 1
        Next i
    r = r + 1
    Loop
    
'   Create pivot table
    WordListSheet.Activate
    Set AllWords = Range("A1").CurrentRegion
    Set PC = ActiveWorkbook.PivotCaches.Add _
        (SourceType:=xlDatabase, _
        SourceData:=AllWords)
    Set PT = PC.CreatePivotTable _
        (TableDestination:=Range("C1"), _
        TableName:="PivotTable1")
    With PT
        .AddDataField .PivotFields("All Words")
        .PivotFields("All Words").Orientation = xlRowField
    End With
    Application.ScreenUpdating = True
End Sub
 
Upvote 0
I feel like I accomplished something, but now looking at that code, I see that there are much more elegant ways to get this done. For example, I didn't realize there is a split function, so I manually coded one...

Here's my code:
Code:
Sub countwords()
Dim z As Integer, q As Integer, x As Boolean
z = Cells(Rows.Count, 1).End(xlUp).Row
q = 1
Dim a()
ReDim a(1 To 1)
'i represents the different rows
For i = 1 To z

'this sanitizes the text
txt = sanitize(Cells(i, 1).Value)

'the following loop p■■■■s out each word from the cell contents
    Do While Len(txt) > 0
    
    If Len(txt) > Len(Replace(txt, " ", "")) Then
        tmp = Left(txt, WorksheetFunction.Find(" ", txt) - 1)
        Else: tmp = txt
        End If
    For j = 1 To q - 1
        If a(j) = tmp Then x = True ' x is used as a flag for duplicates
    Next j
    
    If Not (x) Then
        ReDim Preserve a(1 To q)
        a(q) = tmp
        
        q = q + 1
        End If
        
    'prep for next loop
    txt = Mid(txt, Len(tmp) + 1)
    txt = Trim(txt)
    x = False
    Loop

Next i

Dim b()
ReDim b(1 To UBound(a), 1 To 2)

For k = 1 To UBound(a)
b(k, 1) = a(k)
b(k, 2) = 0
Next k
    'i represents the different rows
    For i = 1 To z
    
    'this sanitizes the text
    txt = sanitize(Cells(i, 1).Value)
    
    'the following loop p■■■■s out each word from the cell contents
    Do While Len(txt) > 0
        
        If Len(txt) > Len(Replace(txt, " ", "")) Then
            tmp = Left(txt, WorksheetFunction.Find(" ", txt) - 1)
            Else: tmp = txt
            End If
        For j = 1 To UBound(a)
            If tmp = a(j) Then b(j, 2) = b(j, 2) + 1
        Next j
            
        txt = Mid(txt, Len(tmp) + 1)
        txt = Trim(txt)
    Loop
    Next i
    
Sheets.Add after:=Sheets(Sheets.Count)
For i = 1 To UBound(a)
With Sheets(Sheets.Count)
    .Cells(i, 1).Value = b(i, 1)
    .Cells(i, 2).Value = b(i, 2)
End With
Next i

Sheets(Sheets.Count).Range(Cells(1, 1), Cells(UBound(a), 2)).Sort Key1:=Cells(1, 2), Order1:=xlDescending

End Sub
Private Function sanitize(txt As String)
    txt = Replace(txt, ".", "")
    txt = Replace(txt, ",", "")
    txt = Replace(txt, "(", "")
    txt = Replace(txt, ")", "")
    sanitize = txt
End Function
 
Upvote 0
ok found it, i got this from the tinternet years ago so I claim no kudos for it, on sheet1 have all your data in column A, it will create a new sheet with the results and a pivot table

Code:
Sub MakeWordList()
    Dim InputSheet As Worksheet
    Dim WordListSheet As Worksheet
    Dim PuncChars As Variant, x As Variant
    Dim i As Long, r As Long
    Dim txt As String
    Dim wordCnt As Long
    Dim AllWords As Range
    Dim PC As PivotCache
    Dim PT As PivotTable
    
    Application.ScreenUpdating = False
    Set InputSheet = ActiveSheet
    Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
    WordListSheet.Range("A1") = "All Words"
    WordListSheet.Range("A1").Font.Bold = True
    InputSheet.Activate
    wordCnt = 2
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
        "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
        "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    r = 1

'   Loop until blank cell is encountered
    Do While Cells(r, 1) <> ""
'       covert to UPPERCASE
        txt = UCase(Cells(r, 1))
'       Remove punctuation
        For i = 0 To UBound(PuncChars)
            txt = Replace(txt, PuncChars(i), "")
        Next i
'       Remove excess spaces
        txt = WorksheetFunction.Trim(txt)
'       Extract the words
        x = Split(txt)
        For i = 0 To UBound(x)
            WordListSheet.Cells(wordCnt, 1) = x(i)
            wordCnt = wordCnt + 1
        Next i
    r = r + 1
    Loop
    
'   Create pivot table
    WordListSheet.Activate
    Set AllWords = Range("A1").CurrentRegion
    Set PC = ActiveWorkbook.PivotCaches.Add _
        (SourceType:=xlDatabase, _
        SourceData:=AllWords)
    Set PT = PC.CreatePivotTable _
        (TableDestination:=Range("C1"), _
        TableName:="PivotTable1")
    With PT
        .AddDataField .PivotFields("All Words")
        .PivotFields("All Words").Orientation = xlRowField
    End With
    Application.ScreenUpdating = True
End Sub


When I use this VBA in excel 2013 - I'm getting a Run type 13 error - mismatch error and am being pointed to these lines of the code with the arrow in the debugger specifically on the 3rd line:

Set PC = ActiveWorkbook.PivotCaches.Add _
(SourceType:=xlDatabase, _
SourceData:=AllWords)


Any thoughts?
 
Upvote 0

Forum statistics

Threads
1,223,244
Messages
6,170,976
Members
452,372
Latest member
Natalie18

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top