VBA code to count unique words in a range of cells

scottyblaze

New Member
Joined
Aug 19, 2014
Messages
5
Hello, I am trying to find a way to count the number of unique words in a range of cells when each cell contains numerous words. For example:

[TABLE="class: grid, width: 500, align: left"]
<tbody>[TR]
[TD]The dog and the cat are asleep.[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]The cat is inside.[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]The dog is outside.[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]






I'm looking for a way to count the unique words in the above range, which should be 9 for "The, dog, and, cat, are, asleep, is, inside, outside". I've found formulas to count unique cells, but not unique words. Ideally, I want to put this into VBA so that all I would need to do is highlight a range and run the macro.

Is this possible? Can anyone offer some assistance?

Thanks.
 
Modified some code that I had around. Select your range then run the Sub

Code:
Sub Test_ReturnUniqueWordsAndCountsInSelectedRanges()

    Dim x As Variant
    Dim sWorksheet As String
    
    x = ReturnUniqueWordsAndCountsInSelectedRanges(Selection, "A", "V")
    
    MsgBox UBound(x, 2) & " unique words in selection."
    
    'If you want to display a list of the words and counts on the 'Output' worksheet
    '  (which will be deleted and recreated each time the code is run then comment out
    '  (with a single quote) the 'GoTo End_Sub' line that follows
    
    GoTo End_Sub
    
    sWorksheet = "Output"
    On Error Resume Next
    Application.DisplayAlerts = False
    Worksheets(sWorksheet).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
    Worksheets.Add(After:=Sheets(Sheets.count)).Name = sWorksheet 'After last
    Range("A1").Resize(1, 2).Value = Array("Word", "Count")
    Worksheets(sWorksheet).Range("A2").Resize(UBound(x, 2), 2).Value = Application.Transpose(x)
    
End_Sub:
    
End Sub

Function ReturnUniqueWordsAndCountsInSelectedRanges(rngInput As Range, Optional sSortOrder_ADX As String, Optional sSortField_VC As String)
    'Not case sensitive
    'SortOrder  "D" = Descending                          sSortField  "C" = Sort by Count
    '           "X" = Unsorted                            Anything Else = Sort by Values (Not case sensitive)
    '           Anything else = Ascending
    
    Dim lX As Long, lY As Long
    Dim rngSelected() As Range  'Array that contains each selected cell
    Dim lSelectedCount As Long
    Dim varA As Variant
    Dim varOutput As Variant
    Dim varK As Variant, varI As Variant
    Dim varTemp1 As Variant, varTemp2 As Variant
    Dim bSortOrderCheck As Boolean
    Dim lSortOrder As Long
    Dim lSortField As Long
    Dim aryWords As Variant
    Dim sCellContents As String
    Dim sCellRebuild As String
    Dim sOneChar As String
    
    Select Case UCase(sSortOrder_ADX)
    Case "D": lSortOrder = 2    'Descending
    Case "X": lSortOrder = 3    'Unsorted
    Case Else: lSortOrder = 1   'Ascending
    End Select
    
    Select Case UCase(sSortField_VC)
    Case "C": lSortField = 2     'Sort by Count
    Case Else: lSortField = 1    'Sort by Value
    End Select
    
    'Iterate all areas; each individual cell into 1D array
    For lX = 1 To rngInput.Areas.count
        For lY = 1 To rngInput.Areas(lX).Cells.count
            lSelectedCount = lSelectedCount + 1
            ReDim Preserve rngSelected(1 To lSelectedCount)
            Set rngSelected(lSelectedCount) = rngInput.Areas(lX).Cells(lY)
        Next
    Next
    
    With CreateObject("Scripting.Dictionary")
    
        .CompareMode = vbTextCompare
        
        'Inventory selected cells
        For Each varA In rngSelected
            sCellContents = varA.Value
            'Replace non-alpha characters with space
            For lX = 1 To Len(sCellContents)
                sOneChar = Mid(sCellContents, lX, 1)
                Select Case Asc(sOneChar)
                Case 65 To 90, 97 To 122
                    sCellRebuild = sCellRebuild & sOneChar
                Case Else
                    sCellRebuild = sCellRebuild & " "
                End Select
            Next
            
            'Replace multiple spaces with single space
            With CreateObject("VBScript.RegExp")
                .Global = True
                .Pattern = "\s{2,}" 'Instances of 2 or more consecutive spaces
                sCellRebuild = Trim(.Replace(sCellRebuild, " "))
            End With
            
            'Add individual words to Scripting Dictionary
            aryWords = Split(sCellRebuild, " ")
            For lX = LBound(aryWords) To UBound(aryWords)
                .Item(aryWords(lX)) = .Item(aryWords(lX)) + 1
            Next
        Next
        
        'Copy Values (Keys) and Counts (Items) from Scripting Dictionary to 1D arrays
        varK = .Keys
        varI = .Items
        
        'Copy both to 2D array
        ReDim varOutput(1 To 2, 1 To .count)
        For lX = 1 To .count
            varOutput(1, lX) = varK(lX - 1)
            varOutput(2, lX) = varI(lX - 1)
        Next
        
    End With
    
    If lSortOrder < 3 Then 'A sort option was selected (1=
        'Sort 2D array
        For lY = LBound(varOutput, 2) To UBound(varOutput, 2) - 1
            For lX = lY + 1 To UBound(varOutput, 2)
                bSortOrderCheck = UCase(varOutput(lSortField, lY)) > UCase(varOutput(lSortField, lX))   'Ascending Order Sort
                If lSortOrder = 2 Then bSortOrderCheck = Not bSortOrderCheck
                If bSortOrderCheck Then
                    varTemp1 = varOutput(1, lX)
                    varTemp2 = varOutput(2, lX)
                    varOutput(1, lX) = varOutput(1, lY)
                    varOutput(2, lX) = varOutput(2, lY)
                    varOutput(1, lY) = varTemp1
                    varOutput(2, lY) = varTemp2
                End If
            Next
        Next
    End If
    
    ReturnUniqueWordsAndCountsInSelectedRanges = varOutput

    'Set rngInput = Nothing 'if this is uncommented then the range in the calling routine is set to Nothing as well

End Function
 
Upvote 0
Thanks for the code, but it seems to be having an issue between cells causing it to count the last word in a cell plus the first word in the next cell like so:
[TABLE="width: 64"]
<colgroup><col style="width:48pt" width="64"> </colgroup><tbody>[TR]
[TD="width: 64"]Word[/TD]
[/TR]
[TR]
[TD]and[/TD]
[/TR]
[TR]
[TD]are[/TD]
[/TR]
[TR]
[TD]asleep[/TD]
[/TR]
[TR]
[TD]asleepThe[/TD]
[/TR]
[TR]
[TD]cat[/TD]
[/TR]
[TR]
[TD]dog[/TD]
[/TR]
[TR]
[TD]inside[/TD]
[/TR]
[TR]
[TD]insideThe[/TD]
[/TR]
[TR]
[TD]is[/TD]
[/TR]
[TR]
[TD]outside[/TD]
[/TR]
[TR]
[TD]The[/TD]
[/TR]
</tbody>[/TABLE]

However, someone else sent me an answer privately that was able to handle this problem suitably. Thanks again.
 
Upvote 0
Sorry about the error. I forgot one important line which I have added below.

At any rate, please publish the other code the you received so other may benefit.

Rich (BB code):
Option Explicit

Sub Test_ReturnUniqueWordsAndCountsInSelectedRanges()

    Dim x As Variant
    Dim sWorksheet As String
    
    x = ReturnUniqueWordsAndCountsInSelectedRanges(Selection, "A", "V")
    
    MsgBox UBound(x, 2) & " unique words in selection."
    
    'If you want to display a list of the words and counts on the 'Output' worksheet
    '  (which will be deleted and recreated each time the code is run then comment out
    '  (with a single quote) the 'GoTo End_Sub' line that follows
    
    'GoTo End_Sub
    
    sWorksheet = "Output"
    On Error Resume Next
    Application.DisplayAlerts = False
    Worksheets(sWorksheet).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
    Worksheets.Add(After:=Sheets(Sheets.Count)).Name = sWorksheet 'After last
    Range("A1").Resize(1, 2).Value = Array("Word", "Count")
    Worksheets(sWorksheet).Range("A2").Resize(UBound(x, 2), 2).Value = Application.Transpose(x)
    
End_Sub:
    
End Sub

Function ReturnUniqueWordsAndCountsInSelectedRanges(rngInput As Range, Optional sSortOrder_ADX As String, Optional sSortField_VC As String)
    'Not case sensitive
    'SortOrder  "D" = Descending                          sSortField  "C" = Sort by Count
    '           "X" = Unsorted                            Anything Else = Sort by Values (Not case sensitive)
    '           Anything else = Ascending
    
    Dim lX As Long, lY As Long
    Dim rngSelected() As Range  'Array that contains each selected cell
    Dim lSelectedCount As Long
    Dim varA As Variant
    Dim varOutput As Variant
    Dim varK As Variant, varI As Variant
    Dim varTemp1 As Variant, varTemp2 As Variant
    Dim bSortOrderCheck As Boolean
    Dim lSortOrder As Long
    Dim lSortField As Long
    Dim aryWords As Variant
    Dim sCellContents As String
    Dim sCellRebuild As String
    Dim sOneChar As String
    
    Select Case UCase(sSortOrder_ADX)
    Case "D": lSortOrder = 2    'Descending
    Case "X": lSortOrder = 3    'Unsorted
    Case Else: lSortOrder = 1   'Ascending
    End Select
    
    Select Case UCase(sSortField_VC)
    Case "C": lSortField = 2     'Sort by Count
    Case Else: lSortField = 1    'Sort by Value
    End Select
    
    'Iterate all areas; each individual cell into 1D array
    For lX = 1 To rngInput.Areas.Count
        For lY = 1 To rngInput.Areas(lX).Cells.Count
            lSelectedCount = lSelectedCount + 1
            ReDim Preserve rngSelected(1 To lSelectedCount)
            Set rngSelected(lSelectedCount) = rngInput.Areas(lX).Cells(lY)
        Next
    Next
    
    With CreateObject("Scripting.Dictionary")
    
        .CompareMode = vbTextCompare
        
        'Inventory selected cells
        For Each varA In rngSelected
            sCellRebuild = vbNullString
            sCellContents = varA.Value
            'Replace non-alpha characters with space
            For lX = 1 To Len(sCellContents)
                sOneChar = Mid(sCellContents, lX, 1)
                Select Case Asc(sOneChar)
                Case 65 To 90, 97 To 122
                    sCellRebuild = sCellRebuild & sOneChar
                Case Else
                    sCellRebuild = sCellRebuild & " "
                End Select
            Next
            
            'Replace multiple spaces with single space
            With CreateObject("VBScript.RegExp")
                .Global = True
                .Pattern = "\s{2,}" 'Instances of 2 or more consecutive spaces
                sCellRebuild = Trim(.Replace(sCellRebuild, " "))
            End With
            
            'Add individual words to Scripting Dictionary
            aryWords = Split(sCellRebuild, " ")
            For lX = LBound(aryWords) To UBound(aryWords)
                .Item(aryWords(lX)) = .Item(aryWords(lX)) + 1
            Next
            
        Next
        
        'Copy Values (Keys) and Counts (Items) from Scripting Dictionary to 1D arrays
        varK = .Keys
        varI = .Items
        
        'Copy both to 2D array
        ReDim varOutput(1 To 2, 1 To .Count)
        For lX = 1 To .Count
            varOutput(1, lX) = varK(lX - 1)
            varOutput(2, lX) = varI(lX - 1)
        Next
        
    End With
    
    If lSortOrder < 3 Then 'A sort option was selected (1=
        'Sort 2D array
        For lY = LBound(varOutput, 2) To UBound(varOutput, 2) - 1
            For lX = lY + 1 To UBound(varOutput, 2)
                bSortOrderCheck = UCase(varOutput(lSortField, lY)) > UCase(varOutput(lSortField, lX))   'Ascending Order Sort
                If lSortOrder = 2 Then bSortOrderCheck = Not bSortOrderCheck
                If bSortOrderCheck Then
                    varTemp1 = varOutput(1, lX)
                    varTemp2 = varOutput(2, lX)
                    varOutput(1, lX) = varOutput(1, lY)
                    varOutput(2, lX) = varOutput(2, lY)
                    varOutput(1, lY) = varTemp1
                    varOutput(2, lY) = varTemp2
                End If
            Next
        Next
    End If
    
    ReturnUniqueWordsAndCountsInSelectedRanges = varOutput

    'Set rngInput = Nothing 'if this is uncommented then the range in the calling routine is set to Nothing as well

End Function
 
Upvote 0
scottyblaze,

Welcome to the MrExcel forum.

Sample raw data with range B1:B3 selected:


Excel 2007
BCD
1The dog and the cat are asleep.
2The cat is inside.
3The dog is outside.
4
5
6
7
8
9
10
Sheet1


After the macro:


Excel 2007
BCD
1The dog and the cat are asleep.The
2The cat is inside.dog
3The dog is outside.and
4cat
5are
6asleep
7is
8inside
9outside
10
Sheet1


Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

1. Copy the below code
2. Open your NEW workbook
3. Press the keys ALT + F11 to open the Visual Basic Editor
4. Press the keys ALT + I to activate the Insert menu
5. Press M to insert a Standard Module
6. Where the cursor is flashing, paste the code
7. Press the keys ALT + Q to exit the Editor, and return to Excel
8. To run the macro from Excel press ALT + F8 to display the Run Macro Dialog. Double Click the macro's name to Run it.

Code:
Option Compare Text
Sub ExtractSelectionUniqueWords()
' hiker95, 08/20/2014, ME800029
Dim rng As Range, c As Range, h As String
Dim i As Long, s, k, nc As Long
Application.ScreenUpdating = False
Set rng = Selection
nc = Selection.Column + 2
With CreateObject("Scripting.Dictionary")
  .CompareMode = vbTextCompare
  For Each c In rng
    If c <> "" Then
      If Right(c, 1) = "." Then
        h = Left(c, Len(c) - 1)
      Else
        h = c
      End If
      s = Split(h, " ")
      For i = LBound(s) To UBound(s)
        If Not .Exists(s(i)) Then
          .Add s(i), s(i)
        End If
      Next i
    End If
  Next
  k = Application.Transpose(Array(.Keys))
End With
Columns(nc).ClearContents
Cells(1, nc).Resize(UBound(k)) = k
Columns(nc).AutoFit
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the ExtractSelectionUniqueWords macro.
 
Upvote 0
Thanks for the outpouring of ideas. I've already received a couple solid macros.

This one uses a selected range:
Code:
Sub SelectedRangeCount()
'Ctrl+Shift+C to run on selected cells
    Dim r As Range, e
    With CreateObject("Scripting.Dictionary")
        .CompareMode = 1
        For Each r In Selection
            For Each e In Split(Replace(r.Value, ".", ""))
                .Item(e) = .Item(e) + 1
            Next
        Next
        MsgBox "Unique Words In Selected Range: " & Format(.Count)
    End With
End Sub

This one uses a predefined column across all worksheets in a workbook:
Code:
Sub WorkbookCount()
'Ctrl+Shift+Q to run on defined columns
    Dim x, i As Long, e, s
    ReDim x(1 To Worksheets.Count)
    For i = 1 To Worksheets.Count
        With Worksheets(i)
            x(i) = .Range("b1", .Range("b" & Rows.Count).End(xlUp)).Value
        End With
    Next
    With CreateObject("Scripting.Dictionary")
        .CompareMode = 1
        For i = 1 To UBound(x)
            If IsArray(x(i)) Then
                For Each e In x(i)
                    If e <> "" Then
                        For Each s In Split(Replace(e, ".", ""))
                            .Item(s) = Empty
                        Next
                    End If
                Next
            Else
                If Not IsEmpty(x(i)) Then
                    For Each e In Split(Replace(x(i), ".", ""))
                        .Item(e) = Empty
                    Next
                End If
            End If
        Next
        MsgBox "Unique Words In Defined Column: " & Format(.Count)
    End With
End Sub

These both seem to do what I'm after (after a touch of tweaking). Unless I missed something obvious, I'd call this mystery solved.
 
Upvote 0
scottyblaze,

Thanks for the feedback.

1. Did you even try my macro in reply #5, that required you to make a selection?

2. Did you even try pbornemeier's macro in reply #4?

3. Can I ask where you obtained the other two macros?
 
Last edited:
Upvote 0
Yes, the solution in #4 works well. The solution in #5 lists out the unique words (as #4 does) but it doesn't provide the count. #4 is definitely a working solution. Thanks again for the effort.

scottyblaze,

Thanks for the feedback.

1. Did you even try my macro in reply #5, that required you to make a selection?

2. Did you even try pbornemeier's macro in reply #4?

3. Can I ask where you obtained the other two macros?
 
Upvote 0
Old thread but i was wondering .. in relation to reply #4, the final sorted data in descending order does not take any number over 10+ as such but instead sorts it with 1

To clarify: 1,7,13,3,5 are being sorted as 1,13,3,5,7

Anyone know how to fix this within the VBA code provided?

Thank you
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top