List most common words from every fifth row in a column

English Nerd

New Member
Joined
May 26, 2015
Messages
2
Hi everybody!

I am quite new to attempting anything advanced on excel. So I'll try to provide as much information as possible.

I am using Excel 2013 on Windows 8.

I have a list of single words in each cell. Some of them are the exact same word, however most of them only occur once. The first word starts in cell G17 and each new word appears on every 5th row in the same column (G17, G22, G27, G32, etc.). It goes on for thousands of rows.

What I need to do is create a seperate list that compiles all of the words from the most common to the least common, as well as noting the number of times each word appeared. All of the words are important, so the list needs to include even words which only appear once.

Every 5th cell, starting from G16 has "D." written on it. The formula(s) that I am looking for could either target every 5th cell, or target the entire column, and I can just ignore the "D."s. Whatever is easiest.

Let me know if I forgot any important information!

Thanks a lot!
 

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
Hi and welcome to the MrExcel Message Board.

This will probably be a bit hard to understand but you mentioned thousands of rows so I thought you might like something quick.

If you copy the data from the worksheet into an array in VBA you can do that with one instruction.
Then there is a Dictionary object which is basically a list of keys and items. I set the key to the work and the item to a count of the word.
This cannot be sorted without help so I used a SortedList. You just copy the data in and it is sorted.
Then you can't output a sorted list easily (as far as I know) so I copied it back to an Array.
The Array is then written back to a second worksheet.

There are details about all the objects on the web if you would like to find out some more.

Code:
Option Explicit
Sub WordFreq()

    Dim ws1 As Worksheet
    Dim ws2 As Worksheet
    Dim arrIn As Variant
    Dim arrOut As Variant
    Dim dicWords As Object
    Dim srtList As Object
    Dim lr As Long
    Dim i As Long
    Dim key As Variant
    
    '==========================================================
    ' Initializing settings
    '==========================================================
    ' Worksheet with the data
    ' Name can be changed by overtyping
    Set ws1 = ThisWorkbook.Worksheets("Sheet1")
    ' Worksheet for the output NB will be cleared
    ' Name can be changed by overtyping
    Set ws2 = ThisWorkbook.Worksheets("Sheet2")
    ' A dictionary has a key and an item.
    ' New words are added
    ' Old words have their count incremented
    Set dicWords = CreateObject("Scripting.Dictionary")
    ' A SortedList sorts data when added
    Set srtList = CreateObject("System.Collections.Sortedlist")
    
    '==========================================================
    ' Read in the words and add to the Dictionary
    '==========================================================
    With ws1
        ' Find the last data row
        lr = .Cells(.Rows.Count, "G").End(xlUp).Row
        ' Read the column into an Array for speed
        arrIn = .Range("G17:G" & lr)
        ' Transfer words to the Dictionary and increment
        For i = 1 To UBound(arrIn) Step 5
            If dicWords.Exists(arrIn(i, 1)) Then
                dicWords.Item(key:=arrIn(i, 1)) = dicWords.Item(key:=arrIn(i, 1)) + 1
            Else
                dicWords.Item(key:=arrIn(i, 1)) = 1
            End If
        Next
    End With

    '==========================================================
    ' Sort the list of words into word count order
    '==========================================================
    ' Set the SortedLlist size
    srtList.capacity = dicWords.Count
    ' Add each word and its count from the Dictionary
    For Each key In dicWords
        srtList.Add Format(dicWords(key), "000000") & key, key
    Next
   
    '==========================================================
    ' Create the output Array in descending word count order
    '==========================================================
    ' Set the output Array size
    ReDim arrOut(1 To srtList.Count, 1 To 2)
    ' Add each word and its count to the output Array
    For i = 0 To srtList.Count - 1
        arrOut(srtList.Count - i, 1) = srtList.GetByIndex(i)
        arrOut(srtList.Count - i, 2) = Left(srtList.getkey(i), 6)
    Next
    
    '==========================================================
    ' Output the data to the second worksheet
    '==========================================================
    With ws2
        ' Empty the output worksheet
        .Cells.Clear
        ' Write the column headings
        .Range("A1").Resize(, 2) = Array("Word", "Count")
        ' Write the output Array to the worksheet
        .Range("A2").Resize(srtList.Count, 2) = arrOut
    End With
    
End Sub

I added some comments to try and give you a fighting chance.
 
Upvote 0
It's complicated ...

Yes, sorry about that. I could have used worksheets to hold the data as it was processed but when you repeatedly move lots of data, piecemeal, from VBA to worksheet it can take forever to run.

The big question is: Does it work? If it does then you can sort it out at your leisure.

All the objects, arrays, dictionaries and sortedlists can be thought of as a column or more of data. Annoyingly, you need to use different types of object because there isn't one that does everything. Arrays are good at getting data in and out of the worksheet and for simple processing. Dictionaries are good if you have two columns of data that comprise key and item. SortedLists are like a cross between arrays and dictionaries that have the main function of sorting.

So it is really just a question of working out the big steps. (I have given those the big comment headings.) Then using the correct object for that process. The exact syntax of, say, using a dictionary does not have to be memorised or even understood. You just need a supply of such code snippets and be able to make them work when required. Knowing what functionality exists is half the battle.

All the details are on the web. That is how I discovered them!

If you set up a short list of words, you could use the F8 key in the VB Editor to single step through the macro. You can right click on any object and add a "Watch". This will display the values of the objects as they change as you step through. You can watch the array fill up, for instance.

Regards,
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top