Find Keyword Density in Rows and Columns

MarkoX50

New Member
Joined
Jan 22, 2013
Messages
45
Hi, I have a workbook that has 1 column (Column A) with 8000 Rows. Each Cell contains a string of words ranging from 1 word to 100 words. I am looking for a formula that can create a list of most common words (keywords) in each cell across the column. Ideally, i would like to know the most repeated words in the entire column.

Can this be done with a formula or can it only be done via VBA, if so, can someone please assist me with the code or formula.

Much Appreciated.
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
Try this for strings of words in column "A":-
Code:
[COLOR=Navy]Sub[/COLOR] MG06Aug08
[COLOR=Navy]Dim[/COLOR] Rng [COLOR=Navy]As[/COLOR] Range, Dn [COLOR=Navy]As[/COLOR] Range, n [COLOR=Navy]As[/COLOR] [COLOR=Navy]Long,[/COLOR] sp [COLOR=Navy]As[/COLOR] Variant, omax [COLOR=Navy]As[/COLOR] [COLOR=Navy]Long,[/COLOR] R [COLOR=Navy]As[/COLOR] Variant
[COLOR=Navy]Dim[/COLOR] K [COLOR=Navy]As[/COLOR] Variant, str [COLOR=Navy]As[/COLOR] [COLOR=Navy]String,[/COLOR] Lg
[COLOR=Navy]Set[/COLOR] Rng = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
[COLOR=Navy]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR=Navy]For[/COLOR] [COLOR=Navy]Each[/COLOR] Dn [COLOR=Navy]In[/COLOR] Rng
    sp = Split(Dn.Value, " ")
        [COLOR=Navy]For[/COLOR] [COLOR=Navy]Each[/COLOR] R [COLOR=Navy]In[/COLOR] sp
            [COLOR=Navy]If[/COLOR] Not R = vbNullString [COLOR=Navy]Then[/COLOR]
                [COLOR=Navy]If[/COLOR] Not .Exists(R) [COLOR=Navy]Then[/COLOR]
                    .Add R, 1
                [COLOR=Navy]Else[/COLOR]
                    .Item(R) = .Item(R) + 1
                [COLOR=Navy]End[/COLOR] If
            [COLOR=Navy]End[/COLOR] If
        [COLOR=Navy]Next[/COLOR] R
[COLOR=Navy]Next[/COLOR] Dn


Lg = Application.Large(.items, 1) '[COLOR=Green][B] Change the "1" to 2,3-- for second and third largest[/B][/COLOR]
[COLOR=Navy]For[/COLOR] [COLOR=Navy]Each[/COLOR] K [COLOR=Navy]In[/COLOR] .keys
    [COLOR=Navy]If[/COLOR] .Item(K) = Lg [COLOR=Navy]Then[/COLOR]
        str = str & K & Chr(10)
    [COLOR=Navy]End[/COLOR] If
[COLOR=Navy]Next[/COLOR] K
[COLOR=Navy]End[/COLOR] With
MsgBox "Count = " & Lg & Chr(10) & str
[COLOR=Navy]End[/COLOR] [COLOR=Navy]Sub[/COLOR]
Regards Mick
 
Last edited:
Upvote 0
Can you give an example of your data in column "A".
The results will show in a message box, you can have them in a cell if required.
 
Upvote 0
Column A has data as below. If i could get the results in Column B, that would be great. I have 8000 rows of data

A1: RT @blacksilvergh Wen u dey on mtn network nd de cell info display 50% discount on sms, but wey ppl still send sms in dis era herh #ThatOneTooIscuu
A2: If MTN network is as bad as it was 2hrs ago eh, I'll curse them in my ancestors' name
A3: Work Opportunity at MTN Nigeria: Trade Marketing Officers - Work Opportunity at MTN Nigeria: Trade Marketing Officers - JobMe #LatestJobs Via @JobMeNG
Etc

Thanks
 
Upvote 0
Thats awesome Mick, 2 things though. Is it possible to populate the results with the top 50 words but excluding common expressions such as 'is', 'the', 'at', 'a', etc.

You a serious legend, thanks for the assist.
 
Upvote 0
Try this:-
Code:
[COLOR="Navy"]Sub[/COLOR] MG11Aug55
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] sp [COLOR="Navy"]As[/COLOR] Variant, omax [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] R [COLOR="Navy"]As[/COLOR] Variant
[COLOR="Navy"]Dim[/COLOR] K [COLOR="Navy"]As[/COLOR] Variant, Str [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] Lg, NoStr [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String[/COLOR]
NoStr = "#cat#dog#the#" '[COLOR="Green"][B] Alter/Add words not to include here !!!![/B][/COLOR]
[COLOR="Navy"]Set[/COLOR] Rng = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    sp = Split(Dn.Value, " ")
        [COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] R [COLOR="Navy"]In[/COLOR] sp
            [COLOR="Navy"]If[/COLOR] Not R = vbNullString And InStr(NoStr, "#" & R & "#") = 0 [COLOR="Navy"]Then[/COLOR]
                [COLOR="Navy"]If[/COLOR] Not .Exists(R) [COLOR="Navy"]Then[/COLOR]
                    .Add R, 1
                [COLOR="Navy"]Else[/COLOR]
                    .Item(R) = .Item(R) + 1
                [COLOR="Navy"]End[/COLOR] If
            [COLOR="Navy"]End[/COLOR] If
        [COLOR="Navy"]Next[/COLOR] R
[COLOR="Navy"]Next[/COLOR] Dn


Lg = Application.Large(.items, 1) '[COLOR="Green"][B] Change the "1" to 2,3-- for second and third largest[/B][/COLOR]
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] K [COLOR="Navy"]In[/COLOR] .keys
    [COLOR="Navy"]If[/COLOR] .Item(K) = Lg [COLOR="Navy"]Then[/COLOR]
        Str = Str & K & ", "
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] K
[COLOR="Navy"]End[/COLOR] With
Range("B1") = Left(Str, Len(Str) - 2)
Range("C1") = "Count= " & Lg
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Hi Mick,

Thanks for the above. Works great. Can i ask 1 last favour. Please can you display the results in Column B in top 50 Words instead of the Top Word. So instead of " Change the "1" to 2,3-- for second and third largest", I would like a list of top 50 words. Thanks.
 
Upvote 0
Try this:-
Code:
[COLOR="Navy"]Sub[/COLOR] MG20Aug43
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] sp [COLOR="Navy"]As[/COLOR] Variant, omax [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] R [COLOR="Navy"]As[/COLOR] Variant
[COLOR="Navy"]Dim[/COLOR] K [COLOR="Navy"]As[/COLOR] Variant, Str [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] Lg, NoStr [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] sCount [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String[/COLOR]


NoStr = "#cat#dog#the#" '[COLOR="Green"][B] Alter/Add words not to include here !!!![/B][/COLOR]
[COLOR="Navy"]Set[/COLOR] Rng = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    sp = Split(Dn.Value, " ")
        [COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] R [COLOR="Navy"]In[/COLOR] sp
            [COLOR="Navy"]If[/COLOR] Not R = vbNullString And InStr(NoStr, "#" & R & "#") = 0 [COLOR="Navy"]Then[/COLOR]
                [COLOR="Navy"]If[/COLOR] Not .Exists(R) [COLOR="Navy"]Then[/COLOR]
                    .Add R, 1
                [COLOR="Navy"]Else[/COLOR]
                    .Item(R) = .Item(R) + 1
                [COLOR="Navy"]End[/COLOR] If
            [COLOR="Navy"]End[/COLOR] If
        [COLOR="Navy"]Next[/COLOR] R
[COLOR="Navy"]Next[/COLOR] Dn


[COLOR="Navy"]For[/COLOR] n = 1 To 50
Lg = Application.Large(.items, n)


[COLOR="Navy"]If[/COLOR] Lg > 0 And InStr(sCount, "#" & Lg & "#") = 0 [COLOR="Navy"]Then[/COLOR]
    sCount = sCount & "#" & Lg & "#"
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] K [COLOR="Navy"]In[/COLOR] .keys
    [COLOR="Navy"]If[/COLOR] .Item(K) = Lg [COLOR="Navy"]Then[/COLOR]
        Str = Str & K & ", "
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] K
    c = c + 1
    Range("B" & c).Value = Left(Str, Len(Str) - 2)
    Range("C" & c).Value = "Count= " & Lg
    Str = ""
[COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR] n
[COLOR="Navy"]End[/COLOR] With


[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top