Advanced Excel - Counting occurences of specific words within a range of cells (is it possible?)

hokeeffe

New Member
Joined
Nov 19, 2015
Messages
4
Hi all,

First time poster here! I have some functionality I'd love to create in Excel but I'm not actually sure it's possible. Actually I doubt it's possible but crazier things have happened! :) Would love if anyone could point me in the right direction! There are multiple steps so I'll outline below how I see it working, any input would be great! It's also important to note that the forumlas/code need to cater for future entries too, not just what currently exists.

Goal: I have a column with about 200 entries. Each entry consists of text and ranges from 0 - 600 characters. Most cells have 5-9 words in them that begin with #. I'd love to be able to create a list in a new sheet that lists all of these words and how often they've been used.

Step 1: I assume the first thing I need to do is run some sort of formula or code to leave only the words beginning with #. Can I loop through every word in a cell and replace the words that don't start with # with ""? (not sure if this is important to mention but as I don't want to touch the initial data all data in this cell will be a reference from elsewhere so the entry in A1 will actually be something like "=Original!A1")
E.g (starting) A1:This is a #cell in #Excel that contains text and other #stuff but also a number of #hashtag #topics
E.g (result) A1:#cell #Excel #stuff #hashtag #topics

Step 2:Now that I have a column just containing hashtags, I assume I'll need to separate each word to a separate cell for counting purposes. I think maybe I can use text to columns for this?
E.g. (starting) A1:#cell #Excel #stuff #hashtag #topics
E.g. (result) A2:#cell A3:#Excel A4:#stuff A5:#hashtag A6:#topics

Step 3: Yippee! So I now have an entire range of cells containing hashtags from B2 to J200. So far so good! This is the part I'm unsure is possible. What I need to do now (likely on another sheet) is create a list of all the hashtags used, ideally in one column. This list needs to be free of duplicates so before adding a new entry, I need excel to check if it already exists in the list. (I need it to work like an array almost, not sure if capitalisation will be an issue but I can force all to lowercase if necessary).
(result) B1:#cell C1:#Excel D1:#stuff E1:#hashtag F1:#topics

Step 4: This is probably the easiest part. Count how many times a phrase appears in the range of cells created in step 2 using the column created in Step 3 as a reference.
Formula (B2): =COUNTIF(Worksheet2!$B:$J, B2) (May not be exactly right but you know what I mean!
(result): B1: #cell B2:1 C1:#Excel C2: 73 etc.

So Excel gurus, am I crazy? Or is this actually possible? :) Thanks!
 

Excel Facts

Who is Mr Spreadsheet?
Author John Walkenbach was Mr Spreadsheet until his retirement in June 2019.
Hi there,

You could give this macro a try, to use.

1. With your workbook open, press ALT+F11 to open the VBE
2. Click "Insert" > "Module" from the menu
3. Paste the code into the blank window on the right
4. Press ALT+Q to return to your workbook
5. Save your workbook as a macro enabled file (.xlsm)
6. With your sheet that contains the list active, press ALT+F8 and run the "ExtractAndCountHashTags" macro.

Note: I skipped steps 1, 2 and 3; the code will create a new worksheet with the distinct list and counts - test on a backup of your workbook.

Hope this helps :)

Code:
Sub ExtractAndCountHashTags()
Dim d As Object, v, w, i As Long
v = Range("A1:A" & Range("A" & Rows.Count).End(xlUp).Row).Value
Set d = CreateObject("Scripting.Dictionary")
d.CompareMode = vbTextCompare
For i = 1 To UBound(v)
    For Each w In Split(v(i, 1), " ")
        If Left(w, 1) = "#" Then
            If d.Exists(w) Then
                d.Item(w) = d.Item(w) + 1
                    Else
                d.Add w, 1
            End If
        End If
    Next w
Next i
ThisWorkbook.Worksheets.Add
Range("A1").Resize(d.Count).Value = Application.Transpose(d.Keys)
Range("B1").Resize(d.Count).Value = Application.Transpose(d.Items)
End Sub
 
Upvote 0
Hi all,

First time poster here! I have some functionality I'd love to create in Excel but I'm not actually sure it's possible. Actually I doubt it's possible but crazier things have happened! :) Would love if anyone could point me in the right direction! There are multiple steps so I'll outline below how I see it working, any input would be great! It's also important to note that the forumlas/code need to cater for future entries too, not just what currently exists.

Goal: I have a column with about 200 entries. Each entry consists of text and ranges from 0 - 600 characters. Most cells have 5-9 words in them that begin with #. I'd love to be able to create a list in a new sheet that lists all of these words and how often they've been used.

Step 1: I assume the first thing I need to do is run some sort of formula or code to leave only the words beginning with #. Can I loop through every word in a cell and replace the words that don't start with # with ""? (not sure if this is important to mention but as I don't want to touch the initial data all data in this cell will be a reference from elsewhere so the entry in A1 will actually be something like "=Original!A1")
E.g (starting) A1:This is a #cell in #Excel that contains text and other #stuff but also a number of #hashtag #topics
E.g (result) A1:#cell #Excel #stuff #hashtag #topics

Step 2:Now that I have a column just containing hashtags, I assume I'll need to separate each word to a separate cell for counting purposes. I think maybe I can use text to columns for this?
E.g. (starting) A1:#cell #Excel #stuff #hashtag #topics
E.g. (result) A2:#cell A3:#Excel A4:#stuff A5:#hashtag A6:#topics

Step 3: Yippee! So I now have an entire range of cells containing hashtags from B2 to J200. So far so good! This is the part I'm unsure is possible. What I need to do now (likely on another sheet) is create a list of all the hashtags used, ideally in one column. This list needs to be free of duplicates so before adding a new entry, I need excel to check if it already exists in the list. (I need it to work like an array almost, not sure if capitalisation will be an issue but I can force all to lowercase if necessary).
(result) B1:#cell C1:#Excel D1:#stuff E1:#hashtag F1:#topics

Step 4: This is probably the easiest part. Count how many times a phrase appears in the range of cells created in step 2 using the column created in Step 3 as a reference.
Formula (B2): =COUNTIF(Worksheet2!$B:$J, B2) (May not be exactly right but you know what I mean!
(result): B1: #cell B2:1 C1:#Excel C2: 73 etc.

So Excel gurus, am I crazy? Or is this actually possible? :) Thanks!
Hi hokeeffe, welcome to the boards.

It looks like I took so long to figure this out that FormR already beat me to it, but having put the effort in I thought I may as well share my version. It doesn't appear to be anywhere near as streamlined as the other suggestion, but I am just pleased it worked!

So basically, it looks at column A of sheet2 (where the cell values are =A1 from sheet1). It copies the actual values from Sheet2 column A into Sheet2 column B. It then extracts all of the unique (case insensitive) #words from Sheet2 column B into a list in Sheet2 column C. It then applies a COUNTIF to Sheet2 column D.

Code:
Sub CountHashTags()
Dim Cell As Range, sRange As Range, hRange As Range
Dim Dict As Object
Dim RegExp As Object
Dim Item As Variant
Dim Key As String
Dim LastRow As Long
Dim Rng As Range
Dim Text As String
Dim Wks As Worksheet
LastRow_Source = Sheets("Sheet2").Cells(Rows.Count, "A").End(xlUp).Row
Set sRange = Sheets("Sheet2").Range("A2:A" & LastRow_Source)
        For Each Cell In sRange
            If Cell.Value <> "" Then
                Cell.Offset(0, 1).Value = Cell.Value
            End If
        Next Cell
Set Wks = ActiveSheet
    Set Rng = Wks.Range("B2")
    
      LastRow = Wks.Cells(Rows.Count, Rng.Column).End(xlUp).Row
      Set Rng = IIf(LastRow < Rng.Row, Rng, Rng.Resize(LastRow + Rng.Row - 1, 1))
      
      Set Dict = CreateObject("Scripting.Dictionary")
      Dict.CompareMode = vbTextCompare
      
      Set RegExp = CreateObject("VBScript.RegExp")
      RegExp.IgnoreCase = False
      RegExp.Pattern = "[^#]*(#\w+)\s*(.*)"
      
      For Each Cell In Rng
        Text = Cell
        Do While RegExp.TEST(Text) = True
          Tag = RegExp.Replace(Text, "$1")
            If Tag <> "#" Then
              If Not Dict.Exists(Tag) Then Dict.Add Tag, ""
            End If
          Text = RegExp.Replace(Text, "$2")
        Loop
      Next Cell
      
   Wks.Range("C2").Resize(Dict.Count, 1).Value = WorksheetFunction.Transpose(Dict.Keys)
LastRow_Hashtags = Sheets("Sheet2").Cells(Rows.Count, "C").End(xlUp).Row
Sheets("Sheet2").Range("D2").Formula = "=COUNTIF(Sheet1!A:A,""*""&C2&""*"")"
Sheets("Sheet2").Range("D2").AutoFill Destination:=Range("D2:D" & LastRow_Hashtags)
End Sub
 
Upvote 0
Hi Fishboy,

You might want to try your code on a list like this, admittedly - it's not entirely clear what the OP wants for the last example (a count of 3 as mine does, or a count of 1 as yours).


Excel 2012
A
2words #hash
3words #hash
4words #hashtag
5words #hashtag
6#one #one #one
Sheet2
 
Upvote 0
Hi Fishboy,

You might want to try your code on a list like this, admittedly - it's not entirely clear what the OP wants for the last example (a count of 3 as mine does, or a count of 1 as yours).

Excel 2012
A
words #hash
words #hash
words #hashtag
words #hashtag
#one #one #one

<colgroup><col style="width: 25pxpx"><col></colgroup><thead>
</thead><tbody>
[TD="align: center"]2[/TD]

[TD="align: center"]3[/TD]

[TD="align: center"]4[/TD]

[TD="align: center"]5[/TD]

[TD="align: center"]6[/TD]

</tbody>
Sheet2
Hmmm, you're right...

Using your example list above my counts were, shall we say, not specific enough?

The #one appears 3 times but I count 1. Not good.

The word #hash (by itself) appears 2 times, but my count also picks up the #hash in the 2 #hashtag's as well. Also not good.

A fine example of me needing to thoroughly test before jumping straight to the "Yay! It works!" stage.
 
Upvote 0
A massive thank you to both of you guys for your help!
I honestly thought excel was mainly useless and only really for accountants, the last few days have really opened my eyes to it's potential power as a data mining tool!

The first solution works almost perfectly. When I tested it I realised that some of the hashtags are repeated because they have punctuation. E.g. some say #Excel while others might be #Excel! or #Excel,

Should be able to figure out how to stop the string once it encounters any non-alphanumeric characters though. :)

Thanks a million again! :) :) :)
 
Upvote 0
Glad we could help, welcome to MrExcel :)

Here is one possible option for removing the punctuation, you can add to the list of characters to remove by amending the red section.

Rich (BB code):
Sub ExtractAndCountHashTags()
Dim d As Object, v, w, i As Long, j As Long
Const Punct As String = ",;:!"
v = Range("A1:A" & Range("A" & Rows.Count).End(xlUp).Row).Value
Set d = CreateObject("Scripting.Dictionary")
d.CompareMode = vbTextCompare
For i = 1 To UBound(v)
    For j = 1 To Len(Punct)
        v(i, 1) = Replace(v(i, 1), Mid(Punct, j, 1), "")
    Next j
    For Each w In Split(v(i, 1), " ")
        If Left(w, 1) = "#" Then
            If d.Exists(w) Then
                d.Item(w) = d.Item(w) + 1
                    Else
                d.Add w, 1
            End If
        End If
    Next w
Next i
ThisWorkbook.Worksheets.Add
Range("A1").Resize(d.Count).Value = Application.Transpose(d.Keys)
Range("B1").Resize(d.Count).Value = Application.Transpose(d.Items)
End Sub
 
Upvote 0
Cool. I really need to read up on these macros and how they work, that was my first time using Visual Basic.
EXpect to see me here a lot more! :)

I got around it in another way though, it's not as neat but it does that job.
I am copying all the cells in Column E on sheet 1 to Column A on sheet 2.
Rather than just using "=Original!E2" I am using the below formula which converts to lowercase and also replaces punctuation with spaces.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE((LOWER(Original!E2)),"?"," "),"!"," "),"'"," "),"."," ")

Initially I replaced the punctuation with nothing but then "#Excel's" became "Excels" which was incorrect.
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,337
Members
452,637
Latest member
Ezio2866

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top