Count number of keywords (range) found in a text string

liiamra

Board Regular
Joined
Mar 31, 2013
Messages
50
Dear All,

I am trying to count the number of specific words contained in a specific cell. If my data were static, and the list of keywords was short, there are many solutions such as using multiple instances of the length function as proposed by @shg here.

@njimack has proposed a very easy and helpful solution here, but it serves for listing the words and not counting them.

Suppose I have the following sheet: (column C is what I am trying to achieve)

[TABLE="width: 500"]
<tbody>[TR]
[TD][/TD]
[TD]A[/TD]
[TD]B[/TD]
[TD]C[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Keywords[/TD]
[TD]Comments[/TD]
[TD]Count[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]happy[/TD]
[TD]Life is beautiful, you just need to stay happy and calm. staying happy is important.[/TD]
[TD]3[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]stay[/TD]
[TD]no matter what you do, don't get upset![/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]smile[/TD]
[TD]smile in the morning, smile in the evening, smile always[/TD]
[TD]3[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]beautiful[/TD]
[TD]hello world, how is everyone doing? I am happy[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]upset[/TD]
[TD]don't be upset, cheer up.[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]hello[/TD]
[TD]smile smile, smile. smile, smile[/TD]
[TD]5[/TD]
[/TR]
</tbody>[/TABLE]

That is, C2 will tell that B2 contained 3 of the keywords (1X beautiful, 2X happy) and so on.

Any recommendations are highly appreciated.

Thanks a lot
 
The code seems to work perfectly with no errors. The time it takes to execute, however, is lengthy (and Excel becomes unresponsive during the process). When I alter the sheet with any kind, it seems that the module refreshes itself and again, Excel becomes unresponsive for a short duration- after which it comes back.
I just wonder if there is a way to make this great function more efficient...?
More than likely (not sure though), PGC's code will be faster as your question seems to be made for what a Regular Expression parser does, however, the following reworking of my code should make it much, much quicker than before... I would be interested in knowing how close it comes to the speed of PGC's code when applied against your actual data (if you don't mind testing it for me, that is).

Code:
Function WordListCount(TextToSearch As String, WordList As Range) As Long
  Dim R As Long, C As Long, X As Long, StartAt As Long
  Dim WL As Variant, UCtext As String, UCword As String
  WL = WordList
  UCtext = " " & Ucase(TextToSearch) & " "
  For R = 1 To UBound(WL)
    For C = 1 To UBound(WL, 2)
      UCword = Ucase(WL(R, C))
      If InStr(UCtext, UCword) Then
        StartAt = 2
        Do While StartAt
          If Mid(UCtext, StartAt - 1, Len(WL(R, C)) + 2) Like "[!0-9A-Z]" & UCword & "[!0-9A-Z]" Then
            WordListCount = WordListCount + 1
          End If
          StartAt = InStr(StartAt + 1, UCtext, UCword)
        Loop
      End If
    Next
  Next
End Function
 
Last edited:
Upvote 0

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
I would be interested in knowing how close it comes to the speed of PGC's code when applied against your actual data (if you don't mind testing it for me, that is).


Dear Rick,

For maximum clarity, I have run both codes on the same machine (at work) with the same set of data. I got to tell you Rick, your code is extremely faster. I don't want to appear like I am analyzing a race, but the following technical details could be useful.

Test:

Machine: Quad-Core, 64GB Ram. Excel v: 2013 (I don't understand why many functions just don't work with it while working perfectly with 2007)
Data: average no of chars in a string: 3,000; no of strings (cells): 3500. No of keywords located in another sheet in the same workbook: 2,200

Results: (execution speed in minutes)
Rick's code: 01.10.3
PGC's code: 06.46.1

The good part is that both produce the same result, so I will definitely go with your code as it is considerably faster. Many thanks for your time Rick, I really appreciate it.

As a last word to this thread, I would like to truly thank everyone who has contributed to this post. Your efforts have helped me, and it will definitely help many-many other users in the future. I appreciate every single minute you have spent.

Thanks again and again, and all the best//
 
Upvote 0
Hi liiamra

Thank you for posting the results of your tests. This data is helpful.

If you still have your test setup available I'd be curious to see how this modification of the code I post performs.

If you can't do it, no problem.


Code:
Function KeywordCount(sTextToSearch As String, rKeywords As Range) As Long
Static regex As Object

If regex Is Nothing Then
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"
    regex.Global = True
    regex.IgnoreCase = True
End If

KeywordCount = regex.Execute(sTextToSearch).Count
End Function
 
Upvote 0
Hi liiamra

Thank you for posting the results of your tests. This data is helpful.

If you still have your test setup available I'd be curious to see how this modification of the code I post performs.

If you can't do it, no problem.

Code:
Function KeywordCount(sTextToSearch As String, rKeywords As Range) As Long
Static regex As Object

If regex Is Nothing Then
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"
    regex.Global = True
    regex.IgnoreCase = True
End If

KeywordCount = regex.Execute(sTextToSearch).Count
End Function
I would also be interested in seeing if holding the reference to the RegExp object speeds things up and by how much. I will admit to being greatly surprised by how much faster my new code was than yours... I would have sworn the RegExp would have still beat it. Now, about the code you want liiamra to test... is that written the way you actually wanted it? I would have thought is should be written this way...

Code:
Function KeywordCount(sTextToSearch As String, rKeywords As Range) As Long
  Static regex As Object

  If regex Is Nothing Then Set regex = CreateObject("VBScript.RegExp")

  regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"
  regex.Global = True
  regex.IgnoreCase = True

  KeywordCount = regex.Execute(sTextToSearch).Count
End Function
 
Upvote 0
I would also be interested in seeing if holding the reference to the RegExp object speeds things up and by how much. I will admit to being greatly surprised by how much faster my new code was than yours... I would have sworn the RegExp would have still beat it. Now, about the code you want liiamra to test... is that written the way you actually wanted it? I would have thought is should be written this way...

Code:
Function KeywordCount(sTextToSearch As String, rKeywords As Range) As Long
  Static regex As Object

  If regex Is Nothing Then Set regex = CreateObject("VBScript.RegExp")

  regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"
  regex.Global = True
  regex.IgnoreCase = True

  KeywordCount = regex.Execute(sTextToSearch).Count
End Function

Hi Rick

I must say that am not that surprised that your code is faster than mine.

I love regex's for their flexibility and for how quick you can some code (not always!).
In all honesty it took me a minute to write the solution for this problem and testing it was also very quick and easy.
... but I know they are heavy, and that usually a solution not using regex's is faster.

As for your observation on what I want to repeat I definitely want to keep 2 of the statements:

Repeating these statements each time the function runs:

Code:
    regex.Global = True
    regex.IgnoreCase = True

would be really inefficient, would just be spending time overwriting the values that are already the values of the properties of the object.

The other one, however,

Code:
    regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"

You are right, it should be out, if the second parameter changes I should build a new pattern.

Thank you for the correction.
 
Upvote 0
pgc01;3556586As for your observation on what I want to repeat I definitely want to write the code the way I did. Repeating these statements each time the function runs: [CODE said:
regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"
regex.Global = True
regex.IgnoreCase = True
[/CODE]

would be really inefficient, would just be spending time overwriting the values that are already the values of the properties of the object.
Ah, I see... that is the stuff that is not changing, so why recalculate it. That should work so long as the user does not add or remove any keywords, nor make any changes to any of the cells, in the rKeywords range, otherwise the Pattern would not change with them, correct?
 
Upvote 0
Hi liiamra

This means that if you want to test my new version of the code, it should be:

Code:
Function KeywordCount(sTextToSearch As String, rKeywords As Range) As Long
Static regex As Object

If regex Is Nothing Then
    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True
    regex.IgnoreCase = True
End If

regex.Pattern = "\b(" & Join(Application.Transpose(rKeywords), "|") & ")\b"

KeywordCount = regex.Execute(sTextToSearch).Count
End Function
 
Upvote 0
Ah, I see... that is the stuff that is not changing, so why recalculate it. That should work so long as the user does not add or remove any keywords, nor make any changes to any of the cells, in the rKeywords range, otherwise the Pattern would not change with them, correct?

Rick

Sorry, I think you posted while I was correcting my post, when I understood your point.

You are absolutely right, as you can see my last post already addresses your observation.

Thank you.
 
Upvote 0
Hi liiamra

Thank you for posting the results of your tests. This data is helpful.

If you still have your test setup available I'd be curious to see how this modification of the code I post performs.

If you can't do it, no problem.

Hi PGC,

That is the least I can do after your great help. Actually, that is one of the reasons I came to the office today.

I have run the new code on the same machine, with the same data set as described before. What I can't understand is that the new code, with completely +/- approach is taking the same time as the old one. Here, I am referring to the last modified regex. Afterwards, I wanted to test your initial modified code, with the regex.Pattern inside the if statement, and to my surprise, it was faster than new one by few seconds. Below are the results. I have run each code on its own without any other processing in the background.


[TABLE="class: tableizer-table"]
<tbody>[TR="class: tableizer-firstrow"]
[TH][/TH]
[TH]Rick[/TH]
[TH]PGC*[/TH]
[TH]PGC**[/TH]
[/TR]
[TR]
[TD]1st Test[/TD]
[TD]1.10.3[/TD]
[TD]6.48.8[/TD]
[TD]6.41.9[/TD]
[/TR]
[TR]
[TD]2nd Test[/TD]
[TD]1.13.7[/TD]
[TD]6.46.6[/TD]
[TD]6.39.9[/TD]
[/TR]
[TR]
[TD]3rd Test[/TD]
[TD]1.10.2[/TD]
[TD]6.48.7[/TD]
[TD]6.40.1[/TD]
[/TR]
[TR]
[TD]Average[/TD]
[TD]1.11.4[/TD]
[TD]6.48.0[/TD]
[TD]6.40.6[/TD]
[/TR]
[TR]
[TD]*Last code[/TD]
[TD]
[/TD]
[TD]
[/TD]
[TD]
[/TD]
[/TR]
[TR]
[TD]*Initial 2nd, with the Pattern inside the if statement.[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

Please let me know if you need any other information. Besides, the last few exchanges between you and Rick can serve as a good VBA lesson.

Hope this helps, thanks and all the very best//





<style type="text/css"> table.tableizer-table { border: 1px solid #CCC; font-family: Arial, Helvetica, sans-serif font-size: 12px;} .tableizer-table td { padding: 4px; margin: 3px; border: 1px solid #ccc;}.tableizer-table th { background-color: #104E8B; color: #FFF; font-weight: bold;}</style><style type="text/css"> table.tableizer-table {
border: 1px solid #CCC; font-family: Arial, Helvetica, sans-serif
font-size: 12px;
}
.tableizer-table td {
padding: 4px;
margin: 3px;
border: 1px solid #ccc;
}
.tableizer-table th {
background-color: #104E8B;
color: #FFF;
font-weight: bold;
}
</style>
 
Last edited:
Upvote 0
Hi liiamra

Thank you for posting these results.

It's very interesting to see how they compare.

Thank you again.
 
Upvote 0

Forum statistics

Threads
1,223,270
Messages
6,171,102
Members
452,379
Latest member
IainTru

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top