dscheste
Board Regular
- Joined
- Jul 10, 2004
- Messages
- 206
Hello, gurus
I have read the post here about an alphabetically presented list of frequently used words.
The post riveted my attention and I decided to play around with an idea of creating a code, which will analyse how many times a word was used in a text and then sort the list having the most frequent words on top of the list.
As the text might be large I dismissed the idea of doing it in Excel and had to go to the "bastard child" ™ by Tom Urtis - Microsoft Word.
The code works like this:
1. Copy any text and paste it as unformatted text in to the document (in this case named Frequent_words.doc).
2. Run the macro.
3. Macro deletes the words it has already counted and then tells how many words were processed and how long it took.
4. The stats are in the Stats.doc, which is being saved by default in root.
I would like to ask you to comment on the code, any ideas how to make it run faster, shorter code ideas?
Appreciate your feedback
Here is the code: (VBE in Word)
<font face=Courier New><SPAN style="color:#00007F">Sub</SPAN> Frequent_words()
<SPAN style="color:#00007F">Dim</SPAN> MyRange, WordOrder, Word, Frequency, Statistics, stimer, ftimer, Text, cellWordCount, prevcharacter, lastspace, i, nextCharacter
stimer = Timer
<SPAN style="color:#00007F">With</SPAN> Application
.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN> <SPAN style="color:#007F00">' we do not wanna see what is happening</SPAN>
<SPAN style="color:#00007F">With</SPAN> Selection <SPAN style="color:#007F00">'We have to move to the end of the document and add a space in order not to loose the last word when parsing</SPAN>
.EndKey unit:=wdStory <SPAN style="color:#007F00">'Goes to the end</SPAN>
.InsertAfter (" ") <SPAN style="color:#007F00">'inserts a space</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
<SPAN style="color:#00007F">Set</SPAN> MyRange = ActiveDocument.Content
<SPAN style="color:#00007F">With</SPAN> MyRange.Find <SPAN style="color:#007F00">' we have to pre-format the text and get rid of the paragraphs</SPAN>
.Text = "^p"
.Replacement.Text = " "
.Forward = <SPAN style="color:#00007F">True</SPAN>
.Wrap = wdFindContinue
.Format = <SPAN style="color:#00007F">False</SPAN>
.MatchCase = <SPAN style="color:#00007F">False</SPAN>
.MatchWholeWord = <SPAN style="color:#00007F">False</SPAN>
.MatchWildcards = <SPAN style="color:#00007F">False</SPAN>
.MatchSoundsLike = <SPAN style="color:#00007F">False</SPAN>
.MatchAllWordForms = <SPAN style="color:#00007F">False</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Selection.Find.Execute Replace:=wdReplaceAll
Text = Trim(MyRange) <SPAN style="color:#007F00">'Now we trim the text a bit to work with nice text</SPAN>
cellWordCount = 1
prevcharacter = ""
lastspace = 1
Documents.Add DocumentType:=wdNewBlankDocument <SPAN style="color:#007F00">' Here we would record our findings</SPAN>
ChangeFileOpenDirectory "C:\" <SPAN style="color:#007F00">'This path can be changed, when changing the name rename the window as well</SPAN>
ActiveDocument.SaveAs Filename:="Stats.doc", FileFormat:= _
wdFormatDocument, LockComments:=False, Password:="", AddToRecentFiles:= _
<SPAN style="color:#00007F">True</SPAN>, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:= _
<SPAN style="color:#00007F">False</SPAN>, SaveNativePicture<SPAN style="color:#00007F">For</SPAN>mat:=False, SaveFormsData:=<SPAN style="color:#00007F">False</SPAN>, _
SaveAsAOCELetter:=False
<SPAN style="color:#00007F">Set</SPAN> Statistics = Windows("Stats")
<SPAN style="color:#007F00">'Get back to our sheep</SPAN>
Windows("Frequent_Words").Activate
For i = 1 <SPAN style="color:#00007F">To</SPAN> Len(Text)
<SPAN style="color:#007F00">' we will have to loop through the text and analyze every character, a space will mean new word is following</SPAN>
nextCharacter = Mid(Text, i, 1)
<SPAN style="color:#00007F">If</SPAN> (nextCharacter = " ") And (prevcharacter <> " ") <SPAN style="color:#00007F">Then</SPAN>
Word = Trim(Mid(Text, lastspace, i - lastspace)) <SPAN style="color:#007F00">' that is our actual word</SPAN>
lastspace = i
<SPAN style="color:#00007F">With</SPAN> Active<SPAN style="color:#00007F">Do</SPAN>cument.Content.Find
.ClearFormatting
Do <SPAN style="color:#00007F">While</SPAN> .Execute(FindText:=Word, Forward:=True, Format:=True, ReplaceWith:="", MatchWholeWord:=<SPAN style="color:#00007F">True</SPAN>) = True
Frequency = Frequency + 1
<SPAN style="color:#007F00">' we have found all the instances of the word and now mark each time the loop revolves</SPAN>
<SPAN style="color:#00007F">Loop</SPAN>
<SPAN style="color:#00007F">With</SPAN> Statistics.Selection
.InsertAfter (Frequency)
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (" ")
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (Word)
.MoveDown unit:=wdLine, Count:=1
.InsertParagraph
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Frequency = 1
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
cellWordCount = cellWordCount + 1 <SPAN style="color:#007F00">' count the actual number of words</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
prevcharacter = nextCharacter <SPAN style="color:#007F00">' proceed to the next character</SPAN>
<SPAN style="color:#00007F">Next</SPAN> i
<SPAN style="color:#00007F">If</SPAN> Len(Text) = 0 <SPAN style="color:#00007F">Then</SPAN>
cellWordCount = 0
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
<SPAN style="color:#007F00">'Now we sort the results descending</SPAN>
Statistics.Selection.WholeStory
Statistics.Selection.Sort ExcludeHeader:=False, FieldNumber:="Field1", _
SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderDescending, Separator:= _
wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _
:=wdEnglishUS
Statistics.Selection.HomeKey unit:=wdStory
.ScreenUpdating = True
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
ftimer = Timer
<SPAN style="color:#007F00">'A bit of show-off</SPAN>
MsgBox Str(cellWordCount) & " words processed in " & ftimer - stimer & " seconds."
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN>
</FONT>
P.S. The code works for every alphanumerical language. Just tried in Russian and it worked
I have read the post here about an alphabetically presented list of frequently used words.
The post riveted my attention and I decided to play around with an idea of creating a code, which will analyse how many times a word was used in a text and then sort the list having the most frequent words on top of the list.
As the text might be large I dismissed the idea of doing it in Excel and had to go to the "bastard child" ™ by Tom Urtis - Microsoft Word.
The code works like this:
1. Copy any text and paste it as unformatted text in to the document (in this case named Frequent_words.doc).
2. Run the macro.
3. Macro deletes the words it has already counted and then tells how many words were processed and how long it took.
4. The stats are in the Stats.doc, which is being saved by default in root.
I would like to ask you to comment on the code, any ideas how to make it run faster, shorter code ideas?
Appreciate your feedback
Here is the code: (VBE in Word)
<font face=Courier New><SPAN style="color:#00007F">Sub</SPAN> Frequent_words()
<SPAN style="color:#00007F">Dim</SPAN> MyRange, WordOrder, Word, Frequency, Statistics, stimer, ftimer, Text, cellWordCount, prevcharacter, lastspace, i, nextCharacter
stimer = Timer
<SPAN style="color:#00007F">With</SPAN> Application
.ScreenUpdating = <SPAN style="color:#00007F">False</SPAN> <SPAN style="color:#007F00">' we do not wanna see what is happening</SPAN>
<SPAN style="color:#00007F">With</SPAN> Selection <SPAN style="color:#007F00">'We have to move to the end of the document and add a space in order not to loose the last word when parsing</SPAN>
.EndKey unit:=wdStory <SPAN style="color:#007F00">'Goes to the end</SPAN>
.InsertAfter (" ") <SPAN style="color:#007F00">'inserts a space</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
<SPAN style="color:#00007F">Set</SPAN> MyRange = ActiveDocument.Content
<SPAN style="color:#00007F">With</SPAN> MyRange.Find <SPAN style="color:#007F00">' we have to pre-format the text and get rid of the paragraphs</SPAN>
.Text = "^p"
.Replacement.Text = " "
.Forward = <SPAN style="color:#00007F">True</SPAN>
.Wrap = wdFindContinue
.Format = <SPAN style="color:#00007F">False</SPAN>
.MatchCase = <SPAN style="color:#00007F">False</SPAN>
.MatchWholeWord = <SPAN style="color:#00007F">False</SPAN>
.MatchWildcards = <SPAN style="color:#00007F">False</SPAN>
.MatchSoundsLike = <SPAN style="color:#00007F">False</SPAN>
.MatchAllWordForms = <SPAN style="color:#00007F">False</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Selection.Find.Execute Replace:=wdReplaceAll
Text = Trim(MyRange) <SPAN style="color:#007F00">'Now we trim the text a bit to work with nice text</SPAN>
cellWordCount = 1
prevcharacter = ""
lastspace = 1
Documents.Add DocumentType:=wdNewBlankDocument <SPAN style="color:#007F00">' Here we would record our findings</SPAN>
ChangeFileOpenDirectory "C:\" <SPAN style="color:#007F00">'This path can be changed, when changing the name rename the window as well</SPAN>
ActiveDocument.SaveAs Filename:="Stats.doc", FileFormat:= _
wdFormatDocument, LockComments:=False, Password:="", AddToRecentFiles:= _
<SPAN style="color:#00007F">True</SPAN>, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:= _
<SPAN style="color:#00007F">False</SPAN>, SaveNativePicture<SPAN style="color:#00007F">For</SPAN>mat:=False, SaveFormsData:=<SPAN style="color:#00007F">False</SPAN>, _
SaveAsAOCELetter:=False
<SPAN style="color:#00007F">Set</SPAN> Statistics = Windows("Stats")
<SPAN style="color:#007F00">'Get back to our sheep</SPAN>
Windows("Frequent_Words").Activate
For i = 1 <SPAN style="color:#00007F">To</SPAN> Len(Text)
<SPAN style="color:#007F00">' we will have to loop through the text and analyze every character, a space will mean new word is following</SPAN>
nextCharacter = Mid(Text, i, 1)
<SPAN style="color:#00007F">If</SPAN> (nextCharacter = " ") And (prevcharacter <> " ") <SPAN style="color:#00007F">Then</SPAN>
Word = Trim(Mid(Text, lastspace, i - lastspace)) <SPAN style="color:#007F00">' that is our actual word</SPAN>
lastspace = i
<SPAN style="color:#00007F">With</SPAN> Active<SPAN style="color:#00007F">Do</SPAN>cument.Content.Find
.ClearFormatting
Do <SPAN style="color:#00007F">While</SPAN> .Execute(FindText:=Word, Forward:=True, Format:=True, ReplaceWith:="", MatchWholeWord:=<SPAN style="color:#00007F">True</SPAN>) = True
Frequency = Frequency + 1
<SPAN style="color:#007F00">' we have found all the instances of the word and now mark each time the loop revolves</SPAN>
<SPAN style="color:#00007F">Loop</SPAN>
<SPAN style="color:#00007F">With</SPAN> Statistics.Selection
.InsertAfter (Frequency)
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (" ")
.MoveRight unit:=wdCharacter, Count:=1
.InsertAfter (Word)
.MoveDown unit:=wdLine, Count:=1
.InsertParagraph
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
Frequency = 1
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
cellWordCount = cellWordCount + 1 <SPAN style="color:#007F00">' count the actual number of words</SPAN>
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
prevcharacter = nextCharacter <SPAN style="color:#007F00">' proceed to the next character</SPAN>
<SPAN style="color:#00007F">Next</SPAN> i
<SPAN style="color:#00007F">If</SPAN> Len(Text) = 0 <SPAN style="color:#00007F">Then</SPAN>
cellWordCount = 0
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">If</SPAN>
<SPAN style="color:#007F00">'Now we sort the results descending</SPAN>
Statistics.Selection.WholeStory
Statistics.Selection.Sort ExcludeHeader:=False, FieldNumber:="Field1", _
SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderDescending, Separator:= _
wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _
:=wdEnglishUS
Statistics.Selection.HomeKey unit:=wdStory
.ScreenUpdating = True
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">With</SPAN>
ftimer = Timer
<SPAN style="color:#007F00">'A bit of show-off</SPAN>
MsgBox Str(cellWordCount) & " words processed in " & ftimer - stimer & " seconds."
<SPAN style="color:#00007F">End</SPAN> <SPAN style="color:#00007F">Sub</SPAN>
</FONT>
P.S. The code works for every alphanumerical language. Just tried in Russian and it worked