Word 2003 - VBA to find specific terms

Spurious

Hi all,

I am looking to code a macro that goes through a text and finds every word that starts with a capital letter.
Those words represent defined terms.

The problem is that sometimes two or more consecutive capitalized words together form a single defined term.

Maturity Date would be an example.

So I need a macro that goes through a text and finds all of those terms. Hiccups include the starts of sentences, punctuation between words, and the varying number of words that make up a defined term.


I am not sure whether I was clear about what I want to code. Please ask if anything is unclear.

Thanks!
 

So how is the macro supposed to be able to tell whether the first word of a sentence, etc. is or is not part of a 'defined term'? And what about names & honorifics - are they 'defined terms'?

Ultimately, to get reliable results, you need a means of differentiating your defined terms from other text. This could be a list of such terms in another file, or identifying features such as double quotes, bold text or particular Style names.

And of course, once you've got that sorted, what do you want done with them? For some code ideas see: http://social.technet.microsoft.com/Forums/en-US/word/thread/228d49ed-53a4-487f-9829-316f76abbe13
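
As an illustration of the "identifying features" idea above, here is a minimal sketch that lists each distinct run of bold text. It assumes the defined terms are bolded where they are defined, which nothing in this thread confirms, and the name ListBoldText is made up for the example.

```vba
Sub ListBoldText()
  ' Sketch only: collects each distinct run of bold text as a candidate term.
  ' Assumption: defined terms are formatted bold where they are defined.
  Dim StrTxt As String, StrRun As String
  StrTxt = vbCr
  With ActiveDocument.Range
    With .Find
      .ClearFormatting
      .Font.Bold = True        ' search on formatting only
      .Text = ""
      .Format = True
      .Forward = True
      .Wrap = wdFindStop
      .MatchWildcards = False
      .Execute
    End With
    Do While .Find.Found
      StrRun = Trim(.Text)
      ' Add the run only if it is not already in the list
      If Len(StrRun) > 0 And InStr(StrTxt, vbCr & StrRun & vbCr) = 0 Then
        StrTxt = StrTxt & StrRun & vbCr
      End If
      ' Stop at the end of the document so the last run is not found again
      If .End >= ActiveDocument.Range.End - 1 Then Exit Do
      .Collapse wdCollapseEnd
      .Find.Execute
    Loop
  End With
  Debug.Print StrTxt           ' results appear in the Immediate window
End Sub
```

A bold run that spans a paragraph mark is reported as one entry, so the output may need tidying by hand.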
 
Yeah, I shouldn't have mentioned the problems, because at the moment I am just looking for a way to get every word that starts with a capital letter.

In the second step, I compare them to an index I've created with all the defined terms (basically, I have already done what the linked article tries to do).
I am now looking for a way to find defined terms in the text that are not yet indexed.
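
The second step described above, comparing the capitalized terms found in the text against an existing index, could look roughly like this. It is only a sketch: UnindexedTerms and DemoUnindexedTerms are made-up names, and it assumes both lists are held as strings with the terms separated by vbCr, which the thread does not specify.

```vba
Function UnindexedTerms(ByVal StrFound As String, ByVal StrIndex As String) As String
  ' Returns the vbCr-delimited terms in StrFound that do not occur in StrIndex.
  ' The comparison is case-sensitive and matches whole terms only.
  Dim ArrFound() As String, i As Long, StrOut As String
  ' Wrap the index in delimiters so InStr can only match complete terms
  StrIndex = vbCr & StrIndex & vbCr
  ArrFound = Split(StrFound, vbCr)
  For i = 0 To UBound(ArrFound)
    If Len(ArrFound(i)) > 0 Then
      If InStr(1, StrIndex, vbCr & ArrFound(i) & vbCr, vbBinaryCompare) = 0 Then
        StrOut = StrOut & ArrFound(i) & vbCr
      End If
    End If
  Next i
  ' Remove the trailing delimiter, if any
  If Len(StrOut) > 0 Then StrOut = Left(StrOut, Len(StrOut) - 1)
  UnindexedTerms = StrOut
End Function

Sub DemoUnindexedTerms()
  Dim StrFound As String, StrIndex As String
  StrFound = "Maturity Date" & vbCr & "Issuer" & vbCr & "Coupon"
  StrIndex = "Issuer" & vbCr & "Coupon"
  Debug.Print UnindexedTerms(StrFound, StrIndex)   ' prints: Maturity Date
End Sub
```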
 
Hi Spurious,

Try:
Code:
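Sub GetTerms()
' Lists every run of consecutive capitalized words as a possible defined term.
Dim Rng As Range, i As Long, StrTxt As String
' Terms are collected in a string delimited by Chr(11) (manual line break)
StrTxt = Chr(11)
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    ' Wildcard pattern: a whole word that begins with a capital letter
    .Text = "<[A-Z]*>"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = True
    ' Wildcard searches are always case-sensitive, whatever MatchCase says
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = True
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    .Execute
  End With
  Do While .Find.Found
    Set Rng = .Duplicate
    ' Extend the match for as long as the next word also starts with a capital
    While Rng.Words.Last.Next.Characters.First Like "[A-Z]"
      Rng.MoveEnd wdWord, 1
    Wend
    ' Record the term only if it is not already in the list
    If InStr(StrTxt, Chr(11) & Trim(Rng.Text) & Chr(11)) = 0 Then
      StrTxt = StrTxt & Trim(Rng.Text) & Chr(11)
      i = i + 1
    End If
    ' Resume the search after the end of the current term
    .Start = Rng.End
    .Find.Execute
  Loop
  ' Drop the trailing delimiter
  StrTxt = Left(StrTxt, Len(StrTxt) - 1)
End With
' Append the list to the document after a page break (Chr(12))
If Len(StrTxt) > 1 Then
  ActiveDocument.Range.InsertAfter vbCr & Chr(12) & "Possible Defined Terms" & StrTxt
End If
MsgBox i & " possible 'Defined Term' expressions found."
End Sub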
 
Thank you very much!
Does exactly what I want.

Now I have a further problem, because some terms have "(*)" directly after them, e.g.
Maturity(i) Date or Maturity(i,t), and other things in parentheses. If there is no space between the word and the parenthesis, the parenthesized part should be counted as part of the term as well (if there is a space, it shouldn't!).

Is it possible to amend this part of the code to handle that?

Code:
While Rng.Words.Last.Next.Characters.First Like "[A-Z]"

So basically: if the next character is like [A-Z], do what it does now; if it is (, select everything up to ) and carry on from there.


Thanks again for your help, and thanks in advance.
 
You could do that by changing the Do While ... Loop to:
Code:
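  Do While .Find.Found
    Set Rng = .Duplicate
    With Rng
      ' Extend over following words that start with a capital letter or "("
      While .Words.Last.Next.Characters.First Like "[(A-Z]"
        .MoveEnd wdWord, 1
      Wend
      ' If a "(" was picked up, extend to include the closing ")"
      If InStr(.Text, "(") Then
        .MoveEndUntil ")", wdForward
        .End = Rng.End + 1
      End If
      ' Record the term only if it is not already in the list
      If InStr(StrTxt, Chr(11) & Trim(.Text) & Chr(11)) = 0 Then
        StrTxt = StrTxt & Trim(.Text) & Chr(11)
        i = i + 1
      End If
    End With
    .Start = Rng.End
    .Find.Execute
  Loop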
 
Thanks, that works as a start, but it now ignores whether there is a space between the defined term and the parenthesis.

E.g.
Maturity(i) Date should be a defined term.
Maturity (Day) should not be a defined term.
 
In that case, change:
While .Words.Last.Next.Characters.First Like "[(A-Z]"
to:
While .Words.Last.Next.Characters.First Like "[A-Z]"
and change:
If InStr(.Text, "(") Then
to:
If .Characters.Last.Next = "(" Then
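
For reference, this is roughly what the whole macro looks like with the revised loop and both of the changes above applied. Treat it as a sketch rather than a tested solution. The name GetTermsWithParens is made up, and the small loop that strips trailing spaces is an addition that does not come from this thread: it assumes that after .MoveEnd wdWord the range may end with a word's trailing space, in which case "Maturity Date (Day)" could still pass the "(" test.

```vba
Sub GetTermsWithParens()
  ' Hypothetical name: GetTerms with the revised loop and both changes applied.
  Dim Rng As Range, i As Long, StrTxt As String
  StrTxt = Chr(11)
  With ActiveDocument.Range
    With .Find
      .ClearFormatting
      .Text = "<[A-Z]*>"
      .Replacement.Text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
      .MatchCase = False
      .MatchWholeWord = False
      .MatchWildcards = True
      .MatchSoundsLike = False
      .MatchAllWordForms = False
      .Execute
    End With
    Do While .Find.Found
      Set Rng = .Duplicate
      With Rng
        ' Change 1: extend over following capitalized words only, not "("
        While .Words.Last.Next.Characters.First Like "[A-Z]"
          .MoveEnd wdWord, 1
        Wend
        ' Addition (an assumption, not from the thread): drop trailing spaces
        ' so the test below only succeeds when "(" touches the last letter
        While .Characters.Last.Text = " "
          .MoveEnd wdCharacter, -1
        Wend
        ' Change 2: include the parenthesis only if it follows immediately
        If .Characters.Last.Next = "(" Then
          .MoveEndUntil ")", wdForward
          .End = .End + 1
        End If
        If InStr(StrTxt, Chr(11) & Trim(.Text) & Chr(11)) = 0 Then
          StrTxt = StrTxt & Trim(.Text) & Chr(11)
          i = i + 1
        End If
      End With
      .Start = Rng.End
      .Find.Execute
    Loop
    StrTxt = Left(StrTxt, Len(StrTxt) - 1)
  End With
  If Len(StrTxt) > 1 Then
    ActiveDocument.Range.InsertAfter vbCr & Chr(12) & "Possible Defined Terms" & StrTxt
  End If
  MsgBox i & " possible 'Defined Term' expressions found."
End Sub
```

With this version the term stops at the closing parenthesis, so Maturity(i) Date would be listed as Maturity(i) and Date separately.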
 
Unfortunately, the problem still exists.

I tried:
Code:
If Not .Characters.Last.Next = " (" And .Characters.Last.Next = "(" Then

but it still ignores the spaces.
 
That suggests you didn't make the first of the last two changes I suggested. It certainly works in my testing.

In any event, please bear in mind the code is only meant to identify possible terms, not to provide an exact list.
 