MSWord Delete duplicate words in selected text

mole999

Well-known Member
Joined
Oct 23, 2004
Messages
10,524
Office Version
  1. 2019
  2. 2016
  3. 2013
Platform
  1. Windows
As per the title, I'm looking for a bit of code so that selected text will have all duplicate words removed ONLY in the selection, effectively left with a word cloud, even better would to be have an extensible static list in the VBA to delete whole words, and any word less than three characters so (the, if, as) would all go as well, icing on the cake, then sort the words alphabetically so similar words can be identified quickly

not asking much am I, nothing I have seen on the net gets close

ONLY SELECTED
LESS THAN FOUR CHARACTERS
EXCLUDED WORDS
SORTED
 
Thank you very much, I would never have gotten there
 
Upvote 0

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"
I'm sure you'd have gotten there in the end - even if not by quite the same route.

I've found a small glitch - the code deletes email & web addresses, but leaves behind the @ from the email addresses. You can delete those by changing:
40 To 63
to:
40 To 64
If you need, I could do some more work to retain the email & web addresses.
 
Upvote 0
No, it was purely a way of sifting large amounts of text to get the key words to support another piece of work, whilst out of context, I can now see groups of similar words, and make a decision to include or exclude. Emails wouldn't be an issue, I just grabbed some free text off the web as a demonstration
 
Upvote 0
FWIW, in case anyone else needs to work with a document containing email and/or hyperlink addresses, the following version extracts those too (but not the hyperlink display text).

Code:
Sub ListBuilder()
Application.ScreenUpdating = False
Dim StrIn As String, StrTmp As String, StrExcl As String, StrOut() As String
Dim i As Long, j As Long
'Define the exlusions list
StrExcl = "A,Am,An,And,Are,As,At,B,Be,But,By,C,Can,Cm,D,Did," & _
          "Do,Does,E,Eg,En,Eq,Etc,F,For,G,Get,Go,Got,H,Has,Have," & _
          "He,Her,Him,How,I,Ie,If,In,Into,Is,It,Its,J,K,L,M,Me," & _
          "Mi,Mm,My,N,Na,Nb,No,Not,O,Of,Off,Ok,On,One,Or,Our,Out," & _
          "P,Q,R,Re,S,She,So,T,The,Their,Them,They,This,T,To,U,V," & _
          "Via,Vs,W,Was,We,Were,Who,Will,With,Would,X,Y,Yd,You,Your,Z"
With ActiveDocument.Range
  'Convert email & web addresses to hyperlinks, then capture & delete
  .AutoFormat
  While .Hyperlinks.Count > 0
    With .Hyperlinks(1)
      StrTmp = StrTmp & " " & .Address
      .Delete
    End With
  Wend
  'Delete words of 3 characters or less
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "<[! ]{1,3}>"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
  End With
  'ActiveWindow.View.ShowFieldCodes = False
  'Capitalise the first letter of each word
  .Case = wdTitleWord
  'Get the document's text
  StrIn = .Text
  'Strip out unwanted characters. Amongst others, hyphens and formatted single quotes are retained at this stage
  For i = 1 To 255
    Select Case i
      'To strip out unwanted characters
      Case 1 To 38, 40 To 64, 91 To 96, 123 To 144, 147 To 149, 152 To 171, 174 To 191, 247
      StrIn = Replace(StrIn, Chr(i), " ")
    End Select
  Next
  'Convert smart single quotes to plain single quotes & delete any at the start/end of a word
  StrIn = Replace(Replace(Replace(Replace(StrIn, Chr(145), "'"), Chr(146), "'"), "' ", " "), " '", " ")
  'Process the exclusions list
  For i = 0 To UBound(Split(StrExcl, ","))
    While InStr(StrIn, " " & Split(StrExcl, ",")(i) & " ") > 0
      StrIn = Replace(StrIn, " " & Split(StrExcl, ",")(i) & " ", " ")
    Wend
  Next
  'Incorporate email & web addresses
  StrIn = StrIn & StrTmp
  'Clean up any duplicate spaces
  While InStr(StrIn, "  ") > 0
    StrIn = Replace(StrIn, "  ", " ")
  Wend
  StrIn = " " & Trim(StrIn) & " "
  j = UBound(Split(StrIn, " "))
  For i = 1 To j
    StrTmp = Split(StrIn, " ")(1)
    While InStr(StrIn, " " & StrTmp & " ") > 0
      StrIn = Replace(StrIn, " " & StrTmp & " ", " ")
    Wend
    'Update the output array
    ReDim Preserve StrOut(i - 1)
    StrOut(i - 1) = StrTmp
    If UBound(Split(StrIn, " ")) = 1 Then Exit For
  Next
  WordBasic.SortArray StrOut()
  .Text = Join(StrOut, ", ")
  'Restore email & web address hyperlinks
  .AutoFormat
  .Style = wdStyleNormal
End With
Application.ScreenUpdating = True
End Sub
 
Upvote 0

Forum statistics

Threads
1,225,768
Messages
6,186,924
Members
453,387
Latest member
uzairkhan

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top