search documents

lazarandreiflorin

New Member
Joined
Dec 16, 2018
Messages
34
is there a way to search text documents ( pdf and docx) for a combination of words and highlight it using excel ? thank you
 

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.
You can open Word documents and manipulate them in Excel. I am not sure the same can be done in a .pdf file. since they are generally not editable. If their text can be exported to word (may not be possible depending on how .pdf was created) then that document could be edited.

The code at the end of this post looks at all .doc? files in a directory and copies the underlined text from them into Excel. Step through it and see how it works.

This is not what you asked for, but it shows you how to get access to the word document from Excel. If you open word, turn on the macro recorder and perform the search & highlight actions you want to have Excel do, you can modify that code and include it in the posted code to automate those functions.

Be sure to test your code on a copy of your files.

Code:
Sub ExtractUnderlinedTextFromWord()

    'If opening the document in Word requires you to answer a prompt (enable content or some such)
    'this code will hang.  You can kill the WINWORD process in the Task Manager
    
    Dim appWD As Object
    Dim sFilePathName As String
    Dim iAnswer As Integer
    Dim lNextWriteRow As Long
    Dim lDocIndex As Long
    Dim sFileNameExt As Variant
    Dim aryFilePathNameExt() As Variant
    Dim sFilePath As String
    Dim lDocCount As Long
    
    If Cells(Rows.Count, 1).End(xlUp).row > 1 Then
        iAnswer = MsgBox("There are data in column A.  Do you want to erase it?", vbYesNo, "Erase column A?")
        If iAnswer <> vbYes Then
            MsgBox "Process cancelled."
            GoTo End_Sub
        End If
        Columns(1).Cells.Clear
    End If
    
    sFilePath = GetFolder(ThisWorkbook.path)
    
    If sFilePath = "" Then
        MsgBox "No directory selected."
        GoTo End_Sub
    End If
    
    'Build array
    sFileNameExt = Dir(sFilePath & "\*.doc?")
    Do While sFileNameExt <> vbNullString
        lDocIndex = lDocIndex + 1
        ReDim Preserve aryFilePathNameExt(1 To lDocIndex)
        aryFilePathNameExt(lDocIndex) = sFilePath & "\" & sFileNameExt
        sFileNameExt = Dir
    Loop
    lDocCount = lDocIndex
    
    'Uncomment next paragraph if you want to see document opened
'    appWD.Visible = True
'    With appWD
'        .WindowState = 0 'wdWindowStateNormal
'        .Resize Width:=appWD.InchesToPoints(6), Height:=appWD.InchesToPoints(4)
'    End With

    For lDocIndex = LBound(aryFilePathNameExt) To UBound(aryFilePathNameExt)
        Application.StatusBar = lDocIndex & " / " & lDocCount & ": " & aryFilePathNameExt(lDocIndex)

        Set appWD = CreateObject("Word.Application")
        
        appWD.Documents.Open fileName:=aryFilePathNameExt(lDocIndex), _
            ConfirmConversions:=False, ReadOnly:=True, AddToRecentFiles:=False, _
            PasswordDocument:="", PasswordTemplate:="", Revert:=False, _
            WritePasswordDocument:="", WritePasswordTemplate:="", Format:= _
            0, XMLTransform:=""  '0=wdOpenFormatAuto
        
        'Some code extracted from
        'http://stackoverflow.com/questions/13465709/repeating-microsoft-word-vba-until-no-search-results-found
        
        lNextWriteRow = lNextWriteRow + 1
        ActiveSheet.Cells(lNextWriteRow, 1).Value = aryFilePathNameExt(lDocIndex)
        'Start at the top of the document
        appWD.Selection.HomeKey Unit:=6  'wdStory
        appWD.Selection.ExtendMode = False
        'find a footnote to kick it off
        
        appWD.Selection.Find.ClearFormatting
        appWD.Selection.Find.Font.Underline = 1 'wdUnderlineSingle
        With appWD.Selection.Find
            .Text = "*"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = 1 'wdFindContinue
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchAllWordForms = False
            .MatchSoundsLike = False
            .MatchWildcards = True
        End With
        appWD.Selection.Find.Execute
        
        Do While appWD.Selection.Find.Found = True
            
            'Jump back to the start of the document.
            appWD.Selection.Extend
            
            'On the last loop you'll not find a result so check here
            If appWD.Selection.Find.Found Then
    
                appWD.Selection.Extend
                
                Do While appWD.ActiveDocument.Characters(appWD.Selection.End).Font.Underline
                    appWD.Selection.MoveRight Unit:=1 ' wdCharacter
                Loop
                
                appWD.Selection.MoveLeft Unit:=1  'wdCharacter
                lNextWriteRow = lNextWriteRow + 1
                ActiveSheet.Cells(lNextWriteRow, 2).Value = appWD.Selection
    
                appWD.Selection.Collapse 0 'wdCollapseEnd
                
            End If
            appWD.Selection.ExtendMode = False
            appWD.Selection.Find.Execute
        Loop
        
        appWD.Quit
        
    Next
    
    
End_Sub:
    Application.ScreenUpdating = True
    Application.StatusBar = False
    Set appWD = Nothing
End Sub
 
Upvote 0
You can trial this...
Code:
Sub Test(SearchWords As String)
Dim PFWdApp As Object, oPara As Object

'open Word application
On Error Resume Next
Set PFWdApp = GetObject(, "word.application")
If Err.Number <> 0 Then
On Error GoTo 0
Set PFWdApp = CreateObject("Word.Application")
End If
PFWdApp.Visible = True
PFWdApp.Documents.Open Filename:="C:\testfolder\test.doc"
'check each para
For Each oPara In PFWdApp.ActiveDocument.Paragraphs
'if not blank para
If oPara.Range.Text <> Chr(13) Then
If InStr(oPara.Range.Text, SearchWord) Then
With oPara.Range.Find
    .Text = SearchWord
    .Forward = True
    .MatchWholeWord = True
    .Execute
If .found = True Then
.Parent.Font.ColorIndex = 6
End If
End With
End If
End If
Next oPara
Set oPara = Nothing
Set PFWdApp = Nothing
End Sub
To operate...
Code:
Call Test("Bob' your uncle")
HTH. Dave
 
Last edited:
Upvote 0

Forum statistics

Threads
1,224,820
Messages
6,181,161
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top