Extract Data from specific places in multiple word files to excel

youbitto

Board Regular
Joined
Jun 8, 2022
Messages
51
Office Version
  1. 2019
  2. 2016
  3. 2013
Platform
  1. Windows
Hello MrExcel Genuises

I have Multiple word Files that has some data that I need to extract to excel

the documents has over 40 pages and the data I need are in separated pages

I am Asking if there is a way or macro in excel to that work automaticaly

here an example in word :

1742129786235.png


what i want to be filled in excel

Nouveau Feuille de calcul Microsoft Excel.xlsx
ABCD
1NameAgeBirth Adresse
2
3
4
Feuil2



Thank you
 
I have found this code but it inputs all the document

but it may be better if it searches for specific word and copy the text after it

like search for the word Name and copy the text after it

there is a formula added in the Name Manager "=OFFSET(Copyfromword!$A$1;COUNTA(Copyfromword!$A:$A);0;1;1)"

VBA Code:
Sub Copy_Data_From_Multiple_WordFiles2()

Dim FolderName As String
Dim FileName As String
Dim NewWordFile As New Word.Application
Dim NewDoc As New Word.Document

Application.DisplayAlerts = False
'Application.ScreenUpdating = False

FolderName = "C:\testt\"
FileName = Dir(FolderName)

'Loop start

Do While FileName <> ""
   
    Set NewDoc = NewWordFile.Documents.Open(FolderName & FileName, , True)
  
    NewDoc.Range(0, NewDoc.Range.End).Copy
    NewDoc.Range("LastRow").Copy

    

    NewDoc.Close SaveChanges:=wdDoNotSaveChanges
    NewWordFile.Quit
         
FileName = Dir()

Loop

End Sub
 
Upvote 0
Please try,
VBA Code:
Sub ExtractText()
    Dim a1 As Object, a2 As Object, a3 As String, a4 As Worksheet, a5 As Long
    Dim a6 As String, a7 As Variant, a8 As Variant, a9 As Variant, a10 As Variant
    Dim a11 As Boolean, a12 As Integer
    
   With Application.FileDialog(3)
    .Title = Chr(83) & Chr(101) & Chr(108) & Chr(101) & Chr(99) & Chr(116) & Chr(32) & _
             Chr(70) & Chr(105) & Chr(108) & Chr(101)
    
    .Filters.Clear
    .Filters.Add Chr(87) & Chr(111) & Chr(114) & Chr(100) & Chr(32) & _
                 Chr(68) & Chr(111) & Chr(99) & Chr(117) & Chr(109) & Chr(101) & _
                 Chr(110) & Chr(116) & Chr(115), "*.doc; *.docx"

    If .Show = -1 Then
        a6 = .SelectedItems(1)
    Else
        MsgBox Chr(78) & Chr(111) & Chr(32) & Chr(70) & Chr(105) & Chr(108) & Chr(101) & _
               Chr(32) & Chr(83) & Chr(101) & Chr(108) & Chr(101) & Chr(99) & _
               Chr(116) & Chr(101) & Chr(100), 48
        Exit Sub
    End If
End With

    On Error Resume Next
    Set a1 = GetObject(, "Word.Application")
    If Err.Number <> 0 Then
        Set a1 = CreateObject("Word.Application")
        a11 = True
    End If
    Err.Clear
    On Error GoTo 0

    Set a2 = a1.Documents.Open(a6, ReadOnly:=True)

    a3 = a2.Content.text

    a2.Close False
    Set a2 = Nothing

    If a11 Then a1.Quit
    Set a1 = Nothing

    a3 = Replace(a3, Chr(9), Chr(32))
    a3 = Replace(a3, Chr(160), Chr(32))
    a3 = Replace(a3, Chr(10), Chr(32))

    a7 = A1B2C3(a3, "Name:")
    a8 = A1B2C3(a3, "Age:")
    a9 = A1B2C3(a3, "Birth:")
    a10 = A1B2C3(a3, "Address:")

    Set a4 = ThisWorkbook.Sheets(Chr(70) & Chr(101) & Chr(117) & Chr(105) & Chr(108) & Chr(50))

    a5 = a4.Cells(a4.Rows.count, "A").End(-4162).Row + 1

    Dim maxE As Integer
    maxE = Application.WorksheetFunction.Max(UBound(a7), UBound(a8), UBound(a9), UBound(a10))

    For a12 = 0 To maxE
        a4.Cells(a5 + a12, 1).Value = IIf(a12 <= UBound(a7), a7(a12), "N/A")
        a4.Cells(a5 + a12, 2).Value = IIf(a12 <= UBound(a8), a8(a12), "N/A")
        a4.Cells(a5 + a12, 3).Value = IIf(a12 <= UBound(a9), a9(a12), "N/A")
        a4.Cells(a5 + a12, 4).Value = IIf(a12 <= UBound(a10), a10(a12), "N/A")
    Next a12

    MsgBox Chr(83) & Chr(117) & Chr(99) & Chr(99) & Chr(101) & Chr(115) & Chr(115), 64
End Sub

Function A1B2C3(ByVal x1 As String, ByVal x2 As String) As Variant
    Dim v1() As String, v2 As Integer, v3 As Integer, v4 As String, v5 As Integer
    v5 = 0
    v2 = 1

    Do
        v2 = InStr(v2, x1, x2, 1)
        If v2 > 0 Then
            v2 = v2 + Len(x2)
            v3 = InStr(v2, x1, Chr(13))
            If v3 = 0 Then v3 = Len(x1) + 1
            v4 = Trim(Mid(x1, v2, v3 - v2))

            ReDim Preserve v1(v5)
            v1(v5) = v4
            v5 = v5 + 1
            v2 = v3
        End If
    Loop While v2 > 0

    A1B2C3 = v1
End Function
 
Upvote 0
Hi youbitto. I see that Sam_D_Ben has provided you with some code that may work but I thought that I would offer you an alternative. The word files are .docx and the output is Sheet1 row 2 to whatever. HTH. Dave
VBA Code:
Sub test()
'extract right side of searchword paragraph for all docx files in selected folder
'output to sheet1 row 2 etc. in column order of SearchArr
Dim Wdapp As Object, WdDoc As Object, PagFlag As Boolean, SFolder As Object, MyPath As String
Dim FldrPicker As Object, Cnt As Integer, Cnt2 As Integer, SearchArr As Variant, SFile As Object
Dim FSO As Object, RowCnt As Integer, ParaArr() As String, Splitter As Variant, WdFlag As Boolean
Const TILDE As String = "~"

'searchwords to be found
SearchArr = Array("Name", "Age", "Birth", "Adresse")

'select folder
Set FldrPicker = Application.FileDialog(msoFileDialogFolderPicker)
With FldrPicker
.Title = "Select A Target Folder"
.AllowMultiSelect = False
If .Show = -1 Then
MyPath = .SelectedItems(1)
Else
MsgBox "Pick a folder!"
Exit Sub
End If
End With

'open Word application
On Error Resume Next
Set Wdapp = GetObject(, "Word.application")
If Err.Number <> 0 Then
On Error GoTo 0
Set Wdapp = CreateObject("Word.Application")
WdFlag = True
End If
Wdapp.Visible = False
'turn on pagination
PagFlag = False
If Wdapp.Options.Pagination = False Then
Wdapp.Options.Pagination = True
PagFlag = True
End If

RowCnt = 1
Set FSO = CreateObject("Scripting.FileSystemObject")
Set SFolder = FSO.GetFolder(MyPath)
For Each SFile In SFolder.Files
'open all NON ERROR docx files
If Right(SFile.Name, 4) = "docx" And Left(SFile.Name, 1) <> TILDE Then
RowCnt = RowCnt + 1
Set WdDoc = Wdapp.Documents.Open(SFile.Path)
'split doc contents into paragraphs
ParaArr() = Split(WdDoc.Content, Chr(13))

For Cnt2 = LBound(SearchArr) To UBound(SearchArr)
    'check each paragraph for searchword
    For Cnt = LBound(ParaArr) To UBound(ParaArr)
    If InStr(1, ParaArr(Cnt), SearchArr(Cnt2), vbTextCompare) Then
    'if searchword found enter result to sheet
    Splitter = Split(ParaArr(Cnt), ":")
    Sheets("sheet1").Cells(RowCnt, Cnt2 + 1) = Splitter(1)
    Exit For
    End If
    Next Cnt
Next Cnt2
Wdapp.ActiveDocument.Close SaveChanges:=False
End If
Next SFile

'return Word to orig setting
If PagFlag Then
Wdapp.Options.Pagination = False
End If
'clean up
If WdFlag Then
Wdapp.Quit
End If
Set Wdapp = Nothing
Set SFolder = Nothing
Set FSO = Nothing
End Sub
 
Upvote 0
Solution
Thank you very much, Mr.NdNoviceHlp and Mr.Sam_D_Ben. I have tried both codes and Mr.NdNoviceHlp's one worked perfectly but Mr.Sam_D_Ben showed an error "9" on this line

"maxE = Application.WorksheetFunction.Max(UBound(a7), UBound(a8), UBound(a9), UBound(a10))"

 
Upvote 0
Thank you very much, Mr.NdNoviceHlp and Mr.Sam_D_Ben. I have tried both codes and Mr.NdNoviceHlp's one worked perfectly but Mr.Sam_D_Ben showed an error "9" on this line

"maxE = Application.WorksheetFunction.Max(UBound(a7), UBound(a8), UBound(a9), UBound(a10))"

Because his code is looking for "name:", "age:", "birth:", "address:". If you have a space " " before ":" it will be an error because the code will not find it
-----------
1. In the code, change the "name, age, birth and adresse" (That's what it looks like in the picture) to the correct ones
2. Any number of spaces " " can be placed before and after the colon ":" (That's what it looks like in the picture)
3. In the code, change "Sheet1" to the proper name

The code should be fast.

Code:
Sub demo()
Dim count As Long, col As Long, filename As String, wordApp As Object, doc As Object, text As String, a, regex As Object, matches As Object, result()

    With Application.FileDialog(msoFileDialogOpen)
        .AllowMultiSelect = False
        .Filters.Add "Word Documents", "*.doc; *.docx; *.docm"
        If .Show = -1 Then
            filename = .SelectedItems(1)
        Else
            Exit Sub
        End If
    End With
    ReDim result(1 To 1000, 1 To 4) ' for max 1000 results
    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(filename)
    text = doc.Content.text
    wordApp.Quit
   
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With
    a = Array("name", "age", "birth", "adresse")    ' In the code, change the name, age, birth and address to the correct ones
   
    For col = 1 To 4
        regex.Pattern = "^" & a(col - 1) & " *:(.+?)$"
        Set matches = regex.Execute(text)
        For count = 1 To matches.count
            result(count, col) = Trim(matches(count - 1).SubMatches(0))
        Next count
    Next col

    ThisWorkbook.Worksheets("Sheet1").Range("A2").Resize(count, 4).Value = result   ' Change "Sheet1" to the proper name
End Sub

I can't delete post #6
 
Upvote 0
Because his code is looking for "name:", "age:", "birth:", "address:". If you have a space " " before ":" it will be an error because the code will not find it
-----------
1. In the code, change the "name, age, birth and adresse" (That's what it looks like in the picture) to the correct ones
2. Any number of spaces " " can be placed before and after the colon ":" (That's what it looks like in the picture)
3. In the code, change "Sheet1" to the proper name

The code should be fast.

Code:
Sub demo()
Dim count As Long, col As Long, filename As String, wordApp As Object, doc As Object, text As String, a, regex As Object, matches As Object, result()

    With Application.FileDialog(msoFileDialogOpen)
        .AllowMultiSelect = False
        .Filters.Add "Word Documents", "*.doc; *.docx; *.docm"
        If .Show = -1 Then
            filename = .SelectedItems(1)
        Else
            Exit Sub
        End If
    End With
    ReDim result(1 To 1000, 1 To 4) ' for max 1000 results
    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(filename)
    text = doc.Content.text
    wordApp.Quit
  
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With
    a = Array("name", "age", "birth", "adresse")    ' In the code, change the name, age, birth and address to the correct ones
  
    For col = 1 To 4
        regex.Pattern = "^" & a(col - 1) & " *:(.+?)$"
        Set matches = regex.Execute(text)
        For count = 1 To matches.count
            result(count, col) = Trim(matches(count - 1).SubMatches(0))
        Next count
    Next col

    ThisWorkbook.Worksheets("Sheet1").Range("A2").Resize(count, 4).Value = result   ' Change "Sheet1" to the proper name
End Sub

I can't delete post #6
Hi Mr.hungtbatman1 I changed the sheet name and the array values but I see no results after the code is excuted
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top