Extract Data from specific places in multiple word files to excel

youbitto

Board Regular
Joined
Jun 8, 2022
Messages
61
Office Version
  1. 2019
  2. 2016
  3. 2013
Platform
  1. Windows
Hello MrExcel Genuises

I have Multiple word Files that has some data that I need to extract to excel

the documents has over 40 pages and the data I need are in separated pages

I am Asking if there is a way or macro in excel to that work automaticaly

here an example in word :

1742129786235.png


what i want to be filled in excel

Nouveau Feuille de calcul Microsoft Excel.xlsx
ABCD
1NameAgeBirth Adresse
2
3
4
Feuil2



Thank you
 
yes there are words after the locations after "Rome" , it is not the end of the word and between adresse and locations there are words too
The line after Rome is empty or not empty - "Example Example Example Example Example ..."
Why am I asking?In the WORD file there is
1- London
2- Paris
3- Rome
But after copying to the text variable there are only:
London
Paris
Rome

There are no 1-, 2-, 3- because these are numberings.

The code does not have AI and does not know whether Paris, Rome are already the "Example Example" data. Therefore, if the code knows that after the last location there is always an empty line, the code knows where the locations end
 
Upvote 0
So after the last location there is ALWAYS an empty line or not?
Choose:
1.
1- London
2- Paris
3- Rome
Tomorrow will be sunny

blala hichic...

2.
1- London
2- Paris
3- Rome

Tomorrow will be sunny
blala hichic...

1. or 2.?
 
Upvote 0
So after the last location there is ALWAYS an empty line or not?
Choose:
1.
1- London
2- Paris
3- Rome
Tomorrow will be sunny

blala hichic...

2.
1- London
2- Paris
3- Rome

Tomorrow will be sunny
blala hichic...
2
 
Upvote 0
If 2 then


Code:
Sub demo()
Dim count As Long, col As Long, k As Long, pos As Long, location, temp As String, filename As String, wordApp As Object, doc As Object, text As String, a, regex As Object, matches As Object, result()

    With Application.FileDialog(msoFileDialogOpen)
        .AllowMultiSelect = False
        .Filters.Add "Word Documents", "*.doc; *.docx; *.docm"
        If .Show = -1 Then
            filename = .SelectedItems(1)
        Else
            Exit Sub
        End If
    End With
    ReDim result(1 To 1000, 1 To 4) ' for max 1000 results
    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(filename)
    text = doc.Content.text
    wordApp.Quit
 
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With
    a = Array("name", "age", "birth", "adresse")    ' In the code, change the name, age, birth and address to the correct ones
 
    For col = 1 To 4
'        regex.Pattern = "^" & a(col - 1) & " *:(.+?)$"
        regex.Pattern = "^" & a(col - 1) & "[ \t\xA0]*:(.+?)$"
        Set matches = regex.Execute(text)
        For count = 1 To matches.count
            result(count, col) = Trim(matches(count - 1).SubMatches(0))
            If col = 4 Then
                k = 0
                pos = InStr(matches(count - 1).FirstIndex + 1 + Len(matches(count - 1).Value), text, "Locations", vbTextCompare)
                If pos > 0 Then
                    pos = InStr(pos, text, Chr(13)) + 1
                    temp = Mid(text, pos, 200)
                    location = Split(temp, Chr(13))
                    Do While location(k) <> ""
                        If UBound(result, 2) < col + k + 1 Then ReDim Preserve result(1 To UBound(result, 1), 1 To col + k + 1)
                        result(count, col + k + 1) = location(k)
                        k = k + 1
                    Loop
                End If
            End If
        Next count
    Next col

    ThisWorkbook.Worksheets("Sheet1").Range("A2").Resize(count, UBound(result, 2)).Value = result  ' Change "Sheet1" to the proper name
End Sub
---------------------------
Added
Code:
, k As Long, pos As Long, location, temp As String
...
If col = 4 Then
                k = 0
                pos = InStr(matches(count - 1).FirstIndex + 1 + Len(matches(count - 1).Value), text, "Locations", vbTextCompare)
                If pos > 0 Then
                    pos = InStr(pos, text, Chr(13)) + 1
                    temp = Mid(text, pos, 200)
                    location = Split(temp, Chr(13))
                    Do While location(k) <> ""
                        If UBound(result, 2) < col + k + 1 Then ReDim Preserve result(1 To UBound(result, 1), 1 To col + k + 1)
                        result(count, col + k + 1) = location(k)
                        k = k + 1
                    Loop
                End If
End If
…
UBound(result, 2)
 
Upvote 0
There is NO EMPTY LINE in the WORD file after Rome. That is 1. not 2. in post #35, buddy
 
Upvote 0
There is NO EMPTY LINE in the WORD file after Rome. That is 1. not 2. in post #35, buddy
your code worked perfectly after I added the line
to be honest I dont have the words that I want to extract the informations from but if I remember correctly there is an empty line there
does the code change completly if there is a line or not ?
 
Upvote 0
I downloaded the document and have been following along. I think the extra line is pretty important. What doesn't seem to have been mentioned, is that you are looking for bulleted point information. You can use this fact to locate your information without worrying about the blank line issue. @hungtbatman1 perhaps this will help adapt your code. HTH. Dave
VBA Code:
Sub test()
'extract right side of searchword paragraph for all docx files in selected folder
'output to sheet1 row 2 etc. in column order of SearchArr
Dim Wdapp As Object, WdDoc As Object, PagFlag As Boolean, SFolder As Object, MyPath As String, Cnt3 As Integer
Dim FldrPicker As Object, Cnt As Integer, Cnt2 As Integer, SearchArr As Variant, SFile As Object, ColCnt As Integer
Dim FSO As Object, RowCnt As Integer, ParaArr() As String, Splitter As Variant, WdFlag As Boolean
Const TILDE As String = "~"

'searchwords to be found
SearchArr = Array("Name", "Age", "Birth", "Adresse", "Locations")

'select folder
Set FldrPicker = Application.FileDialog(msoFileDialogFolderPicker)
With FldrPicker
.Title = "Select A Target Folder"
.AllowMultiSelect = False
If .Show = -1 Then
MyPath = .SelectedItems(1)
Else
MsgBox "Pick a folder!"
Exit Sub
End If
End With

'open Word application
On Error Resume Next
Set Wdapp = GetObject(, "Word.application")
If Err.Number <> 0 Then
On Error GoTo 0
Set Wdapp = CreateObject("Word.Application")
WdFlag = True
End If
Wdapp.Visible = False
'turn on pagination
PagFlag = False
If Wdapp.Options.Pagination = False Then
Wdapp.Options.Pagination = True
PagFlag = True
End If

RowCnt = 1
Set FSO = CreateObject("Scripting.FileSystemObject")
Set SFolder = FSO.GetFolder(MyPath)
For Each SFile In SFolder.Files
'open all NON ERROR docx files
If Right(SFile.Name, 4) = "docx" And Left(SFile.Name, 1) <> TILDE Then
RowCnt = RowCnt + 1
Set WdDoc = Wdapp.Documents.Open(SFile.Path)
'split doc contents into paragraphs
ParaArr() = Split(WdDoc.Content, Chr(13))

For Cnt2 = LBound(SearchArr) To UBound(SearchArr)
    'check each paragraph for searchword
    For Cnt = LBound(ParaArr) To UBound(ParaArr)
    If InStr(1, ParaArr(Cnt), SearchArr(Cnt2), vbTextCompare) Then
    'if searchword found enter result to sheet
    Splitter = Split(ParaArr(Cnt), ":")
    If Len(Splitter(1)) <> 1 Then
    Sheets("sheet1").Cells(RowCnt, Cnt2 + 1) = Splitter(1)
    'accomodate "Locations:"
    Else
        ColCnt = Cnt2
        For Cnt3 = Cnt + 1 To UBound(ParaArr)
        'check paragraph for bullet list
        If WdDoc.Paragraphs(Cnt3 + 1).Range.ListFormat.ListType = 3 Then ' 3 is Simple numeric list
        Sheets("sheet1").Cells(RowCnt, ColCnt + 1) = ParaArr(Cnt3)
        ColCnt = ColCnt + 1
        Else ' exit search for bullets and reset paragraph search cnt
        Cnt = Cnt3
        Exit For
        End If
        Next Cnt3
    End If
    Exit For
    End If
    Next Cnt
Next Cnt2
Wdapp.ActiveDocument.Close SaveChanges:=False
End If
Next SFile

'return Word to orig setting
If PagFlag Then
Wdapp.Options.Pagination = False
End If
'clean up
If WdFlag Then
Wdapp.Quit
End If
Set Wdapp = Nothing
Set SFolder = Nothing
Set FSO = Nothing
End Sub
ps. I thought this was for multi files in a folder?
 
Upvote 0
I downloaded the document and have been following along. I think the extra line is pretty important. What doesn't seem to have been mentioned, is that you are looking for bulleted point information. You can use this fact to locate your information without worrying about the blank line issue. @hungtbatman1 perhaps this will help adapt your code. HTH. Dave
VBA Code:
Sub test()
'extract right side of searchword paragraph for all docx files in selected folder
'output to sheet1 row 2 etc. in column order of SearchArr
Dim Wdapp As Object, WdDoc As Object, PagFlag As Boolean, SFolder As Object, MyPath As String, Cnt3 As Integer
Dim FldrPicker As Object, Cnt As Integer, Cnt2 As Integer, SearchArr As Variant, SFile As Object, ColCnt As Integer
Dim FSO As Object, RowCnt As Integer, ParaArr() As String, Splitter As Variant, WdFlag As Boolean
Const TILDE As String = "~"

'searchwords to be found
SearchArr = Array("Name", "Age", "Birth", "Adresse", "Locations")

'select folder
Set FldrPicker = Application.FileDialog(msoFileDialogFolderPicker)
With FldrPicker
.Title = "Select A Target Folder"
.AllowMultiSelect = False
If .Show = -1 Then
MyPath = .SelectedItems(1)
Else
MsgBox "Pick a folder!"
Exit Sub
End If
End With

'open Word application
On Error Resume Next
Set Wdapp = GetObject(, "Word.application")
If Err.Number <> 0 Then
On Error GoTo 0
Set Wdapp = CreateObject("Word.Application")
WdFlag = True
End If
Wdapp.Visible = False
'turn on pagination
PagFlag = False
If Wdapp.Options.Pagination = False Then
Wdapp.Options.Pagination = True
PagFlag = True
End If

RowCnt = 1
Set FSO = CreateObject("Scripting.FileSystemObject")
Set SFolder = FSO.GetFolder(MyPath)
For Each SFile In SFolder.Files
'open all NON ERROR docx files
If Right(SFile.Name, 4) = "docx" And Left(SFile.Name, 1) <> TILDE Then
RowCnt = RowCnt + 1
Set WdDoc = Wdapp.Documents.Open(SFile.Path)
'split doc contents into paragraphs
ParaArr() = Split(WdDoc.Content, Chr(13))

For Cnt2 = LBound(SearchArr) To UBound(SearchArr)
    'check each paragraph for searchword
    For Cnt = LBound(ParaArr) To UBound(ParaArr)
    If InStr(1, ParaArr(Cnt), SearchArr(Cnt2), vbTextCompare) Then
    'if searchword found enter result to sheet
    Splitter = Split(ParaArr(Cnt), ":")
    If Len(Splitter(1)) <> 1 Then
    Sheets("sheet1").Cells(RowCnt, Cnt2 + 1) = Splitter(1)
    'accomodate "Locations:"
    Else
        ColCnt = Cnt2
        For Cnt3 = Cnt + 1 To UBound(ParaArr)
        'check paragraph for bullet list
        If WdDoc.Paragraphs(Cnt3 + 1).Range.ListFormat.ListType = 3 Then ' 3 is Simple numeric list
        Sheets("sheet1").Cells(RowCnt, ColCnt + 1) = ParaArr(Cnt3)
        ColCnt = ColCnt + 1
        Else ' exit search for bullets and reset paragraph search cnt
        Cnt = Cnt3
        Exit For
        End If
        Next Cnt3
    End If
    Exit For
    End If
    Next Cnt
Next Cnt2
Wdapp.ActiveDocument.Close SaveChanges:=False
End If
Next SFile

'return Word to orig setting
If PagFlag Then
Wdapp.Options.Pagination = False
End If
'clean up
If WdFlag Then
Wdapp.Quit
End If
Set Wdapp = Nothing
Set SFolder = Nothing
Set FSO = Nothing
End Sub
ps. I thought this was for multi files in a folder?
thank you Mr. NdNoviceHlp this works perfectly with or withour the line
well putting the multiple words in a folder is easy too , selecting all the files or choosing the folder works both fine for me
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top