Extract Data from specific places in multiple word files to excel

youbitto

Board Regular
Joined
Jun 8, 2022
Messages
61
Office Version
  1. 2019
  2. 2016
  3. 2013
Platform
  1. Windows
Hello MrExcel Genuises

I have Multiple word Files that has some data that I need to extract to excel

the documents has over 40 pages and the data I need are in separated pages

I am Asking if there is a way or macro in excel to that work automaticaly

here an example in word :

1742129786235.png


what i want to be filled in excel

Nouveau Feuille de calcul Microsoft Excel.xlsx
ABCD
1NameAgeBirth Adresse
2
3
4
Feuil2



Thank you
 
Here is the ChatGPT answer:
To extract specific data from multiple Word files into Excel using VBA, you can leverage the Word Object Library within Excel VBA. This allows you to automate the process of opening Word documents, extracting the necessary information from specified pages or sections, and inserting that information into an Excel worksheet.


Here is a step-by-step guide along with VBA code to help you:


Approach:​


  1. Loop through multiple Word files: Open each Word document one by one.
  2. Extract data from specific pages: You can either extract text from specific sections, bookmarks, or paragraphs (since Word documents don't directly have "pages" that you can easily reference programmatically).
  3. Insert data into specific cells in Excel: The extracted data will be placed into specific cells on an Excel sheet.

Requirements:​


  • Enable the Microsoft Word Object Libraryin Excel VBA:
    • Go to the VBA editor (Alt + F11).
    • In the VBA editor, go to Tools -> References.
    • Check Microsoft Word xx.0 Object Library (where "xx" is the version of Word you have installed).

VBA Code:
Sub ExtractDataFromWordDocuments()

    Dim wdApp As Object
    Dim wdDoc As Object
    Dim fileDialog As FileDialog
    Dim selectedFiles As Variant
    Dim currentFile As Variant
    Dim i As Integer
    Dim excelRow As Long
    Dim pageContent As String
    Dim pageNumber As Integer
    Dim textToExtract As String
    Dim pageStart As Long
    Dim pageEnd As Long
    
    ' Create a new Word application instance
    Set wdApp = CreateObject("Word.Application")
    wdApp.Visible = False  ' You can set this to True for debugging purposes
    
    ' Initialize Excel row where data will be inserted
    excelRow = 1
    
    ' Open the file dialog to allow the user to select multiple files
    Set fileDialog = Application.FileDialog(msoFileDialogOpen)
    fileDialog.AllowMultiSelect = True
    fileDialog.Filters.Clear
    fileDialog.Filters.Add "Word Files", "*.docx; *.docm; *.doc"
    
    ' Show file dialog and store the selected files
    If fileDialog.Show = -1 Then
        selectedFiles = fileDialog.SelectedItems
    Else
        MsgBox "No files selected!"
        Exit Sub
    End If
    
    ' Loop through each selected Word document
    For Each currentFile In selectedFiles
        ' Open the Word document
        Set wdDoc = wdApp.Documents.Open(currentFile)
        
        ' Loop through the pages (in Word, pages are not directly accessible, 
        ' so we simulate the extraction by checking the text)
        ' For example, we will extract content from pages 1, 2, and 3 (this is just an example)
        
        For pageNumber = 1 To 3 ' Adjust the page range as necessary
            ' Extract content from a specific page (simulated by extracting text by paragraph)
            pageContent = ExtractPageContent(wdDoc, pageNumber) ' This function will extract the content
            
            ' Define what text to extract based on your needs
            textToExtract = pageContent ' You can apply more logic here to filter out specific data
            
            ' Insert extracted text into Excel (you can adjust the cell reference as needed)
            Sheets("Sheet1").Cells(excelRow, 1).Value = "File: " & currentFile
            Sheets("Sheet1").Cells(excelRow, 2).Value = "Page " & pageNumber
            Sheets("Sheet1").Cells(excelRow, 3).Value = textToExtract
            
            ' Increment Excel row for next data entry
            excelRow = excelRow + 1
        Next pageNumber
        
        ' Close the Word document
        wdDoc.Close False
    Next currentFile
    
    ' Close the Word application
    wdApp.Quit
    
    ' Release the Word objects
    Set wdDoc = Nothing
    Set wdApp = Nothing
    Set fileDialog = Nothing
    
    MsgBox "Data extraction complete!"
End Sub

' Function to simulate extracting content from a specific page
' Word doesn't directly use pages like Excel, so we simulate this by extracting paragraphs
Function ExtractPageContent(wdDoc As Object, pageNum As Integer) As String
    Dim para As Object
    Dim content As String
    Dim pageStart As Long
    Dim pageEnd As Long
    Dim currentPage As Integer
    
    currentPage = 1
    content = ""
    
    ' Loop through paragraphs to simulate extracting page data
    For Each para In wdDoc.Paragraphs
        ' Simulate the logic for determining the start and end of a page
        ' This is a simple example, and you may need to refine it based on your document structure
        If currentPage = pageNum Then
            content = content & para.Range.Text
        End If
        
        ' Simulate switching pages (in reality, this depends on your specific document)
        If Len(para.Range.Text) > 100 Then ' Adjust based on your document size
            currentPage = currentPage + 1
        End If
    Next para
    
    ' Return the extracted content
    ExtractPageContent = content
End Function
 
Upvote 0
Hello MrExcel Genuises

I have Multiple word Files that has some data that I need to extract to excel

the documents has over 40 pages and the data I need are in separated pages

I am Asking if there is a way or macro in excel to that work automaticaly

here an example in word :

...

what i want to be filled in excel



Thank you

@Rhodie72 please do not post answers from AI sources as per Rule#17
Thanks
Ah, I didn't know that one. It's the first time I have ever used AI and wasn't aware of this new rule. Sorry buddy. Won't happen again.
 
Upvote 0
Because his code is looking for "name:", "age:", "birth:", "address:". If you have a space " " before ":" it will be an error because the code will not find it
-----------
1. In the code, change the "name, age, birth and adresse" (That's what it looks like in the picture) to the correct ones
2. Any number of spaces " " can be placed before and after the colon ":" (That's what it looks like in the picture)
3. In the code, change "Sheet1" to the proper name

The code should be fast.

Code:
Sub demo()
Dim count As Long, col As Long, filename As String, wordApp As Object, doc As Object, text As String, a, regex As Object, matches As Object, result()

    With Application.FileDialog(msoFileDialogOpen)
        .AllowMultiSelect = False
        .Filters.Add "Word Documents", "*.doc; *.docx; *.docm"
        If .Show = -1 Then
            filename = .SelectedItems(1)
        Else
            Exit Sub
        End If
    End With
    ReDim result(1 To 1000, 1 To 4) ' for max 1000 results
    Set wordApp = CreateObject("Word.Application")
    Set doc = wordApp.Documents.Open(filename)
    text = doc.Content.text
    wordApp.Quit
  
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With
    a = Array("name", "age", "birth", "adresse")    ' In the code, change the name, age, birth and address to the correct ones
  
    For col = 1 To 4
        regex.Pattern = "^" & a(col - 1) & " *:(.+?)$"
        Set matches = regex.Execute(text)
        For count = 1 To matches.count
            result(count, col) = Trim(matches(count - 1).SubMatches(0))
        Next count
    Next col

    ThisWorkbook.Worksheets("Sheet1").Range("A2").Resize(count, 4).Value = result   ' Change "Sheet1" to the proper name
End Sub

I can't delete post #6
Hi Mr.hungtbatman1 your code worked perfectly, but I found a problem with numerisation number I have some data like this
1742458484215.png


I tried to do like "[1-]" or "[1-]{1,}" ... but no results
I hope you help me with that
here is the new uploaded file as an example
https://1drv.ms/w/c/b601c788cb3f645c/EfKo2-ZwwbVIl7Mh2s2EjgUBX1ujGM-jcQiPV1Vcy_XfzQ
 
Upvote 0
Hi Mr.hungtbatman1 your code worked perfectly, but I found a problem with numerisation number I have some data like this
View attachment 123522

I tried to do like "[1-]" or "[1-]{1,}" ... but no results
I hope you help me with that
here is the new uploaded file as an example
https://1drv.ms/w/c/b601c788cb3f645c/EfKo2-ZwwbVIl7Mh2s2EjgUBX1ujGM-jcQiPV1Vcy_XfzQ
If you write it, you have to write it in detail.The thing is, in the image from post #1 E1 = "Location"?A2 = David, B2 = 50, C2 = 1960, D2 = France (number 1)E2 = ???If we count in order, it should be E2 = London (number 1). But (France + London) makes no sense. And my code has no AI and can't guess that E2 = Paris (number 2)Or maybe it's something else? I can't guess out of thin air.
 
Upvote 0
If you write it, you have to write it in detail.The thing is, in the image from post #1 E1 = "Location"?A2 = David, B2 = 50, C2 = 1960, D2 = France (number 1)E2 = ???If we count in order, it should be E2 = London (number 1). But (France + London) makes no sense. And my code has no AI and can't guess that E2 = Paris (number 2)Or maybe it's something else? I can't guess out of thin air.
it is something like the locations that where david works, so E2 = "London", F2= "Paris" and G2 = "Rome"
 
Upvote 0
1. In the WORD file between "Addresse : France" and "Locations :" there are only blank lines, there is no "Example Example Example Example Example ...". Is it always like that?

2. I see
Code:
Locations :
1- London
2- Paris
3- Rome
and after "3- Rome" there is the end of the file or at least 1 blank line. Is it always like that? Can it be e.g.
Code:
Locations :
1- London
2- Paris
3- Rome
"Example Example Example Example Example ..."
???
Is it about finding the place where the last location is - after the last location there is always a blank line?
 
Upvote 0
1. In the WORD file between "Addresse : France" and "Locations :" there are only blank lines, there is no "Example Example Example Example Example ...". Is it always like that?

2. I see
Code:
Locations :
1- London
2- Paris
3- Rome
and after "3- Rome" there is the end of the file or at least 1 blank line. Is it always like that? Can it be e.g.
Code:
Locations :
1- London
2- Paris
3- Rome
"Example Example Example Example Example ..."
???
Is it about finding the place where the last location is - after the last location there is always a blank line?
yes there are words after the locations after "Rome" , it is not the end of the word and between adresse and locations there are words too
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top