VBA extracting MS Word Form Data into Excel Table

nadair1

New Member
Joined
Aug 21, 2018
Messages
4
Good afternoon. I've read through the forum rules, and I'm hoping this thread makes the cut. I'm a newbie here, so I'd appreciate your input.

I lead a nonprofit, and I'm trying to research other similar nonprofits. I have a file folder with 64 docx files that summarize each nonprofit, all provided to me by an association of which my organization is a member. I want to pull the same blocks of information from each of these 64 documents into an excel spreadsheet, and organize it into a spreadsheet. Each organization's filename contains their region and name. Here's an example: "North America-Shelter and Rain.docx". I will then use this spreadsheet to start adding columns as I go about my research.

Fortunately, most all of the documents have standardized sections where each nonprofit has entered information about their organization, such as the organization's name, focus of the organization, etc. There's a few nonprofits that have slightly different versions of the same form, whereby some of the standard fields are missing. There are no tables, graphics, etc. -- just simple text fields and text entries.

I've created an exhaustive list of the fields below (all fields are preceded by a colon in the Word docs)...


  • Name of Organization:
  • Founder(s) and year of founding organization:
  • Director(s):
  • e-mail:
  • telephone:
  • Country(s) where your organization is located:
  • Address of organization:
  • Telephone number(s):
  • e-mail:
  • web-site:
  • In 3 sentences or less, what is the major focus of your organization?
  • To what organizations are you related or by whom are you sponsored?
  • How many are on your staff?
  • Indigenous:
  • Foreign:
  • What are your dreams for your ministry?
  • Do you need (or want) volunteers? If so, to do what?

I'm using the 2016 versions of Excel and Word. Here's the filepath where the docx files reside: C:\Users\noah\Desktop\Organizations

As for a finished product, ideally it would be nice to have the organization's filename in one column, and all the fields listed in the Word documents in columns adjacent to the filename, then the organizations' responses to the fields in rows under the appropriate column headers.

Thanks so much!
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Try the following macro. It is assumed your Excel workbook will have column headers for the data in the same order (but not necessarily the same wording) as the list you gave. The macro looks through each document for the names in the list and, if found, will output whatever follows it in the same paragraph (or, if that's empty, whatever's in the next non-empty paragraph [note: this might also lead to something irrelevant being picked up if the 'field' is present but unanswered]), in the appropriate row & column in your workbook. Only the data in the first non-empty paragraph for a given item is returned; it's impossible to know how many paragraphs something like a directors list, for example, might span if an item's data are not all entered in the same paragraph.
Code:
Sub GetData()
'Note: this code requires a reference to the Word object model.
'See under the VBA Editor's Tools|References.
Application.ScreenUpdating = False
Dim WkSht As Worksheet, r As Long, c As Long
Dim wdApp As New Word.Application, wdDoc As Word.Document
Dim strFile As String, strFolder As String, strOut As String, StrFnd
strFolder = "C:\Users\" & Environ("UserName") & "\Desktop\Organizations\"
StrFnd = Array("", "", "Name of Organization:", "Founder(s) and year of founding organization:", _
  "Director(s):", "e-mail:", "telephone:", "Country(s) where your organization is located:", _
  "Address of organization:", "Telephone number(s):", "e-mail:", "web-site:", _
  "In 3 sentences or less, what is the major focus of your organization?", _
  "To what organizations are you related or by whom are you sponsored?", _
  "How many are on your staff?", "Indigenous:", "Foreign:", _
  "What are your dreams for your ministry?", "Do you need (or want) volunteers? If so, to do what?")
Set WkSht = ActiveSheet: r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
  r = r + 1
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
  WkSht.Cells(r, 1).Value = Split(strFile, ".doc")(0)
  With wdDoc
    For c = 2 To UBound(StrFnd)
      With .Range
        With .Find
          .ClearFormatting
          .Replacement.ClearFormatting
          .Replacement.Text = ""
          .Forward = True
          .Format = False
          .MatchCase = False
          .MatchWildcards = False
          .Text = StrFnd(c)
          .Wrap = wdFindContinue
          .Execute
        End With
        If .Find.Found = True Then
          .End = .Paragraphs(1).Range.End
          .Start = .Start + Len(StrFnd(c))
          strOut = Trim(Replace(Replace(Replace(Split(.Text, vbCr)(0), vbTab, " "), Chr(11), " "), Chr(160), " "))
          Do While strOut = ""
            .Collapse wdCollapseEnd
            .MoveEnd wdParagraph, 1
            strOut = Trim(Replace(Replace(Replace(Split(.Text, vbCr)(0), vbTab, " "), Chr(11), " "), Chr(160), " "))
          Loop
          WkSht.Cells(r, c).Value = strOut
        End If
      End With
    Next
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
End Sub
 
Last edited:
Upvote 0
Paul - Thanks so much for the code! It worked really well, and now I have a full table from which I can start my benchmarking efforts. Very cool.
 
Upvote 0
Actually, I'm starting to find that the macro didn't pull some of the text. The issue seems to be this: when the respondent listed out bullet-points detailing a list of their directors/board members, or when they submitted any kind of list-form response, the macro only pulled the first block of text.

Is there any way to have the macro pull all text after the first "field" [ie. Director(s)] and before the next field [ie. e-mail]?

  • Director(s):
    • Bob
    • Dave
    • Julie
  • e-mail:
    • Bob@anynoprofit.org
    • Dave@anynoprofit.org
    • Julie@anynoprofit.org
So in this case, could the macro pull Bob, Dave and Julie's names into one cell and their emails into another cell? From that point, I can do some data-parsing using formulas in Excel to segment out the names and email addresses.

Thanks!
 
Upvote 0
As I said:
Only the data in the first non-empty paragraph for a given item is returned; it's impossible to know how many paragraphs something like a directors list, for example, might span if an item's data are not all entered in the same paragraph.
What you're describing fits that exactly.
 
Upvote 0

Forum statistics

Threads
1,223,234
Messages
6,170,891
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top