# VBA extracting MS Word Form Data into Excel Table



## nadair1 (Aug 21, 2018)

Good afternoon. I've read through the forum rules, and I'm hoping this thread makes the cut. I'm a newbie here, so I'd appreciate your input. 

I lead a nonprofit, and I'm trying to research other similar nonprofits. I have a file folder with 64 docx files that summarize each nonprofit, all provided to me by an association of which my organization is a member. I want to pull the same blocks of information from each of these 64 documents into an excel spreadsheet, and organize it into a spreadsheet. Each organization's filename contains their region and name. Here's an example: "North America-Shelter and Rain.docx". I will then use this spreadsheet to start adding columns as I go about my research.

Fortunately, most all of the documents have standardized sections where each nonprofit has entered information about their organization, such as the organization's name, focus of the organization, etc. There's a few nonprofits that have slightly different versions of the same form, whereby some of the standard fields are missing. There are no tables, graphics, etc. -- just simple text fields and text entries.

I've created an exhaustive list of the fields below (all fields are preceded by a colon in the Word docs)...



Name of Organization:
Founder(s) and year of founding organization:
Director(s):
e-mail:
telephone:
Country(s) where your organization is located:
Address of organization:
Telephone number(s):
e-mail:
web-site:
In 3 sentences or less, what is the major focus of your organization?
To what organizations are you related or by whom are you sponsored?
How many are on your staff?
Indigenous:
Foreign:
What are your dreams for your ministry?
Do you need (or want) volunteers? If so, to do what?

I'm using the 2016 versions of Excel and Word. Here's the filepath where the docx files reside: C:\Users\noah\Desktop\Organizations

As for a finished product, ideally it would be nice to have the organization's filename in one column, and all the fields listed in the Word documents in columns adjacent to the filename, then the organizations' responses to the fields in rows under the appropriate column headers.

Thanks so much!


----------



## Macropod (Aug 21, 2018)

Try the following macro. It is assumed your Excel workbook will have column headers for the data in the same order (but not necessarily the same wording) as the list you gave. The macro looks through each document for the names in the list and, if found, will output whatever follows it in the same paragraph (or, if that's empty, whatever's in the next non-empty paragraph [note: this might also lead to something irrelevant being picked up if the 'field' is present but unanswered]), in the appropriate row & column in your workbook. Only the data in the first non-empty paragraph for a given item is returned; it's impossible to know how many paragraphs something like a directors list, for example, might span if an item's data are not all entered in the same paragraph.

```
Sub GetData()
'Note: this code requires a reference to the Word object model.
'See under the VBA Editor's Tools|References.
Application.ScreenUpdating = False
Dim WkSht As Worksheet, r As Long, c As Long
Dim wdApp As New Word.Application, wdDoc As Word.Document
Dim strFile As String, strFolder As String, strOut As String, StrFnd
strFolder = "C:\Users\" & Environ("UserName") & "\Desktop\Organizations\"
StrFnd = Array("", "", "Name of Organization:", "Founder(s) and year of founding organization:", _
  "Director(s):", "e-mail:", "telephone:", "Country(s) where your organization is located:", _
  "Address of organization:", "Telephone number(s):", "e-mail:", "web-site:", _
  "In 3 sentences or less, what is the major focus of your organization?", _
  "To what organizations are you related or by whom are you sponsored?", _
  "How many are on your staff?", "Indigenous:", "Foreign:", _
  "What are your dreams for your ministry?", "Do you need (or want) volunteers? If so, to do what?")
Set WkSht = ActiveSheet: r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
strFile = Dir(strFolder & "\*.doc", vbNormal)
While strFile <> ""
  r = r + 1
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
  WkSht.Cells(r, 1).Value = Split(strFile, ".doc")(0)
  With wdDoc
    For c = 2 To UBound(StrFnd)
      With .Range
        With .Find
          .ClearFormatting
          .Replacement.ClearFormatting
          .Replacement.Text = ""
          .Forward = True
          .Format = False
          .MatchCase = False
          .MatchWildcards = False
          .Text = StrFnd(c)
          .Wrap = wdFindContinue
          .Execute
        End With
        If .Find.Found = True Then
          .End = .Paragraphs(1).Range.End
          .Start = .Start + Len(StrFnd(c))
          strOut = Trim(Replace(Replace(Replace(Split(.Text, vbCr)(0), vbTab, " "), Chr(11), " "), Chr(160), " "))
          Do While strOut = ""
            .Collapse wdCollapseEnd
            .MoveEnd wdParagraph, 1
            strOut = Trim(Replace(Replace(Replace(Split(.Text, vbCr)(0), vbTab, " "), Chr(11), " "), Chr(160), " "))
          Loop
          WkSht.Cells(r, c).Value = strOut
        End If
      End With
    Next
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
End Sub
```


----------



## nadair1 (Aug 22, 2018)

Paul - Thanks so much for the code! It worked really well, and now I have a full table from which I can start my benchmarking efforts. Very cool.


----------



## nadair1 (Aug 22, 2018)

Actually, I'm starting to find that the macro didn't pull some of the text. The issue seems to be this: when the respondent listed out bullet-points detailing a list of their directors/board members, or when they submitted any kind of list-form response, the macro only pulled the first block of text.

Is there any way to have the macro pull all text after the first "field" [ie. Director(s)] and before the next field [ie. e-mail]?


Director(s):
Bob
Dave
Julie

e-mail:
Bob@anynoprofit.org
Dave@anynoprofit.org
Julie@anynoprofit.org

So in this case, could the macro pull Bob, Dave and Julie's names into one cell and their emails into another cell? From that point, I can do some data-parsing using formulas in Excel to segment out the names and email addresses.

Thanks!


----------



## Macropod (Aug 22, 2018)

As I said:


> Only the data in the first non-empty paragraph for a given item is returned; it's impossible to know how many paragraphs something like a directors list, for example, might span if an item's data are not all entered in the same paragraph.


What you're describing fits that exactly.


----------



## nadair1 (Aug 22, 2018)

Got it. Thanks for your help.


----------

