Convert Word Non-tabular Data to Excel Row/Column

johnmeyer

New Member
Joined
Oct 23, 2011
Messages
46
Office Version
  1. 2007
Platform
  1. Windows
I want to convert an alumni database, in the Word format shown below, into a row/column Excel workbook. Here is sample of what two records look like in Word:

Name: Diane L. Smith
Student Name: Diane L. Smith
Job Title: Computer Operator Tech.
Firm Name: Loyola Univ. Med. Ctr.
Degree(s) Earned from Institution(s) Attended: LISC, Beautician-State of IL Cert., Childcare-State of IL
Home Address: 138 W Center Dr., Palatine, Illinois
Home Phone Number: 708 33-1234
E-Mail Address: smith@aol.com
---------------------------
Name: Mrs. Elizabeth H. Doe
Student Name: Betsy Doe
Job Title: Physical Educ. Tchr.
Firm Name: AcmeISD
Home Address: 123 Main Dr. Galveston, Texas 12345
Home Phone Number: 817 333-1234
Spouse’s Name: David
Children’s Names: Clint, Cort

The break between records shown by the dashed line is actually a hard page break in word. Thus each "record" is on a separate page. The "field names" are all identical, are always boldfaced, and always end in a colon.

What I want to do is a little like the Excel Transpose command, but the problem is that not every record contains all possible fields, and also I have to somehow iterate through each "record" in Word.

I can program a macro, either in Word or Excel, to extract this information, but that is going to take awhile for me to get it all working. I assume something like this may be commercially available, and I'm more than happy to pay a little if there is a conversion program that does this sort of thing.

Here's my question: what search words do I use to describe the type of "database" shown above? Also, can anyone point me to any prior art that might get me going?

Thanks in advance for any pointers you might be able to provide.
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
something like this?

Column1NameStudent NameJob TitleFirm NameDegree(s) Earned from Institution(s) AttendedHome AddressHome Phone NumberE-Mail AddressSpouse’s NameChildren’s Names
Name: Diane L. Smith Diane L. Smith Diane L. Smith Computer Operator Tech. Loyola Univ. Med. Ctr. LISC, Beautician-State of IL Cert., Childcare-State of IL 138 W Center Dr., Palatine, Illinois 708 33-1234 smith@aol.com David Clint, Cort
Student Name: Diane L. Smith Mrs. Elizabeth H. Doe Betsy Doe Physical Educ. Tchr. AcmeISD 123 Main Dr. Galveston, Texas 12345 817 333-1234
Job Title: Computer Operator Tech.
Firm Name: Loyola Univ. Med. Ctr.
Degree(s) Earned from Institution(s) Attended: LISC, Beautician-State of IL Cert., Childcare-State of IL
Home Address: 138 W Center Dr., Palatine, Illinois
Home Phone Number: 708 33-1234
E-Mail Address: smith@aol.com
---------------------------
Name: Mrs. Elizabeth H. Doe
Student Name: Betsy Doe
Job Title: Physical Educ. Tchr.
Firm Name: AcmeISD
Home Address: 123 Main Dr. Galveston, Texas 12345
Home Phone Number: 817 333-1234
Spouse’s Name: David
Children’s Names: Clint, Cort
 
Upvote 0
something like this?
Yes, that shows exactly what I am trying to do.

Did you do that "by hand," or did you employ something like the copy/paste special(transpose) command?

As you can see from row three in your result, the database is "sparse," meaning that not all fields are present in each record. And, there has to be some way to have the code know how to recognize a new record.
 
Upvote 0
This was recognised automatically ;)
But first I'd like to know your Excel version (hope not 2003/7!)

honestly you can update your profile about Excel version (Account details)
 
Upvote 0
Hi, johnmeyer
Try this:
I assumed the first data of each group is "Name:", otherwise we'll need another code.
Put the data in column A then run this code:
VBA Code:
Sub a1117584a()
'https://www.mrexcel.com/board/threads/convert-word-non-tabular-data-to-excel-row-column.1117584/#post-5386507
Dim i As Long, n As Long
Dim va, vb, fm, ary
Dim d As Object


Application.ScreenUpdating = False
va = Range("A1", Cells(Rows.count, "A").End(xlUp))
ReDim vb(1 To UBound(va, 1), 1 To 2)

Set d = CreateObject("scripting.dictionary")
d.CompareMode = vbTextCompare

For i = 1 To UBound(va, 1)
    If Len(va(i, 1)) > 0 And InStr(va(i, 1), ":") Then
        ary = Split(va(i, 1), ":")
        d(ary(0)) = Empty
        vb(i, 1) = ary(0)
        vb(i, 2) = Trim(ary(1))
    End If
Next

Range("B1").Resize(1, d.count) = Application.Transpose(Application.Transpose(Array(d.Keys)))
n = 1
For i = 1 To UBound(vb, 1)
    If Len(vb(i, 1)) > 0 Then
    If LCase(vb(i, 1)) = "name" Then n = n + 1
        fm = Application.Match(vb(i, 1), Rows(1), 0)
        If IsNumeric(fm) Then
            Cells(n, fm) = vb(i, 2)
        End If
    
    End If
Next
Application.ScreenUpdating = True
End Sub

Result:
Book1
ABCDEFGHIJK
1Name: Diane L. SmithNameStudent NameJob TitleFirm NameDegree(s) Earned from Institution(s) AttendedHome AddressHome Phone NumberE-Mail AddressSpouse’s NameChildren’s Names
2Student Name: Diane L. SmithDiane L. SmithDiane L. SmithComputer Operator Tech.Loyola Univ. Med. Ctr.LISC, Beautician-State of IL Cert., Childcare-State of IL138 W Center Dr., Palatine, Illinois708 33-1234smith@aol.com
3Job Title: Computer Operator Tech.Mrs. Elizabeth H. DoeBetsy DoePhysical Educ. Tchr.AcmeISD123 Main Dr. Galveston, Texas 12345817 333-1234DavidClint, Cort
4Firm Name: Loyola Univ. Med. Ctr.
5Degree(s) Earned from Institution(s) Attended: LISC, Beautician-State of IL Cert., Childcare-State of IL
6Home Address: 138 W Center Dr., Palatine, Illinois
7Home Phone Number: 708 33-1234
8E-Mail Address: smith@aol.com
9---------------------------
10Name: Mrs. Elizabeth H. Doe
11Student Name: Betsy Doe
12Job Title: Physical Educ. Tchr.
13Firm Name: AcmeISD
14Home Address: 123 Main Dr. Galveston, Texas 12345
15Home Phone Number: 817 333-1234
16Spouse’s Name: David
Sheet3
 
Upvote 0
This was recognised automatically ;)
But first I'd like to know your Excel version (hope not 2003/7!)

honestly you can update your profile about Excel version (Account details)
Yes, it is 2003, although I have at least one computer with a "modern" version that is 2007. I was never able to get past the horrible UI changes from 2007 onward. I am still mostly running Windows XP, with a few of my dozen computers running Windows 7. These old systems just work. Having said all that, I really do need to get a current version of Excel because there is a lot of online parsing that can only be done with the newer versions. The "query" function in my version is now pretty creaky.
 
Upvote 0
Hi, johnmeyer
Try this:
I assumed the first data of each group is "Name:", otherwise we'll need another code.
Put the data in column A then run this code:
<snip>
Oh my gosh ...

Worked perfectly, first try.

If you have a Patreon or Paypal account (preferably Paypal) PM me so I can compensate you. I was not expecting a turnkey solution, but that's what you gave me.

My first programming was back in the late 1960s, in assembler, when we had zero memory and had to count lines of code. I therefore really appreciate how simple and tight your code is.

Elegant.

I can't thank you enough, but as I said, I'd be happy to give you a small Christmas present for your efforts if you have a Paypal account (I can sign up for Patreon if that's the only way to do it).

John Meyer
 
Upvote 0
it is 2003, although I have at least one computer with a "modern" version that is 2007.
so this is great you got solution from @Akuini because my solution works with much higher Excel versions
Have a nice day

and again, update your profile about Excel version (Account details) please
 
Upvote 0
I can't thank you enough, but as I said, I'd be happy to give you a small Christmas present for your efforts if you have a Paypal account (I can sign up for Patreon if that's the only way to do it).
You're welcome, glad to help, & thanks for the feedback.
I respectfully decline your kind offer, because it's enough for me that you've given me the feedback that my code is useful to you. :)
 
Upvote 0
You're welcome, glad to help, & thanks for the feedback.
I respectfully decline your kind offer, because it's enough for me that you've given me the feedback that my code is useful to you. :)
What a mensch! (y)

Just before I saw your reply I was going through your code to make sure I understand it (great opportunity for learning). I can see why someone came up with the phrase "computer language," because just like learning Spanish or German, you think you know something until you realize home many verbs and nouns you haven't yet learned, and how much syntax is still foreign.

The code block that solved the problem I would have puzzled over for hours was:

VBA Code:
fm = Application.Match(vb(i, 1), Rows(1), 0)
        If IsNumeric(fm) Then
            Cells(n, fm) = vb(i, 2)
        End If

That neatly solves the problem of populating each row when the data in each record is sparse (i.e., most records don't contain every data field).
 
Upvote 0

Forum statistics

Threads
1,223,883
Messages
6,175,167
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top