Database Text File Problem - Macro?

Joe_Toz

New Member
Joined
Apr 2, 2004
Messages
42
I am hoping that someone out there might be able to help or point me in the right direction. I currently have a very large text file (generated via Omnipage 15 Pro) of information that must be used to create a customer database. Ideally, the database would have the following fields:
Column1: Indicator (a plus or minus sign, numeric value, etc.)
Column2: Last Name
Column3: First Name
Column4: Year
Column5: Street Address
Column6: City
Column7: State
Column8: Zip
Column9: Phone #
The problem is that the text file is rather choppy and disjointed. The customer records are not uniform in length. Some do not contain enough information for all 9 fields. Most records have more than 1 line of information when opened in Excel. There is, however, some uniformity: the indicator (if any) is always first, followed by last name and then first name. Also, an r: should always precede the address information. The following is a sample of the file:

+WALDON, Mrs. Diana (Diana A. Davies); 1977; r: 555 Hazel Dr.,
Warner Robins, PA 31088, 555 542-7811
+WALDRON, Edwin W.; 1985; Flight Attnd., United Airlines; r: 556
NW 24th St., Miami Shrs., FL 33150;
johndoe@aol.com
WALDRON, Heidi L. (Heidi L. Scheeringa); 1980; r: POB 1234,
Helen, PA 30545
WALDRON, Helen; 1935 (See Ruth, Mrs. Helen)
4-WALDRON, Minerva (Min); 1945; Retired Transcriber/Operator,
State of Florida; r: 557 S. K St., Lake Worth, FL 33460, 55712-
5183
WALENIUS, Mrs. Laura J. (Laura Jayne Doyle); 1961; r: 555
Ridge St., Lake Worth, FL 33460, 555 555-5873
WALENIUS, Richard S.; 1977; r: 5567 S. 9th Ave., Lake Worth,
FL 33461, 555 555-2726
WALKDEN, Arthur B.; 1965; r: 5568 Lake Ave., Lake Worth, FL
33460, 555 555-1889
WALKER, Cassondra; 2000; r: 5569 S. C St., Lake Worth, FL
33460, 555 555-6154
4-WALKER, Cecil; 1954; Retired; r 1234 Jog Rd, Lake Worth, FL
33467, 555 565-6660; Lois; Dale, Keith, Cynthia
WALKER, Charles L.; 1979; r: 123 S B St., Lake Worth, FL 33460
WALKER, David M.; 1969; r: 5570 Dunford Rd., Westlake, OH
44145
4-WALKER, Jeffrey; 1970; Admin., Univ. of Notre Dame; r: 55701
Ray Dr., Granger, IN 46530, 555 555-7541; Shari; Erin, Nathan,
Tim;
johndoe@nd.edu
+WALKER, Jeremy H.; 1991; Maintenance Worker II, Palm Beach
Cnty.; r: 55072 24th St. N, Royal Palm Bch., FL 33412, 555 555-
7311;
johndoe@juno.com
4-WALKER, Lillie (Lillie H. Robinson); 1979; Min ister/Ofc. Mgr.,
Christian Come Alive, 561 369-0950; r: 1234 Fairgreen Rd, W.
Palm Bch., FL 33417, 555 555-8337; Timonthy; Latoya, Laquin-
ton, Timonthy Jr



Please let me know if there are any Macros (or other options) that might help.
Any assistance will be greatly appreciated.
Thanks!
 

Excel Facts

Is there a shortcut key for strikethrough?
Ctrl+S is used for Save. Ctrl+5 is used for Strikethrough. Why Ctrl+5? When you use hashmarks to count |||| is 4, strike through to mean 5.
Hi Joe,

Well, looking at your sample datafile, the only consistent part seems to be that the surnames are all capital letters. The only other contiguos capital letters are those indicating the State (FL for example). You would need to get hold of a good programmers text editor such as UltraEdit 32 (www.ultraedit.com) and use it to clean up the extra CR/LF's so that you only have one line for each entry. Then you could attempt to import the resulting file into Excel. As each entry does not necessarily have all fields, I doubt that much could be accomplished with macro's, I think that you will probably have to edit it manually.

Maybe some genius on this board will be able to help you further on this !

Good Luck !

Ciao,

Don.
 
Upvote 0
Hi Don,

Thanks for the reply. I don’t know if this changes the approach or not, but the OmniPage software can generate a CSV file for the data. In this file, there will be one line item for each record. However, there are some occasions (not too many) where 2 records are combined on the same row. Please let me know if this could be used for a solution.

Thanks!
 
Upvote 0
Hi Joe,

Could you post an example CSV to see if it could be a better approach ?

Ciao,

Don.
 
Upvote 0
Yes, thanks again for taking a look at this. Here is a sample of the CSV file. Again, the main problem with this is that some records are combined on the same line. Here's the sample:

+WALDON, Mrs. Diana (Diana A. Davies); 1977; r: 555 Hazel Dr., Warner Robins, GA 31088, 555 555-7811
+WALDRON, Edwin W.; 1985; Flight Attnd., United Airlines; r: 555 NW 94th St., Miami Shrs., FL 33150;
WALDRON, Heidi L. (Heidi L. Scheeringa); 1980; r: POB 1234, Helen, GA 30545
WALDRON, Helen; 1935 (See Ruth, Mrs. Helen)
4-WALDRON, Minerva (Min); 1945; Retired Transcriber/Operator, State of Florida; r: 555 S. K St., Lake Worth, FL 33460, 555 5555183
WALENIUS, Mrs. Laura J. (Laura Jayne Doyle); 1961; r: 555 Ridge St., Lake Worth, FL 33460, 555 555-5873
WALENIUS, Richard S.; 1977; r: 5555 S. 6th Ave., Lake Worth, FL 33461, 555 555-2726
+WALKER, Jeremy H.; 1991; Maintenance Worker II, Palm Beach Cnty.; r: 55555 74th St. N, Royal Palm Bch., FL 33412, 555 5557311;
4-WALKER, Judy; 1948; Retired; r: POB 5555, Edgewater, FL 32132, 555 555-2422
4-WALKER, Latonya; 1980; Surgical Coder; r: 555 NE 13th Ave., Boynton Bch., FL 33435, 555 555-0001
WALKER, Laura A.; 2001; r: 5555 Pine Valley Dr., Wellington, FL 33414
4-WALKER, Lillie (Lillie H. Robinson); 1979; Min ister/Ofc. Mgr., Christian Come Alive, 555 555-0950; r: 5555 Fairgreen Rd, W. Palm Bch., FL 33417, 555 555-8337; Timonthy; Latoya, Laquinton, Timonthy Jr
+WALKER, Mrs. Linda M. (Linda M. McConnell); 1968; Leg al Secy., Liggio Benrubi & Williams PA, 1615 Forum Pl., Ste. 3-B, W. Palm Bch., FL 33401, 561 616-3333; r: 8411 Theresa Rd, Boynton Bch., FL 33437, 561 731-2535; Alison, Christopher
4-WALKER, Lisa; 1984 (See Poarch, Lisa)
WALKER, Lloyd H.; 1982; r: 5555 Kirk Rd., Lake Worth, FL 33461, 555 555-2675
+WALKER, Mrs. Lois (Lois Derringer); 1956; Retired EKG & EEG Tech.; r: 5555 Jog Rd, Lake Worth, FL 33467, 555 555-6660; Cecil; Dale, Keith, Cynthia
+WALKER, Marihelen (Marihelen Purvis); 1967; Ad missions Coord., St. Luke's Episcopal Hosp., POB 55555, Houston, TX 77225, 555 555-8814; r: 55555 Tolman St., Houston, TX 77034, 555 555-8737;
WALKER, Michael L; 1988; Ramp Assoc., Delta Air Lines; r: 5555 N. D St., Lake Worth, FL 33460, 555 555-6860 +WALKER, Michelle; 1975 (See Craft, Michelle)
+WARING, Larry; 1967; US Post Ofc.; r: 5555 Elmhurst Rd Apt. F, W. Palm Bch., FL 33417, 555 555-4350; Melissa, Kelly, Joshua WARN, Steven R.; 1978; r: 5555 Ramsey Dr., Lake Worth, FL 33461, 555 555-2607

Thanks again for your help!
 
Upvote 0
Joe_Toz said:
Yes, thanks again for taking a look at this. Here is a sample of the CSV file. Again, the main problem with this is that some records are combined on the same line. Here's the sample:

+WALDON, Mrs. Diana (Diana A. Davies); 1977; r: 555 Hazel Dr., Warner Robins, GA 31088, 555 555-7811
+WALDRON, Edwin W.; 1985; Flight Attnd., United Airlines; r: 555 NW 94th St., Miami Shrs., FL 33150;
WALDRON, Heidi L. (Heidi L. Scheeringa); 1980; r: POB 1234, Helen, GA 30545
Thanks again for your help!

To me it looks like it could be done by the fact you have the semi-colons and the r: which could be used for parsing.

Perry
 
Upvote 0
Perry,

Thanks for the reply. I was hoping that the r: could be used for some type of positioning/parsing item. I just don't have the expertise or know-how to proceed. Does anybody have any suggestions? I appreciate the help!
 
Upvote 0
Joe_Toz said:
Perry,

Thanks for the reply. I was hoping that the r: could be used for some type of positioning/parsing item. I just don't have the expertise or know-how to proceed. Does anybody have any suggestions? I appreciate the help!

In the case of the place of business as indicated by the airlines in the example, is this part of the date or none related?

To me the first couple of fields look fairly easy, then the r: looks like it should not be a problem, but I think before anyone can come up with a solution to this (which I think may end up using find), they need the relationships of fields such as this.

Perry
 
Upvote 0
Perry,

Thanks again for the reply. The place of business is not related to the date. The business information is irrelevant and can be disregarded. The date, however, is something that needs to make the extract.
Thanks again for helping. Any and all suggestions are much appreciated!
 
Upvote 0
Hello Joe,

I've been doing a little work on your problem using the CSV file snippet!

It does not extract everything, the original is too messy, but it should
give you a starting point.

1) Start Excel and open your CSV file.
2) Name a second worksheet as dbClean with columns:
indicator surname name date address city state zip phone (Columns A - I)
3) Copy and Paste the code into a new module (macro)
4) Make sure that the CSV worksheet is the active one
5) Run the macro

Note: you can only run the macro once. If you want to run it again, you need
replace the CSV sheet with the original first.

Let me know if I can help further !

Ciao,

Don

Code:
Sub stripItems()

Dim pntr
Dim tmpStr
Dim tmpStr1
Dim pntr1
Dim cnt
Dim numberOfRows

Dim frm As String
Dim cntr

numberOfRows = 19

'insert a column before column A on CSV sheet
ActiveSheet.Columns(1).Select
Selection.Insert Shift:=xlToRight
Range("B1").Select

For cntr = 1 To 19

'extract + sign if present and copy to new column A
    If IsError(Range("B" & cntr)) Then
        frm = Range("B" & cntr).Formula
        Range("A" & cntr).Value = Mid(frm, 2, 1)
        Range("B" & cntr) = Mid(frm, 3)
    End If
'extract #- if present
    If IsNumeric(Left(Range("B" & cntr), 1)) Then
        frm = Range("B" & cntr).Formula
        Range("A" & cntr).Value = Mid(frm, 1, 2)
        Range("B" & cntr) = Mid(frm, 3)
    End If

Next cntr


For cnt = 1 To numberOfRows

'find 1st semicolon in C and set pointer pntr
pntr = InStr(1, Range("C" & cnt), ";")

'copy string upto pointer-1 to sheet dbClean C ==> name
tmpStr = Left(Range("C" & cnt), pntr - 1)
Range("dbClean!C" & cnt + 1) = tmpStr

'find 2nd semicolon in C and set pointer1
pntr1 = InStr(pntr + 1, Range("C" & cnt), ";")
If pntr1 = 0 Then
GoTo nxt
End If

'copy string between pointers pntr and pntr1 to sheet dbClean D ==> date
tmpStr = Mid(Range("C" & cnt), pntr + 1, (pntr1 - 1 - pntr))
tmpStr = Trim(tmpStr)
Range("dbClean!D" & cnt + 1) = tmpStr

'find 1st colon in C and set pointer pntr
pntr = InStr(1, Range("C" & cnt), ":")
tmpStr = Mid(Range("C" & cnt), pntr + 1)
tmpStr = Trim(tmpStr)

'if ":" exists here then col D to col F in dbClean ==> city
If pntr <> 0 Then
Range("dbClean!F" & cnt + 1) = Trim(Range("D" & cnt))
'==> state
tmpStr1 = LTrim(Range("E" & cnt))
Range("dbClean!G" & cnt + 1) = Left(tmpStr1, 2)
'==> zip
tmpStr1 = RTrim(Range("E" & cnt))
Range("dbClean!H" & cnt + 1) = Right(tmpStr1, 5)
'==> phone
tmpStr1 = LTrim(Range("F" & cnt))
Range("dbClean!I" & cnt + 1) = Left(tmpStr1, 12)
End If

'if no part of address in C then empty tmpStr
If InStr(1, tmpStr, ";") <> 0 Then
tmpStr = ""
Else
Range("dbClean!E" & cnt + 1) = tmpStr
End If
nxt:
Next cnt

For cnt = 1 To numberOfRows
'find 1st colon in D and set pointer pntr
pntr = InStr(1, Range("D" & cnt), ":")
If pntr = 0 Then
GoTo nx
End If
tmpStr = Mid(Range("D" & cnt), pntr + 1)
tmpStr = Trim(tmpStr)
Range("dbClean!E" & cnt + 1) = tmpStr
nx:
'copy indicator and surname to dbClean sheet
Range("dbClean!A" & cnt + 1) = Range("A" & cnt)
Range("dbClean!B" & cnt + 1) = Range("B" & cnt)

Next cnt

End Sub
 
Upvote 0

Forum statistics

Threads
1,223,227
Messages
6,170,849
Members
452,361
Latest member
d3ad3y3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top