extracting information from a text using vba

absan

New Member
Joined
Nov 3, 2014
Messages
16
i have text file i need to take the address, account number (start with 200-xxxx-xxxx) and name from it and insert it into excel file.
that why im having problem with, looks like this i need to extract the bold information

""1615 MISSISSAUGA ROAD, ONW" 500-4 LT A/A 0 1 1 15 0 N Y 00000000 200-0000-0001 customer one
Attn Codes 0
HDW 409 DSR2 2.14459E+15 2-way SRVC: 154(154)

"L-191 COPELAND RIVER LAKE RD, O" NW 500-4 LT A/A 0 1 1 15 0 N Y 333-9494 200-0000-0002customer two
Attn Codes 37 0
HDW 435 NAV3 6.24472E+14 2-way SRVC: 154(154)

"86 WHITE LAKE DRIVE AD, ONW" 500-2 LT C/C 0 1 0 15 0 N N 00000000 200-0000-0003 customer three
Attn Codes 30.85
"2244 HWY, ONW" 500-2 LT C/C 0 1 0 1 0 N N 00000000 200-0000-0004 customer four "

please advice thanks
 
Hi ED,

i have add the botton to the sheet and when i call GetCustInfo:

I'm getting Complile error: User-defined type not defined "regEx As New RegExp" should i have to enable some a certain reference?

thanks,
 
Upvote 0

Excel Facts

Excel Can Read to You
Customize Quick Access Toolbar. From All Commands, add Speak Cells or Speak Cells on Enter to QAT. Select cells. Press Speak Cells.
Here's what I've got, absan. Copy everything from Sub to End Sub and paste it into your module. Then in your _Click code or other sub, use Call GetCustInfo.

Code:
Sub GetCustInfo()
Dim wkb As Workbook
Dim wks As Worksheet
Dim regEx As New RegExp
Dim Match, matches, s
Dim cntRow As Long, posChar As Long
Dim strFile As String, strText As String
Dim strCust As String, strAdd As String, strAcct As String
Dim strRegEx As String
'Set workbook, worksheet objects
Set wkb = ActiveWorkbook
Set wks = ActiveSheet
'Initiate row count at 2 (assuming headers in row 1)
cntRow = 2
'Assume all account numbers are formatted
'123-1234-1234
'Uses Regular Expressions for pattern matching
'Set a reference under Tools >> References to
'Microsoft VBScript Regular Expressions 5.5
strRegEx = "([0-9]{3})(\-)([0-9]{4})(\-)([0-9]{4})"
'Set up RegExp for match
With regEx
  .Global = True
  .MultiLine = True
  .IgnoreCase = False
  .Pattern = strRegEx
End With
'Navigate to and open data file
strFile = Application.GetOpenFilename()
'Open file and read each line
Open strFile For Input As #1
Do Until EOF(1)
  Line Input #1, strText  'strText is one line of data
  'Print line in Immediate window (CTRL+G)
  Debug.Print strText
  
  'Check text for validity
  If regEx.test(strText) = True Then
    'RegExp matched the pattern; data is valid
    'Get the match
    Set matches = regEx.Execute(strText)
    For Each Match In matches
      'Gets character position of regEx match
      s = Match.FirstIndex
    Next Match
    
    'Extract account number; use the "s" variable
    strAcct = Mid(strText, CLng(s), 13)
    'Clean of leading, ending spaces
    strAcct = Trim(strAcct)
    Debug.Print strAcct
    
    'Extract name; use the "s" variable + 13 characters
    strCust = Right(strText, Len(strText) - (CLng(s) + 13))
    'Clean of leading, ending spaces
    strCust = Trim(strCust)
    Debug.Print strCust
    
    'Extract address; find " 500-" after address
    posChar = InStr(1, strText, " 500-")
    strAdd = Left(strText, posChar)
    'Clean out quotes
    strAdd = Replace(strAdd, Chr(34), "")
    'Clean of leading, ending spaces
    strAdd = Trim(strAdd)
    Debug.Print strAdd
    
    'Write to worksheet
    'Name to col A
    wks.Cells(cntRow, 1) = strCust
    'Account to col B
    wks.Cells(cntRow, 2) = strAcct
    'Address to col C
    wks.Cells(cntRow, 3) = strAdd
    
    'Increment row
    cntRow = cntRow + 1
    
  End If
  
Loop
Close #1
'Clear objects
Set regEx = Nothing
Set wks = Nothing
Set wkb = Nothing
MsgBox "Done!"
End Sub
Hi ED,

i have add the botton to the sheet and when i call GetCustInfo:

I'm getting Compile error: User-defined type not defined "regEx As New RegExp" should i have to enable some a certain reference?

thanks,
 
Upvote 0
Yes - I made that comment in the code.
'Assume all account numbers are formatted
'123-1234-1234
'Uses Regular Expressions for pattern matching
'Set a reference under Tools >> References to
'Microsoft VBScript Regular Expressions 5.5
 
Upvote 0
Hi ED,

i have add the botton to the sheet and when i call GetCustInfo:

I'm getting Compile error: User-defined type not defined "regEx As New RegExp" should i have to enable some a certain reference?

thanks,

Hi Ed thanks a lot its working now, but i have question just for me to understand, if you dont mind:
1. why we are using regEx
2. i did not understood these:
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strRegEx
3. the name portion of the code what is the S for?
 
Upvote 0
1. why we are using regEx
2. i did not understood these:
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strRegEx
3. the name portion of the code what is the S for?
No problem asking questions. This is exactly how I learned.

1. I'm using Regular Expressions (RegEx) because it allows you to match a pattern within a string of text. That allows us to validate a line of text captured from the text file by trying to match the pattern of the account number.

2. Those are properties (I think!!) of the RegEx object. We set them as we need to make our pattern match work.

3. I copied this code off another forum -- I could have declared the variables with better names, but I was wrapped up in getting this to work. Matches, match, and s are just names of variables used with RegEx. They are not declared (Dim) with "As", so they are automatically Variant. A string is to contain text, long and integer are for numbers, and there's a lot more. A variant, though, can be anything - a number, text, or even an array. Sometimes you want that because you don't know what you're going to get: if the pattern match works, match will hold a text string and s will be the number of the character position of the start of the pattern. But if the pattern match fails, those could be null strings or something else that you can't predict and can't use. So they are variants, and I can use them like a basket to toss anything into.

Do some searching on regular expression and variable types to learn more.
 
Upvote 0
For those who might be interested, here is a way to do what the OP wants without using Regular Expressions (not tested, but I am thinking this might be faster)... I commented nearly every statement to make the logic easier to understand.
Code:
Sub GetCustInfo()
  Dim X As Long, Z As Long, Index As Long, FileNum As Long, Info As Variant
  Dim TotalFile As String, PathAndFileName As String, Lines() As String
  [COLOR=#008000]' Get the path and filename from the user[/COLOR]
  PathAndFileName = Application.GetOpenFilename()
  [COLOR=#008000]' Get an unused file channel number[/COLOR]
  FileNum = FreeFile
  [COLOR=#008000]' The next four statements read the entire file into the TotalFile string variable[/COLOR]
  Open PathAndFileName For Binary As #FileNum
    TotalFile = Space(LOF(FileNum))
    Get #FileNum, , TotalFile
  Close #FileNum
  [COLOR=#008000]' Split the entire file into its individual lines[/COLOR]
  Lines = Split(TotalFile, vbNewLine)
  [COLOR=#008000]' Create the empty output array[/COLOR]
  ReDim Info(1 To UBound(Lines) + 1, 1 To 3)
  [COLOR=#008000]' Loop through the individual lines[/COLOR]
  For X = 0 To UBound(Lines)
    [COLOR=#008000]' If the line contains the dashed number pattern, process it[/COLOR]
    If Lines(X) Like "*###-####-####*" Then
      [COLOR=#008000]' Increase the index used to write to the output file[/COLOR]
      Index = Index + 1
      [COLOR=#008000]' Loop backwards through the line (looking for the number pattern)[/COLOR]
      For Z = Len(Lines(X)) - 13 To 1 Step -1
        [COLOR=#008000]' Test each 13 characters to see if it is the number pattern[/COLOR]
        If Mid(Lines(X), Z, 13) Like "###-####-####" Then
          [COLOR=#008000]' Assign the number pattern to the middle output file element[/COLOR]
          Info(Index, 2) = Mid(Lines(X), Z, 13)
          [COLOR=#008000]' Exit loop and process the rest of the found line[/COLOR]
          Exit For
        End If
      Next
      [COLOR=#008000]' Assign the customer name to the first output file element[/COLOR]
      Info(Index, 1) = Trim(Mid(Lines(X), Z + 13))
      [COLOR=#008000]' Isolate the address from the line and assign it to the third output file element[/COLOR]
      Lines(X) = Trim(Lines(X))
      Do While Left(Lines(X), 1) = """"
        Lines(X) = Mid(Lines(X), 2)
      Loop
      Info(Index, 3) = Left(Lines(X), InStr(Lines(X), """") - 1)
    End If
  Next
  [COLOR=#008000]' Write the customer info to the active worksheet[/COLOR]
  Range("A1").Resize(Index, 3) = Info
  MsgBox "Done!"
End Sub
 
Upvote 0
No problem asking questions. This is exactly how I learned.

1. I'm using Regular Expressions (RegEx) because it allows you to match a pattern within a string of text. That allows us to validate a line of text captured from the text file by trying to match the pattern of the account number.

2. Those are properties (I think!!) of the RegEx object. We set them as we need to make our pattern match work.

3. I copied this code off another forum -- I could have declared the variables with better names, but I was wrapped up in getting this to work. Matches, match, and s are just names of variables used with RegEx. They are not declared (Dim) with "As", so they are automatically Variant. A string is to contain text, long and integer are for numbers, and there's a lot more. A variant, though, can be anything - a number, text, or even an array. Sometimes you want that because you don't know what you're going to get: if the pattern match works, match will hold a text string and s will be the number of the character position of the start of the pattern. But if the pattern match fails, those could be null strings or something else that you can't predict and can't use. So they are variants, and I can use them like a basket to toss anything into.

Do some searching on regular expression and variable types to learn more.

Hi Ed thanks a lot for the explanation about the address when i tested it on the real file it skipping lots of them. is the code finding the qouts before the 500-? or is pointed to a location to search for it? like what is strAdd = Replace(strAdd, Chr(34), "")

thanks,
 
Upvote 0
For those who might be interested, here is a way to do what the OP wants without using Regular Expressions (not tested, but I am thinking this might be faster)... I commented nearly every statement to make the logic easier to understand.
Code:
Sub GetCustInfo()
  Dim X As Long, Z As Long, Index As Long, FileNum As Long, Info As Variant
  Dim TotalFile As String, PathAndFileName As String, Lines() As String
  [COLOR=#008000]' Get the path and filename from the user[/COLOR]
  PathAndFileName = Application.GetOpenFilename()
  [COLOR=#008000]' Get an unused file channel number[/COLOR]
  FileNum = FreeFile
  [COLOR=#008000]' The next four statements read the entire file into the TotalFile string variable[/COLOR]
  Open PathAndFileName For Binary As #FileNum
    TotalFile = Space(LOF(FileNum))
    Get #FileNum, , TotalFile
  Close #FileNum
  [COLOR=#008000]' Split the entire file into its individual lines[/COLOR]
  Lines = Split(TotalFile, vbNewLine)
  [COLOR=#008000]' Create the empty output array[/COLOR]
  ReDim Info(1 To UBound(Lines) + 1, 1 To 3)
  [COLOR=#008000]' Loop through the individual lines[/COLOR]
  For X = 0 To UBound(Lines)
    [COLOR=#008000]' If the line contains the dashed number pattern, process it[/COLOR]
    If Lines(X) Like "*###-####-####*" Then
      [COLOR=#008000]' Increase the index used to write to the output file[/COLOR]
      Index = Index + 1
      [COLOR=#008000]' Loop backwards through the line (looking for the number pattern)[/COLOR]
      For Z = Len(Lines(X)) - 13 To 1 Step -1
        [COLOR=#008000]' Test each 13 characters to see if it is the number pattern[/COLOR]
        If Mid(Lines(X), Z, 13) Like "###-####-####" Then
          [COLOR=#008000]' Assign the number pattern to the middle output file element[/COLOR]
          Info(Index, 2) = Mid(Lines(X), Z, 13)
          [COLOR=#008000]' Exit loop and process the rest of the found line[/COLOR]
          Exit For
        End If
      Next
      [COLOR=#008000]' Assign the customer name to the first output file element[/COLOR]
      Info(Index, 1) = Trim(Mid(Lines(X), Z + 13))
      [COLOR=#008000]' Isolate the address from the line and assign it to the third output file element[/COLOR]
      Lines(X) = Trim(Lines(X))
      Do While Left(Lines(X), 1) = """"
        Lines(X) = Mid(Lines(X), 2)
      Loop
      Info(Index, 3) = Left(Lines(X), InStr(Lines(X), """") - 1)
    End If
  Next
  [COLOR=#008000]' Write the customer info to the active worksheet[/COLOR]
  Range("A1").Resize(Index, 3) = Info
  MsgBox "Done!"
End Sub

Hi Rick im getting invalid procedure call or argument on "Info(Index, 3) = Left(Lines(X), InStr(Lines(X), """") - 1)"

thanks,
 
Upvote 0
Hi Rick im getting invalid procedure call or argument on "Info(Index, 3) = Left(Lines(X), InStr(Lines(X), """") - 1)"

thanks,

Hi Rick now its working, it should be Info(Index, 3) = Left(Lines(X), InStr(Lines(X), "") - 1) singel qouts

thanks,
 
Upvote 0
Hi Rick now its working, it should be Info(Index, 3) = Left(Lines(X), InStr(Lines(X), "") - 1) singel qouts
:confused: No, I meant what I posted... 4 quote marks because you showed the address encased in quote marks (in Message #1)... your data in the text file does look like what you posted in Message #1 (at least one quote mark before the address and one mark after it), correct? If so, then what you posted above would omit the address from the output.
 
Upvote 0

Forum statistics

Threads
1,223,910
Messages
6,175,320
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top