How to use regexp and arrays to match groups or sets of 3 individual patterns?

jbaich

Board Regular
Joined
Nov 2, 2011
Messages
140
Sorry if this get's duplicated, tried to post, but wasn't logged in and then auto redirect didn't go anywhere, so went back and trying again...

Hey all, I'm new to Reg exp and I'm going to try to use it in Outlook... I know this is an Excel forum, but there are so many super smart and helpful people here I figured it might be worth asking.

I'm trying to match patterns, which occur in sets of 3. I have posted this question on another forum as well, where i was originally introduced to the Reg Exp function https://forums.slipstick.com/threads/95172-is-this-possible-to-do-with-a-macro/#post-347604, but with the holiday season, i'm not sure if I should be expecting a response in the short term and the usefulness of this macro (if I can get it to work) is increasing exponentially by the day, so I'd really love to figure something out in a hurry :)

So what I'm thinking is as follows, please correct me if I'm wrong:

example:
Pattern 1 = date
Pattern 2 = time
pattern 3 = postal code

The 3 individual patterns are SubMatches and together they would be a MatchCollection... right? Except instead of finding all the pattern 1 collections, all the pattern 2 collections and all the pattern 3 collections, I need to find all the pattern (1,2,3) collections... and store them for use later so i also need to create variables for them as they are identified... ie: collection 1, collection 2, collection 3... etc.

So I'm trying to figure out how to loop through and find all of the MatchCollections (1,2,3,) in my document... not sure how many there may be. I was thinking I could run a count of all the pattern 1 matches, which should give me the number of collections I'll need and then do an if loop for i = 1 to count... but I'm not sure how to group them... I've tried working with Case statements, but can only get as far as;

1. Having to create duplicate statements for each estimated recurrence (for example 3 match collections would require case 1-9 statements)... is there a way to just have the 3 case statements and then repeat the loop?

2. If I set Global=True, I will get all of the matches, grouped together as (1,1,1),(2,2,2),(3,3,3), however what I need to get is (1,2,3),(1,2,3),(1,2,3)

I'm guessing i need to incorporate arrays here somehow, but I've never really used arrays in VBA before and so far haven't been able to figure out what I need to do. Something like
Code:
[LIST]
[*]<article>[INDENT]Dim arr
 arr = Array("Collection 1","Collection 2","Collection 3")

[/INDENT]
</article>
[/LIST]

My code at this point is simply a patchwork of random failed attempts and commented out bits and pieces that may or may not be on the right track. As I said, new to both Reg and Arrays in VBA, so i think posting it might be more confusing than helpful, but if you think that would help, let me know and I will.

Any help would be very much appreciated! Also, i don't know the etiquette for cross posting, I know it's not the best and should be avoided to prevent duplication of effort, is this what Trackback is for? I'm guessing yes, so I've put the URL of the site I mentioned above in the trackback field here and hopefully that helps... I will also let Diane know over on Slipstick that I've asked this question here as well.

Thanks,
Joe
 
Thanks Domenic, you're suggestions above are perfect! everything is working great and I really appreciate all the time you've spent on this.

I hesitate to ask any more questions and take up any more of your time, but perhaps you could take a quick peek at my situation below and just point me in the right direction?

I noticed this morning while testing that the field for Roll number is free form so I'm seeing some variations in how that value is entered into the form... There appear to be 2 common variations and I was hoping I might be able to account for them, but I'm not sure if it makes more sense to do a Select Case set up with the current pattern as Case 1 and the two variations as Case 2 and 3 or to try to capture the variations in the single pattern statment with | symbols?


Here are the 3 most common variations for this example and my unsuccessful attempts at pattern modification to account for them:


Case 1 (current pattern) works...
Code:
sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll Number[:]\s*\r\n(\w*)"
[TABLE="width: 680"]
<tbody>[TR]
[TD]Area:
[/TD]
[TD]22
[/TD]
[/TR]
[TR]
[TD]Jurisdiction:
[/TD]
[TD]205
[/TD]
[/TR]
[TR]
[TD]Roll number:
[/TD]
[TD]08466870
[/TD]
[/TR]
</tbody>[/TABLE]



Case 2 (decimal point at 4th position from the end) (\w*\.\d*) does not work either... added\s* after bracket and still no..
Code:
sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll Number[:]\s*\r\n(\w*[B]\.\d{3})\s*[/B]"
I'm really suprised (\w*\.\d*) doesn't work...everything I've read seems to suggest it should... now I'm worried there's a larger issue like formatting at play as up to this point I've only been able to test on emails I've forwarded to myself and altered the data in.
[TABLE="width: 680"]
<tbody>[TR]
[TD]Area:
[/TD]
[TD]22
[/TD]
[/TR]
[TR]
[TD]Jurisdiction:
[/TD]
[TD]205
[/TD]
[/TR]
[TR]
[TD]Roll number:
[/TD]
[TD]08466.870
[/TD]
[/TR]
</tbody>[/TABLE]


Would something like below be on the right track?
Code:
If Case 2, oMatch.Submatches(2) = Substitute(oMatch.Submatches(2),".","")


Case 3 (The Roll number is keyed as the concatenation of the three values) not sure if greater than is an option... maybe {9-15}?
Code:
sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll Number[:]\s*\r\n[B](\w{>9})[/B]"
[TABLE="width: 680"]
<tbody>[TR]
[TD]Area:
[/TD]
[TD]22
[/TD]
[/TR]
[TR]
[TD]Jurisdiction:
[/TD]
[TD]205
[/TD]
[/TR]
[TR]
[TD]Roll number:
[/TD]
[TD]2220508466870
[/TD]
[/TR]
</tbody>[/TABLE]


Would something like below be on the right track?
Code:
If Case 3, oMatch.Submatches(2) = Right(oMatch.Submatches(2),Len(oMatch.Submatches(2)-5)

Or would I be better off to ammend the current pattern to something like ...
Code:
sPattern = "...\nRoll Number[:] [B]|[/B] \s*\r\n(\w*) [B]| (\w*\.\d{3}) |[/B] [B](\w{>9})[/B]"

Thanks,
joe
 
Upvote 0

Excel Facts

How can you turn a range sideways?
Copy the range. Select a blank cell. Right-click, Paste Special, then choose Transpose.
Just a quick update: I seem to have narrowed it down to "Roll number" being the culprit for not wanting to match many of my pattern attempts... FYI for anyone who may stumble across this thread and get all the way to this page... found a great expression builder tool here! Regular Expression Tester

When I test "Roll" i get matches and when i test
Code:
"number[:]\s*\r\n(\w*\.\d*)"
i get matches, but when i test
Code:
"Roll number[:]\s*\r\n(\w*\.\d*)"
i get no matches... must be the space in between... so it looks like I need [Roll number:]\s*\r\n(\w*\.\d{3}) :)
 
Upvote 0
update: nope.... that doesn't work... i don't get it :(
Can't match up the "Roll number:" part for some reason and I can't seem to figure out why... What the heck???
 
Upvote 0
I'm glad to hear that you've got the code working. That's great.

For your pattern, try...

Code:
sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll Number[:]\s*\r\n[B][COLOR=#ff0000]((\w+(?:\.\d+)?))?\s*\r\n[/COLOR][/B]"

For example, for the roll number it should match...

Code:
    empty string
    0
    123456789
    123456789.123
    A123456789.123456789
    123b46789.123456789

For example, for the roll number it should reject...

Code:
    .2
    .245
    .2456789
    .245.345
    1234.
    245.345.789
    A123456789.123C

Note that if the roll number is missing and you try adding the match to a collection, you'll get an error. That's because the roll number is used as the key, and you can't use an empty variable to assign the key.

Hope this helps!
 
Last edited:
Upvote 0
For example, for the roll number it should match...

Code:
empty string
0
123456789
123456789.123
A123456789.123456789
123b46789.123456789

Actually, I think that should be blank cell, not empty string...
 
Upvote 0
Thanks Domenic, so here's something weird... Yesterday the macro was working perfectly on email from my inbox, however we have a "general" inbox that the emails come into first. When we try running the macro directly on a message from the "general" inbox, no pattern matches are found, however when we forward that message to ourselves (personal inbox) with no changes at all and run the macro, it works...? I've been trying to figure it out and as far as I can tell the macro is running as it should, the variables are being defined correctly... It seems like it's just not finding any pattern matches, even though it's exactly the same message...???

Ever heard of anything like this?

Thanks,
Joe
 
Upvote 0
BTW here's my current version... I've added a little piece to save a copy of the message in .pdf as this was a request I received from colleagues yesterday.
Code:
Public strSubject As String, strFolderPath As String, strNewFolderPath As String, FileRef As String, strInList As StringPublic olMail As Object
Public VarNewFolderPath As Variant
Option Explicit
'For this Macro you will need to turn on the Microsoft Forms 2.0 Object Library, Microsoft Word 12.0 Objects Library as well as the Microsoft VBScript Regular Expressions 5.5 library


Sub MsgChecker()


    'Outlook variables
    Dim olApp                       As Object
    Dim olMail                      As MailItem
    Dim sPattern                    As String
    Dim testSubject                 As String
    Dim answer                      As Integer
    
    'VBScript variables
    Dim oRegExp                     As Object
    Dim oMatchCollection            As Object
    Dim oSubMatches                 As Object
    Dim oMatch                      As Object
        
    'Excel variables
    Dim oCol                        As New Collection


    Set olApp = CreateObject("Outlook.Application")
    Set olMail = olApp.ActiveExplorer.Selection.Item(1)
    
    sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll.number[:]\s*\r\n((\w+(?:\.\d+)?))?\s*\r\n"


    Set oRegExp = CreateObject("VBScript.RegExp")
    
    With oRegExp
        .Global = True
        .IgnoreCase = True
        .Pattern = sPattern
        Set oMatchCollection = .Execute(olMail.Body)
    End With
    
    For Each oMatch In oMatchCollection
        oCol.Add oMatch, oMatch.SubMatches(2)
    Next oMatch
    Debug.Print oMatchCollection.Count
    For Each oMatch In oCol
        Debug.Print oMatch.SubMatches(0) & oMatch.SubMatches(1) & oMatch.SubMatches(2)
        strSubject = oMatch.SubMatches(0) & oMatch.SubMatches(1) & Replace(oMatch.SubMatches(2), ".", "")
        strSubject = Replace(strSubject, Chr(13), "")
        If Len(strSubject) > 14 Then
        strSubject = Replace(oMatch.SubMatches(2), ".", "")
        End If
        Debug.Print strSubject
        testSubject = testSubject & Trim(strSubject) & ","
        
        Debug.Print testSubject
        strInList = "IN " & testSubject
    Next oMatch
        Debug.Print strInList
        
        If strInList = "" Then
        msgbox ("No references found, please ensure you have selected the correct message from your inbox")
        Exit Sub
        End If


answer = msgbox("I have found the following " & oMatchCollection.Count & " references in this email: " & _
strInList & vbNewLine & "Select Ok to copy In List to clipboard and continue" & _
vbNewLine & "Select Cancel if these results are incorrect", vbOKCancel)
    
    If answer = vbOK Then
    Copy
        
    Else
    Exit Sub


End If
msgbox ("To create folders for these files, you will need to know the primary actual use (ie: Res, Farm, ICI, Strata) for each. Click Ok when you have determined the correct uses to continue.")


    For Each oMatch In oMatchCollection
    If Len(oMatch.SubMatches(2)) >= 13 Then
        FileRef = oMatch.SubMatches(1) & " - " & Right(Replace(oMatch.SubMatches(2), ".", ""), Len(oMatch.SubMatches(2)) - 5)
        Else
    FileRef = oMatch.SubMatches(1) & " - " & Replace(oMatch.SubMatches(2), ".", "")
        End If
      
        SaveMessageAsMsg
    Next


msgbox ("Finished!")




Set oRegExp = Nothing


End Sub


Public Sub Copy()
    
    Dim clipboard As MSForms.DataObject
    'Dim strSample As String


    Set clipboard = New MSForms.DataObject
    'strSample = "This is a sample string"
    
    clipboard.SetText strInList
    clipboard.PutInClipboard
    msgbox ("The InList has been copied to your clipboard. Please paste into the Property ID field of the Prop Tree Tab to generate RS")
    


End Sub
Public Sub SaveMessageAsMsg()
  Dim oMail As Outlook.MailItem
  Dim objItem As Object
  Dim objInspector As Object
  Dim objDoc As Object
  Dim sPath As String
  Dim sName As String
  Dim VarNewFolderPath As Variant
  Dim varFolderPath As Variant
    
'''''''''''Change the path below to suite your region'''''''''''''''''''''''''
  varFolderPath = "\\somefolder\anotherfolder\"
  
  VarNewFolderPath = Pick(varFolderPath)


    If VarNewFolderPath = "Exit" Then Exit Sub


On Error Resume Next
MkDir (VarNewFolderPath & "\" & FileRef)
On Error GoTo 0


   For Each objItem In ActiveExplorer.Selection
   If objItem.MessageClass = "IPM.Note" Then
    Set oMail = objItem
    Set objInspector = objItem.GetInspector
    Set objDoc = objInspector.WordEditor
  sName = oMail.Subject
  ReplaceCharsForFileName sName, "-"
  
  'sPath = "\\somefolder\anotherfolder\" & FileRef & "\"


  sPath = VarNewFolderPath & "\" & FileRef & "\"
  Debug.Print sPath & sName
  oMail.SaveAs sPath & sName & ".msg", olMSG
  objDoc.ExportAsFixedFormat sPath & sName & ".pdf", 17
  
  Set objInspector = Nothing
  Set objDoc = Nothing
  End If
  Next
   
End Sub
Private Sub ReplaceCharsForFileName(sName As String, _
  sChr As String _
)
  sName = Replace(sName, "'", sChr)
  sName = Replace(sName, "*", sChr)
  sName = Replace(sName, "/", sChr)
  sName = Replace(sName, "\", sChr)
  sName = Replace(sName, ":", sChr)
  sName = Replace(sName, "?", sChr)
  sName = Replace(sName, Chr(34), sChr)
  sName = Replace(sName, "<", sChr)
  sName = Replace(sName, ">", sChr)
  sName = Replace(sName, "|", sChr)
End Sub


Function BrowseForFolder(Optional OpenAt As Variant) As Variant
 Dim ShellApp As Object
 Set ShellApp = CreateObject("Shell.Application"). _
 BrowseForFolder(0, "Please choose a folder for " & FileRef, 0, OpenAt)
 Debug.Print OpenAt
 On Error Resume Next
    BrowseForFolder = ShellApp.Self.Path
 Debug.Print BrowseForFolder
VarNewFolderPath = BrowseForFolder
 On Error GoTo 0


 Set ShellApp = Nothing
    Select Case Mid(BrowseForFolder, 2, 1)
        Case Is = ":"
            If Left(BrowseForFolder, 1) = ":" Then GoTo Invalid
        Case Is = "\"
            If Not Left(BrowseForFolder, 1) = "\" Then GoTo Invalid
        Case Else
            GoTo Invalid
    End Select
 Exit Function


Invalid:
 BrowseForFolder = False
End Function


Function Pick(ByVal varFolderPath As Variant) As String




    Dim answer                      As Integer
    Dim VarNewFolderPath            As Variant
    
VarNewFolderPath = BrowseForFolder(varFolderPath)


If VarNewFolderPath = False Then
        msgbox FileRef & " will be skipped", vbInformation 'optional
    Pick = "Exit"
    Exit Function
    End If


    answer = msgbox("A copy of this message will be saved in a new folder named " & FileRef & " that will be created in: " & VarNewFolderPath & "\" & vbNewLine & "If a folder with this name already exists at this location, a new folder will NOT be created.", vbOKCancel)
    If answer = vbCancel Then
    Pick = "Exit"
    Exit Function
    End If
    
    Pick = VarNewFolderPath
End Function

They've also asked if it's possible to save copies of any attachments in the folders, which i think can be done, I've seen some examples out there I think I can work off of, but haven't had time yet to experiment with that part... The biggest mystery is the macro only working on emails that are forwarded....

2 general questions: It seems that changing the size of the message window affects the layout of the table in the body... small msg window condenses table making some labels like "Roll number" appear on two lines like:
Roll
Number

does this impact the pattern's ability to recognise it?

the other question... should I be setting all objects = nothing at the end of macros that use objects? Is this something that is considered good practice? I see some people do it and some don't...

Thanks,
Joe
 
Last edited by a moderator:
Upvote 0
When we try running the macro directly on a message from the "general" inbox, no pattern matches are found

I'm assuming that the you have your general inbox selected, and that you have the target email selected, correct? If so, try stepping through the code by pressing the F8 key, and stop after the following line of code is executed...

Code:
Set olMail = olApp.ActiveExplorer.Selection.Item(1)

Then, type the following line in the Immediate Window...

Code:
? olMail.Body

1) Do you get the text from the expected email?

2) Is it in the same format?
 
Upvote 0
I'm assuming that the you have your general inbox selected, and that you have the target email selected, correct? If so, try stepping through the code by pressing the F8 key, and stop after the following line of code is executed...

Correct... ?olMail.Body does yield the text from the expected email in the immediate window and I thought it was formatted as expected... but it does display differently in the immediate window depending on the inbox or forwarded vs not forwarded....

So the pattern that works on the forwarded emails is
Code:
sPattern = "Area[:]\s*\r\n(\d*)\s*\r\nJurisdiction[:]\s*\r\n(\d*)\s*\r\nRoll.number[:]\s*\r\n((\w+(?:\.\d+)?))?\s*\r\n"

looks like this in the immediate window from General Inbox...
Area: 21
Jurisdiction: 413
Roll number: 05169520

looks like this in the immediate window after being forwarded to a personal inbox... (sPattern works on this)
Area:


21


Jurisdiction:


413


Roll number:


05169520

The code appears to be running properly as it's not throwing an error, rather it's giving me the msgbox that it's supposed to give me if it doesn't find any matches. So it must be that the act of forwarding the email changes the formatting of the table in the msg body?

I've been trying to rework the pattern based on the RegExp builder I found on Regular Expression Tester and according to that, the following should work... but it doesn't get any matches either... Is there a wildcard character for \r that would work for either scenario?
Code:
Area[:]\s*(\d*)\s*\r\nJurisdiction[:]\s*(\d*)\s*\r\nRoll.number[:]\s*((\w+(?:\.\d+)?))?\s*\r\n

Thanks,
Joe
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,227
Messages
6,170,848
Members
452,361
Latest member
d3ad3y3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top