Regular Expression pattern string help needed

goblin

Active Member
Joined
Apr 16, 2003
Messages
469
I've exposed the VBA regular expression mechanism as a function in Excel and I'm trying to match/break up some data. The data looks like this:

0301-123456 Some name 111111-1111 1,202,591.00 .5 .5 5 Counterparty name
0301-234567 Another name 222222-2222 1,230,000.00 30.04.2013 1.0 .5 8 Counterparty name2
0301-345678 Any name you want 333333-3333 1,230,000.00 29.04.2013 30.04.2013 1.0 .5 8 Counterparty nam3
0301-456789 No digits in names I think 444444-4444 750,000.00 01.10.2012 .0 .5 1 Counterparty name4
0301-567890 NameNameName 555555-5555 750,000.00 31.12.2013 1.0 .5 5 Counterparty name5
0301-678901 This is my name 666666-6666 14,176,426.00 02.01.2015 .0 .5 1 Counterparty name6

The data can be described as:
1. The lines will always start with ddd-dddd
2. Then comes a name
3. ID code for the name on the form dddddd-dddd
4. Amount is next, formatted as #,##0.00
5. Optional column that contains a date, formated as dd.mm.yyyy. If 5 is populated 6 is populated as well
6. Optional column that contains a date, formated as dd.mm.yyyy. Can be populated without having data in column 5
7. Fraction, formatted as #.0 (e.g. 1.0 or .5)
8. Fraction. Same as 7
9. Integer. Can be two digits
10. Another name and can contain essentially anything

I've almost matched this but I cannot turn off greedy matching on columns 5 and 6. I.e. if column 5 is empty but column 6 is not, my pattern matching always puts the date from column 6 into column 5.

My pattern string is:
^(\d*-\d*)+\s(.*)\s(\d*-\d*)+\s(\S*)\s(\d\d\S*\s|)(\d\d\S*\s|)\s?(\d?\.\d+)\s(\d?\.\d+)\s(\d+)\s(.*)$

Basically, I'm matching like this:
1. Match the digit group for 1 and 3. Anything in between goes as group 2.
2. Match the amount in 4 as anything non-space
3. Match EITHER anything starting with two digits or skip it, still reserving a group for this.
4. Again, match EITHER anything starting with two digits or nothing, reserving another group for this
5. Optional space, because if both Date columns are missing I only have one space between 4 and 7
6. Match 7-9 aggressively
7. Rest goes into last group

If anyone wants to take a crack at this I would be very thankful. The VBA code you need is below. To use the code in Excel, put the test data above in cells A2-A7, put the pattern string in Cell A1, select cells B2-K2, put in the formula "=RegExpGroup(A2;$A$1)", and CTRL-SHIFT-ENTER the formula. Copy down to break up all the strings:

Code:
Function RegExpGroup(FindIn As String, FindWhat As String, Optional bGlobal As Boolean = True) As Variant
    ' Needs reference "Microsoft VBScript Regular Expressions 5.5
    Dim re As RegExp, i As Integer, j As Integer, tmp
    Application.Volatile False
    
    RegExpGroup = Array(CVErr(xlErrNA))
    Set re = New RegExp
    re.Global = bGlobal
    re.Pattern = FindWhat
    If re.test(FindIn) Then
        With re.Execute(FindIn)
            ReDim rslt(0 To .count - 1)
            For i = 0 To .count - 1
                With .Item(i).SubMatches
                    tmp = Array()
                    ReDim tmp(0 To .count - 1)
                    For j = 0 To .count - 1
                        tmp(j) = .Item(j)
                    Next j
                    rslt(i) = tmp
                End With
            Next i
            RegExpGroup = rslt
        End With
    End If
End Function
 
Ok, I'm pissed :banghead:. It turns out my understanding of pgc's solution was not complete and now that I've thought more about it, I no longer understand it.

Xenou's tip about the use of ? to turn on lazy matching was very good though. It turns out I only needed to add ?? after my 5th group to fix my problem.
 
Upvote 0

Excel Facts

Is there a shortcut key for strikethrough?
Ctrl+S is used for Save. Ctrl+5 is used for Strikethrough. Why Ctrl+5? When you use hashmarks to count |||| is 4, strike through to mean 5.
Yes, using the lazy matching is also a good solution for your problem.
The main difference between my pattern and your original pattern is just items 5 and 6, that, although optional, have a specific order of assignment in case some are missing. In my case I used grouping that is easy to expand for more optional columns (although it may result in a forest of parentheses:)).
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,170
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top