Pattern extraction UDF -- REGEX

ChrisOswald

Active Member
Joined
Jan 19, 2010
Messages
454
Hi,

I'm needing to extract from an input string a substring that fits a certain pattern. The pattern match I'm looking for is:

1) The substring is 5 characters long
2) The characters around the substring need to be non-alphanumeric (to bias the pattern match fails to false negatives)
3) The pattern is [1-2][0-9A-z][0-9A-z][0-9A-z][0-9A-z]
4) The string should be search from "Back to Front", and if more than 1 possibility is found the hindmost should be returned.
5) There is never more than 2 characters in the substring that are letters. (this is the bit I can't figure out without testing all 11 possible patterns, which is slightly less than elegant)
6) If no pattern match is found, the UDF returns ""

Thanks,
 
@Xenou. No, stripping out the log entry down to a single records isn't an option. I can't explain the detail as its sensitive, sometimes personal and confidential, The find works perfectly, the identify the time stamp preceding it works perfectly.

i only showed three fake log entries, there are many types depending on what system is applying data to the log, many are automated, these ones are where a controller puts the data in manually.

As its manual it will be time stamped yet will not occur within the 100th of a second of the actual request, so the controller will be writing a time hours and minutes, its that i actually need to see, but would be unachievable to find in the log.

logs can be pages long, or can be a single entry (I accept that sometimes i will not find the data

Many of the logs will not hold the key CRM, its to do with a specific audit

The final key is knowing where that first time stamp that relates is, back up the string to pick up the keyword entry, though i could just create that text (can't think of a scenario where the text would change, but is possible),

As I don't know where CRM will occur, the fifty words should capture it,

It seems to me that excel knows where the start word is after pinpointing it, so what i need is to pull from the text a string that needs no manipulation
 
Upvote 0

Excel Facts

Waterfall charts in Excel?
Office 365 customers have access to Waterfall charts since late 2016. They were added to Excel 2019.
@JS411

That does the job. I actually came up with a similar concept over the weekend. I guess I overfocused on how to do this with RegExes while at work.

@pgc01
That is precisely what I wanted. Now all I have to do is figure out what the pattern modifiers mean and do - which shouldn't be too hard now that I have something to Google at.
One clarification please, if you would. Does using
Code:
With CreateObject("VBScript.RegExp")
keep the regex engine/object/thingy open/instantiated/"In the house" while calculating a bunch of these via copy/paste? I think I saw a thread a month or two ago where one of the board MVP's stated that that is the case.


Thanks to both of you.
 
Upvote 0
Glad if this helped at all; however pgc01's solution is clearly the way to go.

I had no idea there was a VBScript.RegExp object. Thanks pgc01 for introducing me to that.
 
Upvote 0
I'm glad it helped.

One clarification please, if you would. Does using
Code:
With CreateObject("VBScript.RegExp")
keep the regex engine/object/thingy open/instantiated/"In the house" while calculating a bunch of these via copy/paste?

No.

The code I posted will not be efficient if you are using many formulas with this udf, because it will create a RegExp object each time it's called.

For that case, use this code, similar to the previous, but where the RegExp object is only created the first time the udf is called.

Code:
Function GetStr(s As String) As String
Static rx As Object
Dim rxMatches As Object
 
If rx Is Nothing Then
    Set rx = CreateObject("VBScript.RegExp")
    With rx
        .Pattern = "\b(?!(\d*[A-Z]){3})[1-2][0-9A-Z][0-9A-Z][0-9A-z][0-9A-Z]\b"
        .Global = True
        .IgnoreCase = True
    End With
End If
 
With rx
    Set rxMatches = .Execute(s)
    If rxMatches.Count > 0 Then GetStr = rxMatches(rxMatches.Count - 1).Value
End With
End Function
 
Upvote 0
I had no idea there was a VBScript.RegExp object. Thanks pgc01 for introducing me to that.

Hi Jerry

Yes, it's great that we can also used regular expressions in vba. If you are already familiar with them from other languages you won't see much difference here. The only minus point is that ms has not yet implemented the LookBehind feature.
 
Upvote 0
Thanks pgc,

That's a really cool way to do this. The function you provided is going to see a fair bit of use: it's a post-fix for a bunch of SQLServer based reports. At the risk of completely derailing this thread, one more Yes/No question: Would this same concept work with an Access UDF querying an ODBC source?
 
Upvote 0

Forum statistics

Threads
1,225,155
Messages
6,183,212
Members
453,151
Latest member
Lizamaison

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top