Use RegEx to creat in cell fuctions that find substrings and modify them in the string.

dagda13

Board Regular
Joined
May 18, 2019
Messages
52
Hi,

So I have a list if alphanumeric IDs that I'd like to modify using RegEx to create an in-cell function. I've already enabled Microsoft VBScript Regular Expression 5.5, and I'm using the below code:

VBA Code:
Function simpleCellRegex(Myrange As Range) As String
    Set regEx = CreateObject("VBScript.RegExp")
    Dim strPattern As String
    Dim strPattern2 As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    strPattern = "_[a-zA-Z][0-9]"
        
       If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = "_"

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
            
        End With

        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

I have a list of IDs that use "_" as a delimiter:

lchr_01_b82_lc_enus
lchr_01_b50_lc_enus
lchr_01_a50_lc_enus
lchr_01_b01_lc_enus

Within each ID is a substring that has an alpha and a number between delimiters, ie., "_b82"_, "_b50_", "_a50_", "b01_".

Right now the above code will find the alphanumberic substrings (the string pattern " strPattern = "_[a-zA-Z][0-9]"), but I'd like to just delete the alpha and not anything else. I'm not sure how to make the strReplace value dynamic since the IDs are also dynamic. I'd also like to get rid of the the rest of the text after the last delimter. The desired output in the above example would be:

lchr_01_b82_lc_enuslchr_01_82_lc
lchr_01_b50_lc_enuslchr_01_50_lc
lchr_01_a50_lc_enuslchr_01_50_lc
lchr_01_b01_lc_enuslchr_01_01_lc

Is using RegEx the right way to do this, and if so, is the above the right way to go about this? Thanks in advance!


 
Thanks Rick,

So the first code worked perfectly. I guess I was wondering if there could be a conditional in the case of particular exceptions in the ID strings. What I'm after is a conditional that kicks in (ie., that will remove the last 2 sub-strings in the "_" delimiter) if it follows "_a01_01_"..._[a-z][0-9][0-9]_[0-9][0-9]". The strings (IDs) that have these substrings will always have this pattern. Otherwise, the output would be like the original code (which worked!) I've bolded the unique sub-string pattern below (the first three IDs...the rest would be handled just the same way as the original code):

Sorry, my bad, that I didn't provide a better sampling before.

IDOutput from VBADesired output
ds_swaywapp_a01_10_enusds_swaywapp_01ds_swaywapp_01
ds_swaywapp_b02_09_enusds_swaywapp_02ds_swaywapp_02
ds_swaywapp_c03_09_enusds_swaywapp_03ds_swaywapp_03
ds_swayw10_a01_enusds_swayw10_01ds_swayw10_01
ds_swayw10_b02_enusds_swayw10_02ds_swayw10_02
ds_swayw10_c03_enusds_swayw10_03ds_swayw10_03
ds_swaywapp_04_enusds_swaywapp_04_enuds_swaywapp_04
mo_bgwo_b01_dt_enusmo_bgwo_01mo_bgwo_01_dt
mo_bgwo_b02_dt_enusmo_bgwo_02mo_bgwo_02_dt
mo_bgwo_b03_dt_enusmo_bgwo_03mo_bgwo_03_dt
mo_bgwo_b04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_c04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_b05_dt_enusmo_bgwo_05mo_bgwo_05_dt
ds_msw2013_01_enusds_msw2013_01_enuds_msw2013_01
ds_msw2013_02_enusds_msw2013_02_enuds_msw2013_02

Thanks again for all your help, it is appreciated!
 
Upvote 0

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
I was wondering if there could be a conditional in the case of particular exceptions in the ID strings.
Possibly, but your latest examples bear little resemblance to the original RegEx pattern in your code and your written description ..
Within each ID is a substring that has an alpha and a number between delimiters, ie., "_b82"_, "_b50_", "_a50_", "b01_".
.. and all your samples in posts 1, 3, 8 & 9

Now we have several examples that do not have an alpha and a number between delimiters
Although my original function appears to produce the result that you want for those examples, that was pure luck not design of the code.

Assuming the examples without a letter before the 2 digits (eg last example in post 11) are correct, is it now also possible to have an original string like this?
ds_swaywapp_01_10_enus
 
Upvote 0
Possibly, but your latest examples bear little resemblance to the original RegEx pattern in your code and your written description ..
.. and all your samples in posts 1, 3, 8 & 9

Now we have several examples that do not have an alpha and a number between delimiters
Although my original function appears to produce the result that you want for those examples, that was pure luck not design of the code.

Assuming the examples without a letter before the 2 digits (eg last example in post 11) are correct, is it now also possible to have an original string like this?
ds_swaywapp_01_10_enus
Hi Peter,

Truly sorry, 3 of the IDs in my last post were already fixed. It would always be "ds_swaywapp_a01_10_enus" or "ds_swaywapp_a01_enus". My bad.
There will always be an alpha and a number between delimiters.

IDOutput from VBADesired output
ds_swaywapp_a01_10_enusds_swaywapp_01ds_swaywapp_01
ds_swaywapp_b02_09_enusds_swaywapp_02ds_swaywapp_02
ds_swaywapp_c03_09_enusds_swaywapp_03ds_swaywapp_03
ds_swayw10_a01_enusds_swayw10_01ds_swayw10_01
ds_swayw10_b02_enusds_swayw10_02ds_swayw10_02
ds_swayw10_c03_enusds_swayw10_03ds_swayw10_03
mo_bgwo_b01_dt_enusmo_bgwo_01mo_bgwo_01_dt
mo_bgwo_b02_dt_enusmo_bgwo_02mo_bgwo_02_dt
mo_bgwo_b03_dt_enusmo_bgwo_03mo_bgwo_03_dt
mo_bgwo_b04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_c04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_b05_dt_enusmo_bgwo_05mo_bgwo_05_dt

I'm sorry again if I've wasted your time.
 
Last edited:
Upvote 0
The letter that we are removing... is it always followed by exactly two digits and an underline character? If so, is it always after that second digit that you want the text deleted (so that the resulting text always ends with the two-digit number after the underline character followed by a single letter? If so, give this function a try...
VBA Code:
Function DelAlpha(S As String) As String
  Dim X As Long
  For X = 1 To Len(S)
    If Mid(S, X, 3) Like "_[A-Za-z]#" Then
      S = Left(S, X) & Mid(S, X + 2, 2)
      Exit For
    End If
  Next
  DelAlpha = Left(S, InStrRev(S, "_") + 3)
End Function
Hi Rick,

As Peter has pointed out, there were some non-alphanumeric sub-strings in my previous list. Totally my fault, these 3 IDs were already fixed. The correct sample would be:

IDOutput from VBADesired output
ds_swaywapp_a01_10_enusds_swaywapp_01ds_swaywapp_01
ds_swaywapp_b02_09_enusds_swaywapp_02ds_swaywapp_02
ds_swaywapp_c03_09_enusds_swaywapp_03ds_swaywapp_03
ds_swayw10_a01_enusds_swayw10_01ds_swayw10_01
ds_swayw10_b02_enusds_swayw10_02ds_swayw10_02
ds_swayw10_c03_enusds_swayw10_03ds_swayw10_03
mo_bgwo_b01_dt_enusmo_bgwo_01mo_bgwo_01_dt
mo_bgwo_b02_dt_enusmo_bgwo_02mo_bgwo_02_dt
mo_bgwo_b03_dt_enusmo_bgwo_03mo_bgwo_03_dt
mo_bgwo_b04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_c04_dt_enusmo_bgwo_04mo_bgwo_04_dt
mo_bgwo_b05_dt_enusmo_bgwo_05mo_bgwo_05_dt

Truly sorry again for the bad data previously, and I sorry if I've wasted your time.
 
Upvote 0
It would always be "ds_swaywapp_a01_10_enus" or "ds_swaywapp_a01_enus".
Thanks. Important to have accurate examples. ;)

Try this version

VBA Code:
Function DelAlpha(s As String) As String
  Dim RX As Object

  Set RX = CreateObject("VBScript.Regexp")
  RX.Pattern = "(.*_)([a-zA-Z])(\d\d)(_\d\d)?(_[a-zA-z]+)?(_[^_]+$)"
  DelAlpha = RX.Replace(s, "$1$3$5")
End Function

dagda13 2020-03-24 1.xlsm
ABCD
1IDOutputDesired outputCheck Output
2ds_swaywapp_a01_10_enusds_swaywapp_01ds_swaywapp_01TRUE
3ds_swaywapp_b02_09_enusds_swaywapp_02ds_swaywapp_02TRUE
4ds_swaywapp_c03_09_enusds_swaywapp_03ds_swaywapp_03TRUE
5ds_swayw10_a01_enusds_swayw10_01ds_swayw10_01TRUE
6ds_swayw10_b02_enusds_swayw10_02ds_swayw10_02TRUE
7ds_swayw10_c03_enusds_swayw10_03ds_swayw10_03TRUE
8mo_bgwo_b01_dt_enusmo_bgwo_01_dtmo_bgwo_01_dtTRUE
9mo_bgwo_b02_dt_enusmo_bgwo_02_dtmo_bgwo_02_dtTRUE
10mo_bgwo_b03_dt_enusmo_bgwo_03_dtmo_bgwo_03_dtTRUE
11mo_bgwo_b04_dt_enusmo_bgwo_04_dtmo_bgwo_04_dtTRUE
12mo_bgwo_c04_dt_enusmo_bgwo_04_dtmo_bgwo_04_dtTRUE
13mo_bgwo_b05_dt_enusmo_bgwo_05_dtmo_bgwo_05_dtTRUE
Sheet2
Cell Formulas
RangeFormula
B2:B13B2=DelAlpha(A2)
D2:D13D2=B2=C2
 
Upvote 0
Thanks. Important to have accurate examples. ;)

Try this version

VBA Code:
Function DelAlpha(s As String) As String
  Dim RX As Object

  Set RX = CreateObject("VBScript.Regexp")
  RX.Pattern = "(.*_)([a-zA-Z])(\d\d)(_\d\d)?(_[a-zA-z]+)?(_[^_]+$)"
  DelAlpha = RX.Replace(s, "$1$3$5")
End Function

dagda13 2020-03-24 1.xlsm
ABCD
1IDOutputDesired outputCheck Output
2ds_swaywapp_a01_10_enusds_swaywapp_01ds_swaywapp_01TRUE
3ds_swaywapp_b02_09_enusds_swaywapp_02ds_swaywapp_02TRUE
4ds_swaywapp_c03_09_enusds_swaywapp_03ds_swaywapp_03TRUE
5ds_swayw10_a01_enusds_swayw10_01ds_swayw10_01TRUE
6ds_swayw10_b02_enusds_swayw10_02ds_swayw10_02TRUE
7ds_swayw10_c03_enusds_swayw10_03ds_swayw10_03TRUE
8mo_bgwo_b01_dt_enusmo_bgwo_01_dtmo_bgwo_01_dtTRUE
9mo_bgwo_b02_dt_enusmo_bgwo_02_dtmo_bgwo_02_dtTRUE
10mo_bgwo_b03_dt_enusmo_bgwo_03_dtmo_bgwo_03_dtTRUE
11mo_bgwo_b04_dt_enusmo_bgwo_04_dtmo_bgwo_04_dtTRUE
12mo_bgwo_c04_dt_enusmo_bgwo_04_dtmo_bgwo_04_dtTRUE
13mo_bgwo_b05_dt_enusmo_bgwo_05_dtmo_bgwo_05_dtTRUE
Sheet2
Cell Formulas
RangeFormula
B2:B13B2=DelAlpha(A2)
D2:D13D2=B2=C2
This is perfect and exactly what I need, Peter. Thanks so much for all your help and patience!
 
Upvote 0

Forum statistics

Threads
1,224,818
Messages
6,181,152
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top