RegEx on a large range

gvg

New Member
Joined
Mar 23, 2013
Messages
8
Hi,

I have been experimenting with RegEx. Most of the examples on the net involve looping through the whole range in which you want to make changes with RegEx. On large files it takes ages to do it. Is there a faster way? I.e. maybe applying RexEx to the whole range instead of looping through every cell in the range?
 

Excel Facts

Test for Multiple Conditions in IF?
Use AND(test, test, test, test) or OR(test, test, test, ...) as the logical_test argument of IF.
I have been using RegEx for years in both Excel and Word and I know of no way to do what you are describing. I hesitate to say it's impossible, but I have never found a way to do it.

RegEx is an external general-purpose library that takes an expression and does pattern-matching on it. Because it's general-purpose it is not integrated into the Excel object model and has no semantics for ranges. That is, it doesn't know anything about ranges, it only knows about strings. The only way to use it is to give it a string. There is no way to make changes through a range without evaluating each cell individually as a string.

However, there are differences in efficiency for looping through a range. It would be interesting to see your code. For example, for what you are doing it would probably be fastest to load the worksheet range into an array, loop through the array to make the changes, the dump the array back to the worksheet.
 
Upvote 0
Thanks! Here's my code:

VBA Code:
Sub mdpReplaceCharacters()
    Dim strPattern As String: strPattern = "[^a-zA-Z0-9 /]"
    Dim strReplace As String: strReplace = ""
    Dim FileToOpen As Variant
    Set regEx = CreateObject("VBScript.RegExp")
   
    FileToOpen = Application.GetOpenFilename
    If FileToOpen <> False Then
        Set OpenBook = Application.Workbooks.Open(FileToOpen)
    End If
   
    Application.ScreenUpdating = False
   
    Range("B2").CurrentRegion.Select
       
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
       
        For Each c In Selection
            c.Value = regEx.Replace(c.Value, strReplace)
        Next
       
     Range("B2").Select
     Application.ScreenUpdating = True

End Sub

Thing is I am loopping through the range 750 000 rows, 25 columns. Takes around 20 minutes to complete the task.
 
Upvote 0
What are you actually replacing ?
Some samples would help.
 
Upvote 0
No, you can't do it on the whole range at once, but yes you can speed it up considerably.

Instead of looping through the cells on the worksheet, read all the values into an array in memory, process them there then write all the results back to the sheet at once - exactly as described by 6StringJazzer.

For testing I have removed any opening workbooks and assumed data is on the active sheet.
For about 1,000 rows and 8 columns your code took 0.477 seconds and the code below took 0.043 seconds. So about 10 times faster.
You probably also do not need the ScreenUpdating lines for this code but I have left them in.

VBA Code:
Sub mdpReplaceCharactersTest()
    Dim regEx As Object
    Dim a As Variant
    Dim strPattern As String: strPattern = "[^a-zA-Z0-9 /]"
    Dim strReplace As String: strReplace = ""
    Dim r As Long, c As Long, uba2 As Long

    Set regEx = CreateObject("VBScript.RegExp")

    Application.ScreenUpdating = False

    a = Range("B2").CurrentRegion.Value
    uba2 = UBound(a, 2)

    With regEx
      .Global = True
      .MultiLine = True
      .IgnoreCase = False
      .Pattern = strPattern
    End With

    For r = 1 To UBound(a)
      For c = 1 To uba2
        a(r, c) = regEx.Replace(a(r, c), strReplace)
      Next c
    Next r
   
    Range("B2").CurrentRegion.Value = a
    Application.ScreenUpdating = True
End Sub
 
Upvote 0
Solution
@Peter_SSs
Thanks for the update. I'd love to have a better grasp on Regex, but the likes of yourself and @Rick Rothstein always seem to provide a better solution.. (y) (y)
 
Upvote 0
My educated guess is that you are removing illegal characters from file paths (if so, it will also remove some legal characters, but that's OK).
 
Upvote 0
@Peter_SSs Thanks ! This is way faster.
You're welcome. Thanks for the follow-up. (y)

I did end up testing on up to about 100,000 rows & 25 columns and the speed was more like 15 times faster for my sample data. :)
 
Upvote 0

Forum statistics

Threads
1,225,757
Messages
6,186,848
Members
453,379
Latest member
gabriellegonzalez

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top