Delete rows on large file where column does not contain string

Serafin54

Board Regular
Joined
Apr 11, 2014
Messages
165
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have large reports over 20k rows that I am trying to build a code for. The main part I am stuck on is that i need to delete any row that contains the string "The address is valid and deliverable according to official postal agencies." in column J. Both the codes i am posting do work but hang for a while (understandably) so i am just checking if there is something I can do to speed the process up.

VBA Code:
With ActiveSheet
    .AutoFilterMode = False
    With Range("j1", Range("j" & Rows.Count).End(xlUp))
        .AutoFilter 1, "*The address is valid and deliverable according to official postal agencies.*"
        On Error Resume Next
        .Offset(1).SpecialCells(12).EntireRow.Delete
    End With
    .AutoFilterMode = False
End With

and i also have

Code:
With ActiveSheet.UsedRange
    .AutoFilter Field:=10, Criteria1:="=*The address is valid and deliverable according to official postal agencies.*"
    .Offset(1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
    .AutoFilter
End With

As I said, I am just beginning to build the code but deleting these rows drops the file 10 to 12k rows so i figure this part is best to tackle first. So if there is a best practice solution that is better, I am totally open to it. I figure this is a piece of code i will use frequently on big files so better to start doing it properly. Thanks for the help!
 

Excel Facts

Links? Where??
If Excel says you have links but you can't find them, go to Formulas, Name Manager. Look for old links to dead workbooks & delete.
In your case, with a large amount of rows, I have found it would be quicker to:
1) Copy the data range to an array
2) Delete the data range from the sheet
3) Check the array for the criteria that you want to exclude
4) Copy all rows that don't meet criteria to exclude to another array
5) Copy that array to the data range
 
Upvote 0
Give this a try with a copy of your data.

VBA Code:
Sub Del_Rows()
  Dim a As Variant, b As Variant
  Dim nc As Long, i As Long, k As Long
 
  nc = Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
  a = Range("J2", Range("J" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If InStr(1, a(i, 1), "The address is valid and deliverable according to official postal agencies.", 1) > 0 Then
      b(i, 1) = 1
      k = k + 1
    End If
  Next i
  If k > 0 Then
    Application.ScreenUpdating = False
    With Range("A2").Resize(UBound(a), nc)
      .Columns(nc).Value = b
      .Sort Key1:=.Columns(nc), Order1:=xlAscending, Header:=xlNo
      .Resize(k).EntireRow.Delete
    End With
    Application.ScreenUpdating = True
  End If
End Sub
 
Upvote 0
Solution
Give this a try with a copy of your data.

VBA Code:
Sub Del_Rows()
  Dim a As Variant, b As Variant
  Dim nc As Long, i As Long, k As Long
 
  nc = Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
  a = Range("J2", Range("J" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If InStr(1, a(i, 1), "The address is valid and deliverable according to official postal agencies.", 1) > 0 Then
      b(i, 1) = 1
      k = k + 1
    End If
  Next i
  If k > 0 Then
    Application.ScreenUpdating = False
    With Range("A2").Resize(UBound(a), nc)
      .Columns(nc).Value = b
      .Sort Key1:=.Columns(nc), Order1:=xlAscending, Header:=xlNo
      .Resize(k).EntireRow.Delete
    End With
    Application.ScreenUpdating = True
  End If
End Sub
Thank you! That works perfectly and surprisingly fast. I will now need to try and dissect it so i can see what is doing what lol. Can you tell me why it runs so quickly for my own knowledge?
 
Upvote 0
Thank you! That works perfectly and surprisingly fast.
You're welcome. Thanks for the confirmation. :)

Can you tell me why it runs so quickly for my own knowledge?
Firstly, the reason that yours is slow will be because Excel has to delete a great number of disjoint rows.
For my small example sheet below, where we want to delete any rows containing an "x" in col B, after the data is filtered row 2 would be deleted and all the rows below need to get moved up. Then row 3 (which was originally row 4) has to get deleted and all the rows below that moved up, and so on. All those separate deletions and movements are quite slow.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC2
3A3abcC3
4A4xyzC4
5A5defC5
6A6xxxC6
7A7ghiC7
8A8red foxC8
9A9jklC9
10
Sample


My code starts off by determining the first available blank column to the right of the data, assuming that the heading row can be used for that. Col D in my sample.
It then reads just the data of interest (B2:B9) into an array (a) in memory. Dealing with the values in memory is much faster than code continually accessing the worksheet to retrieve (or place) values.
Make a new blank array (b) the same size as array a.
Work through the values in array a. If an "x" is found ..
- put a 1 in the correspoding position in array b, and
- keep count of the rows found. This has two uses later.

Use the k > 0 test to check that at least 1 row has been found for deletion. After all, if none are found then there is nothing more to do.
Use the range from row 2 to the bottom of the data and including that first blank column at the right (A2:D9 for me)
Into the last column of that range enter the values from array b. At that point my sheet would look like this

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC21
3A3abcC3
4A4xyzC41
5A5defC5
6A6xxxC61
7A7ghiC7
8A8red foxC81
9A9jklC9
Sample


Now sort that range (still A2:D9) by the last column, bringing all the "1" rows together. The other rows retain their original positions in relation to each other.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC21
3A4xyzC41
4A6xxxC61
5A8red foxC81
6A3abcC3
7A5defC5
8A7ghiC7
9A9jklC9
Sample


Still with that same range (A2:D2), resize it to the number of rows = k (4 in this case) and delete those rows. This both gets rid of the required rows and also removes the 'helper' "1" cells.
This is where the major time-saving actually is because there is just one single range to remove and move cells up instead of 4 separate ones.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A3abcC3
3A5defC5
4A7ghiC7
5A9jklC9
Sample


You will also note that I had one cell coloured and bolded. I did that to demonstrate an advantage of this method compared to the one suggested in post #2. With this method, that cell formatting followed its data all the way through the process. That would not happen with the other method.
This code would also retain any existing formulas among the remaining data. Not so with the other method.
 
Upvote 0
You will also note that I had one cell coloured and bolded. I did that to demonstrate an advantage of this method compared to the one suggested in post #2. With this method, that cell formatting followed its data all the way through the process. That would not happen with the other method.
This code would also retain any existing formulas among the remaining data. Not so with the other method.

As Austin Powers said, "Ouch baby, very ouch."

:) Very good points, once again @Peter_SSs
 
Upvote 0
You're welcome. Thanks for the confirmation. :)


Firstly, the reason that yours is slow will be because Excel has to delete a great number of disjoint rows.
For my small example sheet below, where we want to delete any rows containing an "x" in col B, after the data is filtered row 2 would be deleted and all the rows below need to get moved up. Then row 3 (which was originally row 4) has to get deleted and all the rows below that moved up, and so on. All those separate deletions and movements are quite slow.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC2
3A3abcC3
4A4xyzC4
5A5defC5
6A6xxxC6
7A7ghiC7
8A8red foxC8
9A9jklC9
10
Sample


My code starts off by determining the first available blank column to the right of the data, assuming that the heading row can be used for that. Col D in my sample.
It then reads just the data of interest (B2:B9) into an array (a) in memory. Dealing with the values in memory is much faster than code continually accessing the worksheet to retrieve (or place) values.
Make a new blank array (b) the same size as array a.
Work through the values in array a. If an "x" is found ..
- put a 1 in the correspoding position in array b, and
- keep count of the rows found. This has two uses later.

Use the k > 0 test to check that at least 1 row has been found for deletion. After all, if none are found then there is nothing more to do.
Use the range from row 2 to the bottom of the data and including that first blank column at the right (A2:D9 for me)
Into the last column of that range enter the values from array b. At that point my sheet would look like this

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC21
3A3abcC3
4A4xyzC41
5A5defC5
6A6xxxC61
7A7ghiC7
8A8red foxC81
9A9jklC9
Sample


Now sort that range (still A2:D9) by the last column, bringing all the "1" rows together. The other rows retain their original positions in relation to each other.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A2xC21
3A4xyzC41
4A6xxxC61
5A8red foxC81
6A3abcC3
7A5defC5
8A7ghiC7
9A9jklC9
Sample


Still with that same range (A2:D2), resize it to the number of rows = k (4 in this case) and delete those rows. This both gets rid of the required rows and also removes the 'helper' "1" cells.
This is where the major time-saving actually is because there is just one single range to remove and move cells up instead of 4 separate ones.

Serafin54.xlsm
ABCD
1Hdr1Hdr2Hdr3
2A3abcC3
3A5defC5
4A7ghiC7
5A9jklC9
Sample


You will also note that I had one cell coloured and bolded. I did that to demonstrate an advantage of this method compared to the one suggested in post #2. With this method, that cell formatting followed its data all the way through the process. That would not happen with the other method.
This code would also retain any existing formulas among the remaining data. Not so with the other method.

Very interesting and good stuff. Thank you. Though the coding is over my head, that logic makes sense and I can see the reasoning for the increased speed. I appreciate you as always Peter.
 
Last edited by a moderator:
Upvote 0

Forum statistics

Threads
1,223,909
Messages
6,175,314
Members
452,634
Latest member
cpostell

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top