# Delete rows on large file where column does not contain string



## Serafin54 (Dec 22, 2022)

Hello,

I have large reports over 20k rows that I am trying to build a code for. The main part I am stuck on is that i need to delete any row that contains the string "The address is valid and deliverable according to official postal agencies." in column J. Both the codes i am posting do work but hang for a while (understandably) so i am just checking if there is something I can do to speed the process up.


```
With ActiveSheet
    .AutoFilterMode = False
    With Range("j1", Range("j" & Rows.Count).End(xlUp))
        .AutoFilter 1, "*The address is valid and deliverable according to official postal agencies.*"
        On Error Resume Next
        .Offset(1).SpecialCells(12).EntireRow.Delete
    End With
    .AutoFilterMode = False
End With
```

and i also have 


```
With ActiveSheet.UsedRange
    .AutoFilter Field:=10, Criteria1:="=*The address is valid and deliverable according to official postal agencies.*"
    .Offset(1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
    .AutoFilter
End With
```

As I said, I am just beginning to build the code but deleting these rows drops the file 10 to 12k rows so i figure this part is best to tackle first. So if there is a best practice solution that is better, I am totally open to it. I figure this is a piece of code i will use frequently on big files so better to start doing it properly. Thanks for the help!


----------



## johnnyL (Dec 22, 2022)

In your case, with a large amount of rows, I have found it would be quicker to:
1) Copy the data range to an array
2) Delete the data range from the sheet
3) Check the array for the criteria that you want to exclude
4) Copy all rows that don't meet criteria to exclude to another array
5) Copy that array to the data range


----------



## Peter_SSs (Dec 23, 2022)

Give this a try with a copy of your data.


```
Sub Del_Rows()
  Dim a As Variant, b As Variant
  Dim nc As Long, i As Long, k As Long
 
  nc = Cells.Find(What:="*", LookIn:=xlFormulas, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column + 1
  a = Range("J2", Range("J" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If InStr(1, a(i, 1), "The address is valid and deliverable according to official postal agencies.", 1) > 0 Then
      b(i, 1) = 1
      k = k + 1
    End If
  Next i
  If k > 0 Then
    Application.ScreenUpdating = False
    With Range("A2").Resize(UBound(a), nc)
      .Columns(nc).Value = b
      .Sort Key1:=.Columns(nc), Order1:=xlAscending, Header:=xlNo
      .Resize(k).EntireRow.Delete
    End With
    Application.ScreenUpdating = True
  End If
End Sub
```


----------



## Serafin54 (Dec 23, 2022)

Peter_SSs said:


> Give this a try with a copy of your data.
> 
> 
> ```
> ...


Thank you!  That works perfectly and surprisingly fast.  I will now need to try and dissect it so i can see what is doing what lol.  Can you tell me why it runs so quickly for my own knowledge?


----------



## Peter_SSs (Dec 23, 2022)

Serafin54 said:


> Thank you! That works perfectly and surprisingly fast.


You're welcome. Thanks for the confirmation. 



Serafin54 said:


> Can you tell me why it runs so quickly for my own knowledge?


Firstly, the reason that yours is slow will be because Excel has to delete a great number of disjoint rows.
For my small example sheet below, where we want to delete any rows containing an "x" in col B, after the data is filtered row 2 would be deleted and all the rows below need to get moved up. Then row 3 (which was originally row 4) has to get deleted and all the rows below that moved up, and so on. All those separate deletions and movements are quite slow.

Serafin54.xlsmABCD1Hdr1Hdr2Hdr32A2xC23A3abcC34A4xyzC45A5defC56A6xxxC67A7ghiC78A8red foxC89A9jklC910Sample

My code starts off by determining the first available blank column to the right of the data, assuming that the heading row can be used for that. Col D in my sample.
It then reads just the data of interest (B2:B9) into an array (a) in memory. Dealing with the values in memory is *much *faster than code continually accessing the worksheet to retrieve (or place) values.
Make a new blank array (b) the same size as array a.
Work through the values in array a. If an "x" is found ..
- put a 1 in the correspoding position in array b, and
- keep count of the rows found. This has two uses later.

Use the k > 0  test to check that at least 1 row has been found for deletion. After all, if none are found then there is nothing more to do.
Use the range from row 2 to the bottom of the data and including that first blank column at the right (A2:D9 for me)
Into the last column of that range enter the values from array b. At that point my sheet would look like this

Serafin54.xlsmABCD1Hdr1Hdr2Hdr32A2xC213A3abcC34A4xyzC415A5defC56A6xxxC617A7ghiC78A8red foxC819A9jklC9Sample

Now sort that range (still A2:D9) by the last column, bringing all the "1" rows together. The other rows retain their original positions in relation to each other.

Serafin54.xlsmABCD1Hdr1Hdr2Hdr32A2xC213A4xyzC414A6xxxC615A8red foxC816A3abcC37A5defC58A7ghiC79A9jklC9Sample

Still with that same range (A2:D2), resize it to the number of rows = k (4 in this case) and delete those rows. This both gets rid of the required rows and also removes the 'helper' "1" cells.
This is where the major time-saving actually is because there is just one single range to remove and move cells up instead of 4 separate ones.

Serafin54.xlsmABCD1Hdr1Hdr2Hdr32A3abcC33A5defC54A7ghiC75A9jklC9Sample

You will also note that I had one cell coloured and bolded. I did that to demonstrate an advantage of this method compared to the one suggested in post #2. With this method, that cell formatting followed its data all the way through the process. That would not happen with the other method.
This code would also retain any existing formulas among the remaining data. Not so with the other method.


----------



## johnnyL (Dec 23, 2022)

Peter_SSs said:


> You will also note that I had one cell coloured and bolded. I did that to demonstrate an advantage of this method compared to the one suggested in post #2. With this method, that cell formatting followed its data all the way through the process. That would not happen with the other method.
> This code would also retain any existing formulas among the remaining data. Not so with the other method.



As Austin Powers said, "Ouch baby, very ouch."

 Very good points, once again @Peter_SSs


----------



## Serafin54 (Dec 27, 2022)

Peter_SSs said:


> You're welcome. Thanks for the confirmation.
> 
> 
> Firstly, the reason that yours is slow will be because Excel has to delete a great number of disjoint rows.
> ...



Very interesting and good stuff. Thank you.  Though the coding is over my head, that logic makes sense and I can see the reasoning for the increased speed.  I appreciate you as always Peter.


----------



## Peter_SSs (Dec 27, 2022)

Cheers.


----------

