Faster Way to Delete Rows?

PoggiPJ

Active Member
Joined
Mar 25, 2008
Messages
330
My spreadsheet is a contract file that includes a list of part numbers being sold. Each part number refers to a lookup table of ALL about 20,000 valid part numbers and prices. Once the contract is finalized, I want to make the contract sheet smaller by deleting all rows in the lookup table that are not required for this contract.

The following code works, but it takes more than 10 minutes to work through the complete list. I read down the lookup table. For each record in the lookup table, I call a routine that reads through an array of the part numbers that are included in this contract. If the lookup table part number IS included in the contract, I skip it. If it is NOT required, I delete it. I then return to the main lookup table and read in the next lookup table record.

This is the main routine where I progress down the big lookup table.
Code:
'Work down the Price File range from top to bottom
    Set RefTableRange = DSWPrices.Range("DSWPriceRange")
    RefTableIndex = 1
    Application.Calculation = xlCalculationManual
    While RefTableIndex < RefTableRange.Rows.Count
        RefTableIndex = RefTableIndex + 1
        'check if this part number is included in the contract
        Call CheckRefTableRow(RefTableRange, RefTableIndex)
    Wend
This is the routine that checks to see if the part is included in the contract.
Code:
Private Sub CheckRefTableRow(ByRef RefTableRange As Range, ByRef RefTableRow As Long)
Dim ThisPartIsNotInTheContract As Boolean
Dim x As Long
    
    'assumption that this part will NOT be in the contract and will delete it.
    ThisPartIsNotInTheContract = True
    While ThisPartIsNotInTheContract
        For x = 1 To maxPAParts
            If RefTableRange.Cells(RefTableRow, 1) = PAPartArray(x) Then
                'the part actually IS in the contract
                ThisPartIsNotInTheContract = False
                Exit For
            End If
        Next x
        If ThisPartIsNotInTheContract Then
            'Since this part isn't included in the contract, delete it.
            RefTableRange.Cells(RefTableRow, 1).EntireRow.Delete
            deletedRecordCount = deletedRecordCount + 1
        End If
    Wend
End Sub
I'm wondering if there is a better approach - or if there is a way to select the individual rows to be deleted, and then delete them all at once. Any ideas?
 
Re: Solved

How does from the original 12 minutes, down to about 50 seconds sound?:)
Thanks again
Peter, it’s the really good progress!
And it’s much respected that you’ve done it by yourself, just with some suggestions.

But I think that it is possible to speed it up with not more than 5 seconds.
It’s an interesting task and if you wish it just let me know.
For this case I've sent you by PM my e-mail address.
Only testing data and formulas are required similar to your real data/formulas.

Regards,
Vladimir
 
Last edited:
Upvote 0

Excel Facts

Can you sort left to right?
To sort left-to-right, use the Sort dialog box. Click Options. Choose "Sort left to right"
Peter,

To me, about 50 secs for row deletions doesn't seem very fast.

But if you're happy with that, really great?



Vladimir,

I'd be curious to see how fast your approach actually is.
I agree Boller's data didn't very adequately reflect the nature of the problem.
Thus, a bit more, suitably randomized test data from the following code, which may be closer to your own views on the original problem.
The code gives a table in Sheet1 and some lookup data in columnA Sheet2.
Is the essence of the problem that rows in Sheet1 be deleted if entries in Sheet1 column1 match (or don't match as the case may be?) entries in Sheet2, Column1?
Could you show us how fast your approach actually is with these data, or alternatively supply your own data, rather than doing it via PM?
Code:
Sub testdata()  'Table on Sheet1, Lookup data on Sheet2
Dim n, m, a, b, i, j, digz
n = 20000
m = 40000
ReDim a(1 To n, 1 To 1)
ReDim b(1 To m, 1 To 12)
digz = 10 ^ 4
For i = 1 To n
    a(i, 1) = Int(Rnd * 2 * (m + n)) + 1 + digz
Next i
For i = 1 To m: For j = 1 To 12
    If j = 1 Then
        b(i, j) = 2 * Int(Rnd * (m + n)) + 2 + digz
    Else
        b(i, j) = "Xa"
    End If
Next j, i
With Sheets("Sheet1").Cells
    .Clear
    .Resize(m, 12) = b
End With
With Sheets("Sheet2").Cells
    .Clear
    .Resize(n, 1) = a
End With
MsgBox Format(m, "#,##0") & " rows and 12 cols sheet1" & _
        Chr(10) & Format(n, "#,##0") & " values in col A sheet2"
End Sub
 
Upvote 0
Vladimir,
I'd be curious to see how fast your approach actually is.
...

Hi Rugila,

In testing data of your post #32 more than 33000 rows have to be deleted.
It’s clear that the choosing method depends from the expected processing data.

The result of the time test on my PC is approx 0.5 seconds for the code shown below:

Main subroutine:
Rich (BB code):

Sub CompressTable1(Table As Range, DicArray())
  Dim Arr(), x, key$, r&, i&
  Const NUL As Byte = 0
  With CreateObject("Scripting.Dictionary")
    ' Create dictionary from DicArray()
    .CompareMode = 1
    For Each x In DicArray
      key = Trim(x)
      If Len(key) > 0 Then .Item(key) = NUL
    Next
    ' Clean values in Arr() which are not present in the dictionary
    Arr() = Table.Columns(1).Value
    For r = 1 To UBound(Arr)
      key = Trim(Arr(r, 1))
      If Len(key) > 0 Then
        If Not .Exists(key) Then
          Arr(r, 1) = Empty
          i = i + 1
        End If
      End If
    Next
  End With
  ' If it's nothing to do then exit
  Debug.Print "There are " & i & " rows to be deleted"
  If i = 0 Then Exit Sub
  ' Copy back Arr() values to the Table, Sort & delete rows with Empty cells
  With Application
    .EnableEvents = False
    .ScreenUpdating = False
    x = .Calculation: .Calculation = xlCalculationManual
    With Table
      .Columns(1).Value = Arr()
      .Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlNo
      i = .Rows.Count - 1
      With .Cells(1, 1)
        If Len(.Value) > 0 Then
          Range(.End(xlDown).Offset(1), .Offset(i)).EntireRow.Delete
        End If
      End With
    End With
    .EnableEvents = True
    .ScreenUpdating = True
    .Calculation = x
  End With
End Sub

Calling subroutine:
Rich (BB code):

Sub Test()
  
  Dim Table As Range, PAPartArray(), t#
  
  t = Timer
  
  ' The range of table to be compressed
  Set Table = Sheets(1).Range("A1").CurrentRegion
  
  ' Tune next line by your sheet & range, or populate PAPartArray() as required
  PAPartArray() = Sheets(2).Range("A1").CurrentRegion.Columns(1).Value
  
  ' Delete Table rows not matched in PAPartArray()
  CompressTable1 Table, PAPartArray()
  
  t = Timer - t
  Debug.Print Format(t, "0.000") & " secs"
  
End Sub

Regards,
Vladimir
 
Last edited:
Upvote 0
Hi Boller,

You have the sense, but the task is little different – have a look on the 1st post. Approx 20000 of LOOKUP operations are required with searching for each its value in the table of (may be) 10000 items, it’s not so simple & fast as IF functionality you've shown.

The purpose of my post was to illustrate the method.

I doubt that using a LOOKUP formula would take much longer than the IF formula in my illustration, but I think your suggestion of using "Dictionary" is probably better.

However, the point I was trying to put across was that the deletion bit when there are a lot of non-contiguous areas (whether established via a Filter or the Union method) is slower than the method I've suggested, but you seem to have addressed that in your latest posted code.

Perhaps someone would like to test my suggested method with the OP's actual requirements.
Also, what about using my test data to compare the various suggested methods?



"In other sheets there can be also a lot of dependent slow formulas referenced to the rows which have to be deleted. It is also unknown whether the additional columns can be allowed, etc."

The point re slow formulas applies to all the suggested methods, doesn't it?

Re "additional columns", I doubt that the use of a temporary helper column would be unacceptable.

To what does the "etc." refer?
 
Upvote 0
The purpose of my post was to illustrate the method.
...
Boller, thank you for your input. Certainly, I’m not really so categorical, as it may be looks in some of my posts statements ;)

I haven’t real data as you are, but you can try to test VLOOKUP on Rugila test data, it will take years for me. Ok, may be the minutes. The initial code of Peter provides approx the same as VLOOKUP and it coasts more than 10 minutes of PC duty. Using the dictionary in many cases is smart enough, but not in each cases. For example, adding the new elements to more than 100000 items of the dictionary becomes slower and slower and another methods are preferable.

Agree that dependent slow formulas are the common problem. I meant that deleting of rows part by part can be much slower than deletion of all rows at once, but not always.

Using of full temporary column is impossible, for example, if one of the row (say the 1st one) is fully populated by the data. Inserting of column is impossible if even one cell of the rightmost column has the value. It’s rarely happens but can be.

Saying "etc" I mean the expected amount of the rows which would be deleted, the limitation of 8192 areas count for SpecialCells Method, the possibility to sort data for speeding up, etc … again that "etc" :)

Regards,
Vladimir

P.S. I’m feeling inconveniently as we have used the completed thread for additional discussion. Peter, I hope that you forgive us for that :)
 
Last edited:
Upvote 0
Vladimir

Referring to your last posted code on this topic, suppose that on Sheet1 you have:
{A1,A2,A3}={3,2,1}
{B1,B2,B3}={a,b,c}

and on sheet2:
{A1,A2}={1,3}

Two questions:
(a) What is the output from your code?
(b) Is this the correct output?
 
Upvote 0
ZVI

"I haven’t real data as you are, but you can try to test VLOOKUP on Rugila test data, it will take years for me. Ok, may be the minutes. The initial code of Peter provides approx the same as VLOOKUP and it coasts more than 10 minutes of PC duty. Using the dictionary in many cases is smart enough, but not in each cases. For example, adding the new elements to more than 100000 items of the dictionary becomes slower and slower and another methods are preferable."

I'm not interestd enough to spend time testing the various suggestions.
I think this is something that can be done by the OP or someone who is interested enough.

The code to which you refer that provides approx. the same as VLOOKUP does so by looping through all the cells, which is why it takes a long time.
My suggestion to make use of a worksheet formula in a helper column does not take a long time.

Your comment about the dictionary is, of course, valid. The fastest method depends upon the data being processed as I previously wrote :-
<TABLE class=tborder style="BORDER-TOP-WIDTH: 0px" cellSpacing=1 cellPadding=6 width="100%" align=center border=0><TBODY><TR title="Post 2120318" vAlign=top><TD class=alt1>It should be borne in mind that there is no absolute fastest method for deleting rows.The speed of the various methods will depend upon the make-up of the data, etc.<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
</TD></TR><TR><TD class=thead colSpan=2></TD></TR></TBODY></TABLE>
Agree that dependent slow formulas are the common problem. I meant that deleting of rows part by part can be much slower than deletion of all rows at once, but not always.

This is exactly the point I have been making.
That is why my method groups the rows for deletion into one area before deletion.


Using of full temporary column is impossible, for example, if one of the row (say the 1st one) is fully populated by the data. Inserting of column is impossible if even one cell of the rightmost column has the value. It’s rarely happens but can be.

So rarely, in fact, that it's hardly worth mentioning. And if it is likely to happen, can be taken into account at macro creation time.
Also, it's not necessary to insert a helper column - any blank column will do.


Saying "etc" I mean the expected amount of the rows which would be deleted, the limitation of 8192 areas count for SpecialCells Method, the possibility to sort data for speeding up, etc … again that "etc" :smile:

The SpecialCells limitation (which I think doesn't apply to the most recent XL version) does not affect my method because after the sort by the helper column there is only 1 area that needs to be deleted.

I don't follow your comment about "the possibility to sort for speeding up, etc.". My suggestion includes a sort so as to do exactly that.

Your use of "etc." can mean only one thing - that there are other other points that may be inappropriate. If indeed there are other such points, would it not be better to detail them ? (Particularly as the ones you have detailed so far are not appropriate. :stickouttounge: )
 
Upvote 0
I've just had a look at ZVI's code in his post #33.

I think it's difficult to improve on that.

It's fast because :

- no looping of worksheet objects
- no faster alternative to using dictionary
- avoids deleting rows by storing values to be retained in an array, writing array to worksheet, sorting to eliminate blanks

A possible problem is that the sheet 2 retained data is not in its original sequence (because of the sort), but this can be rectified by some minor code amendments.
 
Upvote 0
Vladimir

Referring to your last posted code on this topic, suppose that on Sheet1 you have:
{A1,A2,A3}={3,2,1}
{B1,B2,B3}={a,b,c}

and on sheet2:
{A1,A2}={1,3}

Two questions:
(a) What is the output from your code?
(b) Is this the correct output?

Rugila,

The answers are below:

a) The output is the original table of Sheet1 sorted by values of column A.
All rows of Sheet1 which values in the 1st column not found in Sheet2 are deleted.
{A1,A2} = {1,3}
{B1,B2} = {c,a}

b) The result is correct taking into account sorting - refer to the point above. I’ve applied "the possibility to sort data for speeding up" mentioned in my post #35. If original sorting is required then, I agree with Boller, it can be easily done.


Thank to all for the useful conversation,
Vladimir
 
Upvote 0
Vladimir, and Others,

Thanks to you also, for comments and code.

Your code seems to do the job Ok on that test data set, although not retaining the original data ordering. It runs in under 0.5 secs as you say. About 0.47 sec or 0.48 sec on my machine.

There's been a number of suggestions made on this thread, but only you have actually delivered on fast code. For some comparison I give the code below, which on my machine takes about 0.41 secs on the same dataset, some 15% faster than yours. I wouldn't read too much into this though, coz as you noted that sort of comparison is likely to depend on the datasets used.

Mine doesn't use any sorting, and the results are retained in their original order. I also rather like concise code.

Boller thought that it would be difficult to improve on your code. That seemed to me to be bad methodology - reaching conclusions before all the evidence is in. However, as the Roman satirist (was it Juvenal?) mentioned "De gustibus non disputandum", so whatever...!

Someone else mentioned using advanced filter for this problem. I wrote a bit of code trying that. Concluded that while advanced filter can be OK for small problems it slows down alarmingly with v. large datasets and other methods are much more appropriate.
Code:
Sub codestuff()
Dim m As Integer, c(), e, a, z As Object
Dim i As Long, j As Integer, p As Long, t
t = Timer
Set z = CreateObject("Scripting.Dictionary")
z.CompareMode = 1
For Each e In Sheets("Sheet2").Cells(1).CurrentRegion.Resize(, 1).Value
    z.Item(e) = Empty
Next
With Sheets("Sheet1").Cells(1).CurrentRegion
m = .Columns.Count
ReDim c(1 To .Rows.Count, 1 To m)
For Each e In .Resize(, 1).Value
    i = i + 1
    If z.Exists(e) Then
        p = p + 1
        a = .Resize(1, m).Offset(i - 1).Value
        For j = 1 To m
            c(p, j) = a(1, j)
        Next j
    End If
Next e
If p > 0 Then
    .ClearContents
    .Resize(p, m) = c
End If
End With
MsgBox Format(Timer - t, "0.000")
End Sub
 
Upvote 0

Forum statistics

Threads
1,223,919
Messages
6,175,368
Members
452,638
Latest member
Oluwabukunmi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top