Remove Duplicates VBA Code

alforc17

New Member
Joined
Aug 7, 2015
Messages
6
I have the following code, but it is not working. Any ideas?

Sub deleteduplicates()
Dim lastrow As Long
lastrow = Sheets("CategoryMaster").Range("A" & Rows.Count).End(xlUp).row
Sheets("CategoryMaster").Range("A1" & ":" & "G" & lastrow).CurrentRegion.RemoveDuplicates _ Columns:=1, Header:=xlYes
End Sub
 
My reason for entering this thread in the first place was to show (by counter-example) that the automatic recommendation of Excel's RemoveDuplicates wasn't necessarily a very good idea.

Later, I did think it a good idea to produce an alternative rather than just a negative view. My main aim was actually a personal intellectual one, to see if I could write a VBA code that ran faster than the Excel RemoveDuplicates in datasets where the latter did work correctly.
And so the thread developed ...

To provide a more challenging test, I modified the TestData code of post#14 to the one below, to provide 200,000 rows with 4 columns and duplication of cols 1, 2 and 4 to be the rows deletion criterion.
Code:
Sub TestData()
ActiveSheet.UsedRange.Clear
Dim n As Long, m As Long
n = 200000: m = 4
With Range("A1").Resize(n, m)
    .Cells = "=char(randbetween(1,10)+64)&char(randbetween(1,10)+64)"
    .Value = .Value
End With
End Sub
Both Excel's RemoveDuplicates and my own code as below removed the same duplicates and gave the same result on the identical data. I also tried JoeMo's faster version on the identical data, using the line:
"Call RemoveDups2(Range("A1").CurrentRegion, Array(1, 2, 4), False)"

Timed results.
[TABLE="width: 218"]
<tbody>[TR]
[TD]kalak_xp[/TD]
[TD]4.04 secs[/TD]
[/TR]
[TR]
[TD]excel rem dups[/TD]
[TD]4.69 secs[/TD]
[/TR]
[TR]
[TD]joemo 2[/TD]
[TD]6.66 secs[/TD]
[/TR]
</tbody>[/TABLE]

The joemo 2 code timed well on that sized dataset, but gave different results from the other two using the identical input data.
My code and Excel's RemoveDuplicates both identified and removed duplicates from Joe's output but not from each other's outut, and Joe's code removed duplicates from the output of the other two.
Interesting, I suppose.
As I said in post #17, the code I wrote was lightly tested, so thanks for the additional testing which showed me where I was in error. A very slight modification provides a fix. With the modified code below, I ran your test of 200,000 rows using the same procedure you describe in the quote above.

Now both your code and my code (below) produce identical results and comparable execution times. After running either on the same data set, using the other code or Excel's Remove Duplicates on the output reveals no duplicates remain.
Code:
Sub RemoveDups3(R As Range, Cols As Variant, Hdrs As Boolean)
'remove duplicate entries from range R, enter Cols as array to indicate which fields are compared, Hdrs is true if R has Headers
Dim d As Object, Vin As Variant, i As Long, c As Variant, x As Variant, ct As Long
Dim dRws As Long, Srt() As Variant

With Application
    .Calculation = xlCalculationManual
    .ScreenUpdating = False
End With
R.Columns(R.Columns.Count).Offset(0, 1).EntireColumn.Insert
Set d = CreateObject("Scripting.Dictionary")
Vin = R.Value
ReDim Srt(1 To R.Rows.Count, 1 To 1)
If Hdrs Then
    n = 2
Else
    n = 1
End If
For i = n To UBound(Vin, 1)
    x = vbNullString
    For Each c In Cols
        x = Join(Array(x, Vin(i, c)), Chr(2))
    Next c
    If Not d.exists(x) Then
        ct = ct + 1
        d.Add x, ct
        Srt(i, 1) = ""
    Else
        dRws = dRws + 1
        Srt(i, 1) = 1
        R.Rows(i).Cells(1, R.Columns.Count + 1).Value = 1
    End If
Next i
If dRws > 0 Then
    R.Columns(R.Columns.Count).Offset(0, 1).Value = Srt
    R.Rows(1).Cells(1, R.Columns.Count + 1).Sort key1:=R.Rows(1).Cells(1, R.Columns.Count + 1), order1:=xlAscending
    R.Rows(1).Resize(dRws).Delete shift:=xlUp
    R.Columns(R.Columns.Count).Offset(0, 1).EntireColumn.Delete
    R.Rows(R.Rows.Count).Offset(1, 0).Resize(dRws).Insert shift:=xlDown
    Select Case n
           Case 1: MsgBox UBound(Vin, 1) - ct & " duplicates removed"
           Case 2: MsgBox UBound(Vin, 1) - 1 - ct & " duplicates removed"
    End Select
Else
       MsgBox "No dupicate entries found in range: " & R.Address(0, 0)
End If
With Application
    .Calculation = xlCalculationAutomatic
    .ScreenUpdating = True
End With
End Sub
Sub test3()
Call RemoveDups(Range("A1").CurrentRegion, Array(1, 2, 4), False)
End Sub
 
Upvote 0

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
As I said in post #17, the code I wrote was lightly tested, so thanks for the additional testing which showed me where I was in error. A very slight modification provides a fix. With the modified code below, I ran your test of 200,000 rows using the same procedure you describe in the quote above.

Now both your code and my code (below) produce identical results and comparable execution times. After running either on the same data set, using the other code or Excel's Remove Duplicates on the output reveals no duplicates remain.
OK. I tested your new modification on TestData code, and its result seems OK this time around. Good work!
I timed the 3 codes on the same data as:

[TABLE="width: 218"]
<tbody>[TR]
[TD]kalak_xp[/TD]
[TD]4.34 secs[/TD]
[/TR]
[TR]
[TD]excel RemDups[/TD]
[TD]5.16 secs[/TD]
[/TR]
[TR]
[TD]JoeMo3[/TD]
[TD]8.78 secs[/TD]
[/TR]
</tbody>[/TABLE]

I guess whether 4.3 secs and 8.7 secs are comparable depends on perspectives and uses.

My aims in the thread were to point out a problem with Excel's RemoveDuplicates, and to see if I could write a code without that problem and which also ran faster than the Excel version. So I'll leave the thread happily enough.
 
Upvote 0
OK. I tested your new modification on TestData code, and its result seems OK this time around. Good work!
I timed the 3 codes on the same data as:

[TABLE="width: 218"]
<tbody>[TR]
[TD]kalak_xp[/TD]
[TD]4.34 secs[/TD]
[/TR]
[TR]
[TD]excel RemDups[/TD]
[TD]5.16 secs[/TD]
[/TR]
[TR]
[TD]JoeMo3[/TD]
[TD]8.78 secs[/TD]
[/TR]
</tbody>[/TABLE]

I guess whether 4.3 secs and 8.7 secs are comparable depends on perspectives and uses.

My aims in the thread were to point out a problem with Excel's RemoveDuplicates, and to see if I could write a code without that problem and which also ran faster than the Excel version. So I'll leave the thread happily enough.
Thanks for confirming my observations. I'll accept the speed decrement in exchange for a callable routine that works on specific ranges within the used range or the entire used range. Your code still does not handle Peter's test layout in post #15 properly, but I understand your objective was met. Thanks again for a useful exchange!
 
Upvote 0
Thanks for confirming my observations. I'll accept the speed decrement in exchange for a callable routine that works on specific ranges within the used range or the entire used range. Your code still does not handle Peter's test layout in post #15 properly, but I understand your objective was met. Thanks again for a useful exchange!
Both of those things you mention seemed to me pretty trivial, easily allowed for if wanted and I had no motivation to look at them anyway.
I wasn't even aware that anyone had specifically asked for these, other than (possibly) yourself or Peter. Were you for some reason expecting something more than I actually did? If so, why?
I did state my motivation for contributing to this thread, which was to do just what I did, and without errors, viz. pointing out a problem with the remove duplicates technique suggested at the start of the thread, and trying to write an error-free code to remove duplicates faster than Excel's RemoveDuplicates. :)
 
Upvote 0

Forum statistics

Threads
1,225,298
Messages
6,184,139
Members
453,218
Latest member
dawngrogan2207

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top