Remove Duplicates VBA Code

alforc17

New Member
Joined
Aug 7, 2015
Messages
6
I have the following code, but it is not working. Any ideas?

Sub deleteduplicates()
Dim lastrow As Long
lastrow = Sheets("CategoryMaster").Range("A" & Rows.Count).End(xlUp).row
Sheets("CategoryMaster").Range("A1" & ":" & "G" & lastrow).CurrentRegion.RemoveDuplicates _ Columns:=1, Header:=xlYes
End Sub
 
Hi Peter,

Did you modify the code for your use case? If you didn't then it's not surprising it didn't work for you. I did find a small bug though which might have caused it to miss last couple of rows in the data set!
I should have added +1 to iRowCount to avoid missing last row. Thanks for making me look at it more deeply! :-D Works fine now all the way through.
Code:
'Loop through all rows based on row count
For iCount = 2 To iRowCount + 1

You do always need to modify for your specific use case. You can choose to look at all the columns or just single column when doing the comparison. In my case I was not checking all columns in my dataset, just certain ones. The ALL had to match for it to determine they were duplicate data...even if other columns in the same data set had different values.

I will simplify the code here and use a single column to check for your examples. The code is customizable and not meant to be run on any data set without first knowing what you want to compare. It's easy to modify though, just add or remove columns to check and change the column referencing depending on where the columns are in your data.

Checking only 1 column...say if I only care if HDR4 columns have same data.

Code:
Public Sub Compare()Dim iCount As Double
Dim iRowCount As Double
Dim iRefRow As Double
Dim iRefCol As Integer
Dim iRow As Double
Dim iCol As Integer


Dim sStringRef1 As String
'Dim sStringRef2 As Boolean
'Dim sStringRef3 As Boolean
'Dim sStringRef4 As Boolean
'Dim sStringRef5 As Boolean
'Dim sStringRef6 As Boolean
'Dim sStringRef7 As Boolean
'Dim sStringRef8 As Boolean
'Dim sStringRef9 As Boolean


Dim sStringChk1 As String
'Dim sStringChk2 As Boolean
'Dim sStringChk3 As Boolean
'Dim sStringChk4 As Boolean
'Dim sStringChk5 As Boolean
'Dim sStringChk6 As Boolean
'Dim sStringChk7 As Boolean
'Dim sStringChk8 As Boolean
'Dim sStringChk9 As Boolean


iRow = 2
iCol = 1
iRefCol = 4
iRefRow = 0
iCount = 0
iRowCount = 0


Application.ScreenUpdating = True
'Count number of rows
Do Until Cells(iRow, iCol) = ""
    DoEvents
    iRowCount = iRowCount + 1
    iRow = iRow + 1
Loop
iRow = 2    'Reset index row


Application.ScreenUpdating = False
'Loop through all rows based on row count
For iCount = 2 To iRowCount + 1
    DoEvents
    iRefRow = iCount
    'Assign Strings to reference variables (to check for duplicates)
    sStringRef1 = Cells(iRefRow, iRefCol)
    'sStringRef2 = Cells(iRefRow, iRefCol + 1)
    'sStringRef3 = Cells(iRefRow, iRefCol + 2)
    'sStringRef4 = Cells(iRefRow, iRefCol + 3)
    'sStringRef5 = Cells(iRefRow, iRefCol + 4)
    'sStringRef6 = Cells(iRefRow, iRefCol + 5)
    'sStringRef7 = Cells(iRefRow, iRefCol + 6)
    'sStringRef8 = Cells(iRefRow, iRefCol + 7)
    'sStringRef9 = Cells(iRefRow, iRefCol + 8)
   
    'Get ready to check the next rows...
    iRow = iRefRow + 1
    
    'Keep checking and comparing all the other rows to the reference row above.
    Do Until Cells(iRow, iCol) = ""
        Application.StatusBar = "Checking Row: " & iRefRow & " ... against Row: " & iRow
        
        DoEvents
        sStringChk1 = Cells(iRow, iRefCol)
        'sStringChk2 = Cells(iRow, iRefCol + 1)
        'sStringChk3 = Cells(iRow, iRefCol + 2)
        'sStringChk4 = Cells(iRow, iRefCol + 3)
        'sStringChk5 = Cells(iRow, iRefCol + 4)
        'sStringChk6 = Cells(iRow, iRefCol + 5)
        'sStringChk7 = Cells(iRow, iRefCol + 6)
        'sStringChk8 = Cells(iRow, iRefCol + 7)
        'sStringChk9 = Cells(iRow, iRefCol + 8)
        'Compare all variables, if all match then remove the row being checked. (I removed the ones I commented out...)
        If sStringChk1 = sStringRef1 Then
            Rows(iRow).Select
            Selection.Delete Shift:=xlUp
        End If
        iRow = iRow + 1
    Loop
Next
Application.ScreenUpdating = True
Application.StatusBar = "Ready"
MsgBox "Done!"


End Sub
 
Upvote 0

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK
Don't forget as well that the variable types need to match when the compare is done. The orginal code had a lot of boolean variable types setup...which you didn't have in your data set. You should modify to suit your need.
 
Upvote 0
Checking only 1 column...say if I only care if HDR4 columns have same data.
So if your new code is meant to remove duplicates in HDR4 (column D) I'm wondering if you ran the code on either of the samples I gave? It doesn't appear to work correctly for me. Here are the results I get after running your new code.

Test1: Only one of the 3 identical column D rows removed.

Excel Workbook
ABCDE
1HDR1HDR2HDR3HDR4HDR5
2xxx9c9
3xxx99
4xxx10c10
5xxx10c10
6
Test1



Test 2: Nothing removed despite 6 rows having the same thing in column D

Excel Workbook
ABCDE
1HDR1HDR2HDR3HDR4HDR5
2xxx9c9
3xxx99
4xxx10c10
5xx109
6xxx10c10
7xxx10c10
8xxx10c10
9xxx10c10
10xxx10c10
11
Test2




Don't forget as well that the variable types need to match when the compare is done.
So what happens if a column has a number of different data types?
 
Upvote 0
Interesting this thread is still going.

After my somewhat negative view of Microsoft's RemoveDuplicates in Post# 8, maybe a good idea to be more positive and produce my own alternative.

The code at the end of this post seems to do what Excel's RemoveDuplicates does, but also works in cases where the Microsoft contribution doesn't.

Regarding relative speeds, which seem often important to users, I tested both my code and the Microsoft attempt on a dataset of 100,000 rows generated by the below testdata code.

After the computer was told which columns to use in each case, the actual removing duplicates running time was (for all 4 columns):

Excel RemoveDuplicates 1.38 secs
Kalak code 1.62 secs

Any comments or criticisms welcomed.
Code:
Sub testdata()

ActiveSheet.UsedRange.Clear
Dim n As Long, m As Long
n = 100000: m = 4
With Range("A1").Resize(n, m)
    .Cells = "=char(randbetween(1,26)+64)"
    .Value = .Value
End With
End Sub
Code:
Sub RemoveDuplicates_kalak()

Dim Dic As Object
Dim s(), arr, a, c, spl, x
Dim i As Long, r As Long, rs As Long

Set Dic = CreateObject("scripting.dictionary")
rws = Cells.Find("*", , , , xlByRows, xlPrevious).Row
cls = Cells.Find("*", , , , xlByColumns, xlPrevious).Column + 1
a = Range("A1").Resize(rws, cls)
ReDim s(1 To rws, 1 To 1)

arr = InputBox("Enter comma-separated column numbers." & vbLf & _
    "Or just type  all  to use all columns")
If arr = vbNullString Then Exit Sub

With Application
    .ScreenUpdating = False
    .Calculation = xlManual
    spl = Split(arr, ",")
    
    For i = 1 To rws
        x = vbNullString
        If arr = "all" Then
            For c = 1 To cls - 1
                x = Join(Array(x, a(i, c)), Chr(2))
            Next c
        Else
            For c = 0 To UBound(spl)
                x = Join(Array(x, a(i, spl(c))), Chr(2))
            Next c
        End If
        If Dic(x) = 1 Then s(i, 1) = 1: r = r + 1
        Dic(x) = 1
    Next i
        
    If r > 0 Then
        Cells(cls).Resize(rws) = s
        Range("A1").Resize(rws, cls).Sort Cells(cls)
        Range("A1").Resize(r, cls).Delete xlUp
    End If
    
    .Calculation = xlAutomatic
    .ScreenUpdating = True
End With

End Sub
 
Last edited:
Upvote 0
Any comments or criticisms welcomed.
I'm not championing Excel's Remove Duplicates nor going to get in to continually testing code for this but this code doesn't quite do what the built-in system does. For example the built-in remove duplicates can act on a specified range.

In the sample below Excel's remove Duplicates can act on the yellow section to reduce it to data in A1:C3, leaving column F and row 14 untouched & unmoved. Your code (with "1,2,3" in the InputBox) didn't do that.

If you want to further work on your code I would also suggest taking the case-sensitivity out of the InputBox result. The first time I accidentally typed "All" and the code fell over.

Excel Workbook
ABCDEFG
1HDR1HDR2HDR3a
22xxb
32xxc
42xxd
52xxe
62xxf
72yxg
82yxh
92yxi
102yxj
112yxk
12l
13m
142xxxxn
15
Test3
 
Upvote 0
@Peter_SSs
Thanks for your comments.
I wasn't really trying to duplicate all aspects of Excel's RemoveDuplicates. The aim was actually NOT to do that for reasons stated earlier in the thread.
I just felt somewhat uneasy about this inbuilt Excel function with a bug being recommended by so many contributors, here and elsewhere.
It's easy enough to modify my own code so (say) cols 1,2,3 intersect some current region if anyone were interested.
And your comment on all vs All is well taken. Although actually I'm very unlikely to do anything more with it.
However, thanks again for your comments. I really appreciate your taking the time.
 
Upvote 0
I would prefer a routine that can be called from another sub. Here's one that's very lightly tested along with a calling routine that sets the arguments with Peter's Test 3 layout from post # 15 in mind.
Code:
Sub RemoveDups(R As Range, Cols As Variant, Hdrs As Boolean)
'Has to be called from another routine using three arguments:
'remove duplicate entries from range R, enter Cols as array to indicate which columns (relative to range R)
'are compared, Hdrs is true if R has Headers
Dim d As Object, Vin As Variant, i As Long, c As Long, x As Variant, ct As Long
Dim dRws As Range
Set d = CreateObject("Scripting.Dictionary")
Vin = R.Value
If Hdrs Then
       n = 2
Else
       n = 1
End If
For i = n To UBound(Vin, 1)
       x = vbNullString
       For c = LBound(Cols) To UBound(Cols)
              x = Join(Array(x, Vin(i, c + 1)), Chr(2))
       Next c
       If Not d.exists(x) Then
              ct = ct + 1
              d.Add x, ct
       Else
              If dRws Is Nothing Then
                     Set dRws = R.Rows(i)
              Else
                     Set dRws = Union(dRws, R.Rows(i))
              End If
       End If
Next i
If Not dRws Is Nothing Then
       dRws.Delete shift:=xlUp
       Select Case n
              Case 1: MsgBox UBound(Vin, 1) - ct & " duplicates removed"
              Case 2: MsgBox UBound(Vin, 1) - 1 - ct & " duplicates removed"
       End Select
Else
       MsgBox "No dupicate entries found in range: " & R.Address(0, 0)
End If
End Sub
Sub test()
'adjust arguments to suit
Call RemoveDups(Range("A1").CurrentRegion, Array(1, 2, 3), True)
End Sub
 
Upvote 0
For anyone preferring a called code to a standalone code and current region to the used range that's great, and I hope good use can be made of it.

Just for interest, using the testdata code in post#14, but with 15,000 rows instead of 100,000, and looking at duplicates in only Cols 1, 2 and 3, I checked run times for the codes, which were:

[TABLE="width: 302"]
<tbody>[TR]
[TD]Excel RemoveDuplicates[/TD]
[TD="align: right"]0.11 secs[/TD]
[/TR]
[TR]
[TD]kalak code (slightly modified)[/TD]
[TD="align: right"]0.16 secs[/TD]
[/TR]
[TR]
[TD]JoeMo code[/TD]
[TD="align: right"]57.58 secs[/TD]
[/TR]
</tbody>[/TABLE]

All 3 of these codes gave the same duplicates-removed results.
 
Upvote 0
For anyone preferring a called code to a standalone code and current region to the used range that's great, and I hope good use can be made of it.

Just for interest, using the testdata code in post#14, but with 15,000 rows instead of 100,000, and looking at duplicates in only Cols 1, 2 and 3, I checked run times for the codes, which were:

[TABLE="width: 302"]
<tbody>[TR]
[TD]Excel RemoveDuplicates[/TD]
[TD="align: right"]0.11 secs[/TD]
[/TR]
[TR]
[TD]kalak code (slightly modified)[/TD]
[TD="align: right"]0.16 secs[/TD]
[/TR]
[TR]
[TD]JoeMo code[/TD]
[TD="align: right"]57.58 secs[/TD]
[/TR]
</tbody>[/TABLE]

All 3 of these codes gave the same duplicates-removed results.
I didn't attempt to optimize the code I posted for speed. My objective was simply to produce an illustration of a routine that could be called from another subroutine w/o the need for user input. But, since you have made the speed comparison, I have modified the code to avoid deleting a large number of non-contiguous rows, which is very time consuming.

Using your post #14 test with 15,000 rows and looking for duplicates across the first 3 columns of the range, I get an execution time of 0.3 seconds on my computer. Using Excel's RemoveDuplicates on the same data set the time is 0.15 seconds. Although my code is about 2x slower it's still fast enough to satisfy most needs.
Code:
Sub RemoveDups2(R As Range, Cols As Variant, Hdrs As Boolean)
'remove duplicate entries from range R, enter Cols as array to indicate which fields are compared, Hdrs is true if R has Headers
Dim d As Object, Vin As Variant, i As Long, c As Long, x As Variant, ct As Long
Dim dRws As Long

With Application
    .Calculation = xlCalculationManual
    .ScreenUpdating = False
End With
R.Columns(R.Columns.Count).Offset(0, 1).EntireColumn.Insert
Set d = CreateObject("Scripting.Dictionary")
Vin = R.Value
If Hdrs Then
    n = 2
Else
    n = 1
End If
For i = n To UBound(Vin, 1)
    x = vbNullString
    For c = LBound(Cols) To UBound(Cols)
        x = Join(Array(x, Vin(i, c + 1)), Chr(2))
    Next c
    If Not d.exists(x) Then
        ct = ct + 1
        d.Add x, ct
    Else
        dRws = dRws + 1
        R.Rows(i).Cells(1, R.Columns.Count + 1).Value = 1
    End If
Next i
If dRws > 0 Then
    R.Rows(1).Cells(1, R.Columns.Count + 1).Sort key1:=R.Rows(1).Cells(1, R.Columns.Count + 1), order1:=xlAscending
    R.Rows(1).Resize(dRws).Delete shift:=xlUp
    R.Columns(R.Columns.Count).Offset(0, 1).EntireColumn.Delete
    R.Rows(R.Rows.Count).Offset(1, 0).Resize(dRws).Insert shift:=xlDown
    Select Case n
           Case 1: MsgBox UBound(Vin, 1) - ct & " duplicates removed"
           Case 2: MsgBox UBound(Vin, 1) - 1 - ct & " duplicates removed"
    End Select
Else
       MsgBox "No dupicate entries found in range: " & R.Address(0, 0)
End If
With Application
    .Calculation = xlCalculationAutomatic
    .ScreenUpdating = True
End With
End Sub
Sub test2()
Call RemoveDups2(Range("A1").CurrentRegion, Array(1, 2, 3), False)
End Sub
 
Last edited:
Upvote 0
My reason for entering this thread in the first place was to show (by counter-example) that the automatic recommendation of Excel's RemoveDuplicates wasn't necessarily a very good idea.

Later, I did think it a good idea to produce an alternative rather than just a negative view. My main aim was actually a personal intellectual one, to see if I could write a VBA code that ran faster than the Excel RemoveDuplicates in datasets where the latter did work correctly.
And so the thread developed ...

To provide a more challenging test, I modified the TestData code of post#14 to the one below, to provide 200,000 rows with 4 columns and duplication of cols 1, 2 and 4 to be the rows deletion criterion.
Rich (BB code):
Sub TestData()
ActiveSheet.UsedRange.Clear
Dim n As Long, m As Long
n = 200000: m = 4
With Range("A1").Resize(n, m)
    .Cells = "=char(randbetween(1,10)+64)&char(randbetween(1,10)+64)"
    .Value = .Value
End With
End Sub
Both Excel's RemoveDuplicates and my own code as below removed the same duplicates and gave the same result on the identical data. I also tried JoeMo's faster version on the identical data, using the line:
"Call RemoveDups2(Range("A1").CurrentRegion, Array(1, 2, 4), False)"

Timed results.
[TABLE="width: 218"]
<tbody>[TR]
[TD]kalak_xp[/TD]
[TD]4.04 secs[/TD]
[/TR]
[TR]
[TD]excel rem dups[/TD]
[TD]4.69 secs[/TD]
[/TR]
[TR]
[TD]joemo 2[/TD]
[TD]6.66 secs[/TD]
[/TR]
</tbody>[/TABLE]

The joemo 2 code timed well on that sized dataset, but gave different results from the other two using the identical input data.
My code and Excel's RemoveDuplicates both identified and removed duplicates from Joe's output but not from each other's outut, and Joe's code removed duplicates from the output of the other two.
Interesting, I suppose.
Rich (BB code):
Sub RemoveDuplicates_kalak_xp()

Dim Dic As Object
Dim s(), a As Range, arr, q, u, x
Dim i As Long, c As Long, cls As Long, rws As Long

arr = Array(1, 2, 4) 'array to specify columns for duplicates

Set Dic = CreateObject("scripting.dictionary")
rws = Cells.Find("*", , , , xlByRows, xlPrevious).Row
cls = Cells.Find("*", , , , xlByColumns, xlPrevious).Column + 1
Set a = Range("A1").Resize(rws, cls)
ReDim s(1 To rws, 1 To 1)

With Application
    .ScreenUpdating = False
    .Calculation = xlManual
    
    For Each x In arr
        q = q & "&char(2)&" & a.Columns(x).Address
    Next x
    x = Evaluate(Mid(q, 2))

    For i = 1 To rws
        If Not Dic(x(i, 1)) Then Dic(x(i, 1)) = True _
            Else s(i, 1) = 1: c = c + 1
    Next i

    If c > 0 Then
        a(cls).Resize(rws) = s
        a.Sort a.Columns(cls)
        a.Resize(c).Delete xlUp
    End If
    
    .Calculation = xlAutomatic
    .ScreenUpdating = True
End With

End Sub
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,297
Messages
6,184,137
Members
453,218
Latest member
dawngrogan2207

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top