Remove duplicates based on multiple criteria (ID and Date)

johnnytominaga

New Member
Joined
Apr 27, 2018
Messages
19
Hey guys!

I'm working on a project that requires merging data from multiple workbooks into a single "master" workbook.
All files have the same number of columns and have been merged properly. That means all data is in the "master" workbook.

I couldn't manage to remove the duplicates though. They need to be removed based on multiple criteria and so that cells with value are copied even if the rest of the row is removed. The criteria that defines which duplicate is to be kept is:
a) ID No. (numerical)
b) Update Date (the higher is kept)

Dataset sample:

[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Name[/TD]
[TD]Location[/TD]
[TD]Phone[/TD]
[TD]Last updated on[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Sara[/TD]
[TD]Miami[/TD]
[TD]99999999[/TD]
[TD]19/07/2018[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD][/TD]
[TD]88888888[/TD]
[TD]15/03/2015[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Josh[/TD]
[TD]Seattle[/TD]
[TD][/TD]
[TD]03/02/2015[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD]New York[/TD]
[TD][/TD]
[TD]30/09/2016[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD][/TD]
[TD]66666666[/TD]
[TD]01/10/2016[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD]Los Angeles[/TD]
[TD]88888888[/TD]
[TD]20/06/2017[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD][/TD]
[TD][/TD]
[TD]20/01/2017[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]Nicole[/TD]
[TD][/TD]
[TD]55555555[/TD]
[TD]18/11/2016[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD]77777777[/TD]
[TD]04/01/2017[/TD]
[/TR]
</tbody>[/TABLE]

Desired result:

[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Name[/TD]
[TD]Location[/TD]
[TD]Phone[/TD]
[TD]Last updated on[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Sara[/TD]
[TD]Miami[/TD]
[TD]99999999[/TD]
[TD]19/07/2018[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD]Los Angeles[/TD]
[TD]88888888[/TD]
[TD]20/06/2017[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Josh[/TD]
[TD]Seattle[/TD]
[TD]77777777[/TD]
[TD]04/01/2017[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD]New York[/TD]
[TD]66666666[/TD]
[TD]20/01/2017[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]Nicole[/TD]
[TD][/TD]
[TD]55555555[/TD]
[TD]18/11/2016[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

I tried by using the Dictionary object, but couldn't get anywhere close to what I'm looking to achieve. Also, the script is going to be used in multiple PCs, so if I could avoid needing to activate the Microsoft Scripting Runtime everytime, that would be appreciated.

I'm using Excel 2016 on Windows.

Any ideas on how I could accomplish that?

Thanks a lot in advance.


Johnny
 

Excel Facts

Save Often
If you start asking yourself if now is a good time to save your Excel workbook, the answer is Yes
Assuming data in Sheet1, columns A:E, headers in row 1, maybe this

Results in columns G:K
Code:
Sub aTest()
    Dim dic As Object, vData As Variant
    Dim i As Long, arr(1 To 4) As Variant, vKey As Variant
    Dim vAux As Variant, vResult As Variant
    
    Set dic = CreateObject("Scripting.dictionary")
    With Sheets("Sheet1")
        vData = .Range("A2:E" & .Cells(.Rows.Count, "A").End(xlUp).Row)
        For i = LBound(vData, 1) To UBound(vData, 1)
            If dic.exists(vData(i, 1)) Then
                If vData(i, 5) > dic(vData(i, 1))(4) Then
                    vAux = dic(vData(i, 1))
                    vAux(4) = vData(i, 5)
                    If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
                    If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
                    If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
                    dic(vData(i, 1)) = vAux
                End If
            Else
                arr(1) = vData(i, 2)
                arr(2) = vData(i, 3)
                arr(3) = vData(i, 4)
                arr(4) = vData(i, 5)
                dic(vData(i, 1)) = arr
            End If
        Next i
        i = 0
        .Range("G1:K1").Value = .Range("A1:E1").Value
        vResult = .Range("G2").Resize(dic.Count, 5)
        For Each vKey In dic.keys
            i = i + 1
            vResult(i, 1) = vKey
            vResult(i, 2) = dic(vKey)(1)
            vResult(i, 3) = dic(vKey)(2)
            vResult(i, 4) = dic(vKey)(3)
            vResult(i, 5) = dic(vKey)(4)
        Next vKey
        .Range("G2").Resize(dic.Count, 5) = vResult
        .Columns("H:K").AutoFit
    End With
End Sub

Hope this helps

M.
 
Last edited:
Upvote 0
Hey guys!

I'm working on a project that requires merging data from multiple workbooks into a single "master" workbook.
All files have the same number of columns and have been merged properly. That means all data is in the "master" workbook.

I couldn't manage to remove the duplicates though. They need to be removed based on multiple criteria and so that cells with value are copied even if the rest of the row is removed. The criteria that defines which duplicate is to be kept is:
a) ID No. (numerical)
b) Update Date (the higher is kept)

Dataset sample:

[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Name[/TD]
[TD]Location[/TD]
[TD]Phone[/TD]
[TD]Last updated on[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Sara[/TD]
[TD]Miami[/TD]
[TD]99999999[/TD]
[TD]19/07/2018[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD][/TD]
[TD]88888888[/TD]
[TD]15/03/2015[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Josh[/TD]
[TD]Seattle[/TD]
[TD][/TD]
[TD]03/02/2015[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD]New York[/TD]
[TD][/TD]
[TD]30/09/2016[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD][/TD]
[TD]66666666[/TD]
[TD]01/10/2016[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD]Los Angeles[/TD]
[TD]88888888[/TD]
[TD]20/06/2017[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD][/TD]
[TD][/TD]
[TD]20/01/2017[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]Nicole[/TD]
[TD][/TD]
[TD]55555555[/TD]
[TD]18/11/2016[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD][/TD]
[TD][/TD]
[TD]77777777[/TD]
[TD]04/01/2017[/TD]
[/TR]
</tbody>[/TABLE]

Desired result:

[TABLE="class: grid, width: 500, align: center"]
<tbody>[TR]
[TD]ID[/TD]
[TD]Name[/TD]
[TD]Location[/TD]
[TD]Phone[/TD]
[TD]Last updated on[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Sara[/TD]
[TD]Miami[/TD]
[TD]99999999[/TD]
[TD]19/07/2018[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Brian[/TD]
[TD]Los Angeles[/TD]
[TD]88888888[/TD]
[TD]20/06/2017[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Josh[/TD]
[TD]Seattle[/TD]
[TD]77777777[/TD]
[TD]04/01/2017[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]Peter[/TD]
[TD]New York[/TD]
[TD]66666666[/TD]
[TD]20/01/2017[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]Nicole[/TD]
[TD][/TD]
[TD]55555555[/TD]
[TD]18/11/2016[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

I tried by using the Dictionary object, but couldn't get anywhere close to what I'm looking to achieve. Also, the script is going to be used in multiple PCs, so if I could avoid needing to activate the Microsoft Scripting Runtime everytime, that would be appreciated.

I'm using Excel 2016 on Windows.

Any ideas on how I could accomplish that?

Thanks a lot in advance.


Johnny

Oh, I see now - that an MVP had answered your query....better try it.
thanks
 
Last edited:
Upvote 0
Thanks Marcelo! You're a genius.
It worked perfectly (you were right on the assumption around the headers btw).

Couple of questions:

- How could I make it work with a larger dataset though? - Meaning if I have 30, 40 columns to be merged - I tried it and I'm getting a VBA Runtime error '6' error.

- Also, what need to be changed if the ID (key) isn't in the first column?

Thanks again.


Assuming data in Sheet1, columns A:E, headers in row 1, maybe this

Results in columns G:K
Code:
Sub aTest()
    Dim dic As Object, vData As Variant
    Dim i As Long, arr(1 To 4) As Variant, vKey As Variant
    Dim vAux As Variant, vResult As Variant
    
    Set dic = CreateObject("Scripting.dictionary")
    With Sheets("Sheet1")
        vData = .Range("A2:E" & .Cells(.Rows.Count, "A").End(xlUp).Row)
        For i = LBound(vData, 1) To UBound(vData, 1)
            If dic.exists(vData(i, 1)) Then
                If vData(i, 5) > dic(vData(i, 1))(4) Then
                    vAux = dic(vData(i, 1))
                    vAux(4) = vData(i, 5)
                    If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
                    If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
                    If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
                    dic(vData(i, 1)) = vAux
                End If
            Else
                arr(1) = vData(i, 2)
                arr(2) = vData(i, 3)
                arr(3) = vData(i, 4)
                arr(4) = vData(i, 5)
                dic(vData(i, 1)) = arr
            End If
        Next i
        i = 0
        .Range("G1:K1").Value = .Range("A1:E1").Value
        vResult = .Range("G2").Resize(dic.Count, 5)
        For Each vKey In dic.keys
            i = i + 1
            vResult(i, 1) = vKey
            vResult(i, 2) = dic(vKey)(1)
            vResult(i, 3) = dic(vKey)(2)
            vResult(i, 4) = dic(vKey)(3)
            vResult(i, 5) = dic(vKey)(4)
        Next vKey
        .Range("G2").Resize(dic.Count, 5) = vResult
        .Columns("H:K").AutoFit
    End With
End Sub

Hope this helps

M.
 
Upvote 0
Try to understand the code above and adapt it for more columns.
The ID column is used as the Keys of the dictionary.
The values of other columns are stored in an array that is placed as the Item correspondent to each key.

M.
 
Upvote 0
Right. I've made a few adjustments so that I can have, for example, 27 (A to AA) columns with IDs being in column B.

This is how it now looks like:

Sub aTest()

Dim dic As Object


Dim vData As Variant
Dim i As Long
Dim arr(1 To 25) As Variant
Dim vKey As Variant
Dim vAux As Variant
Dim vResult As Variant

Set dic = CreateObject("Scripting.dictionary")

With Sheets("Sheet1")
vData = .Range("A2:Z" & .Cells(.Rows.Count, "A").End(xlUp).Row)
For i = LBound(vData, 1) To UBound(vData, 1)
If dic.exists(vData(i, 1)) Then
If vData(i, 26) > dic(vData(i, 1))(25) Then
vAux = dic(vData(i, 1))
vAux(26) = vData(i, 27)
If vData(i, 2) <> "" Then vAux(1) = vData(i, 2)
If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
If vData(i, 5) <> "" Then vAux(4) = vData(i, 5)
If vData(i, 6) <> "" Then vAux(5) = vData(i, 6)
If vData(i, 7) <> "" Then vAux(6) = vData(i, 7)
If vData(i, 8) <> "" Then vAux(7) = vData(i, 8)
If vData(i, 9) <> "" Then vAux(8) = vData(i, 9)
If vData(i, 10) <> "" Then vAux(9) = vData(i, 10)
If vData(i, 11) <> "" Then vAux(10) = vData(i, 11)
If vData(i, 12) <> "" Then vAux(11) = vData(i, 12)
If vData(i, 13) <> "" Then vAux(12) = vData(i, 13)
If vData(i, 14) <> "" Then vAux(13) = vData(i, 14)
If vData(i, 15) <> "" Then vAux(14) = vData(i, 15)
If vData(i, 16) <> "" Then vAux(15) = vData(i, 16)
If vData(i, 17) <> "" Then vAux(16) = vData(i, 17)
If vData(i, 18) <> "" Then vAux(17) = vData(i, 18)
If vData(i, 19) <> "" Then vAux(18) = vData(i, 19)
If vData(i, 20) <> "" Then vAux(19) = vData(i, 20)
If vData(i, 21) <> "" Then vAux(20) = vData(i, 21)
If vData(i, 22) <> "" Then vAux(21) = vData(i, 22)
If vData(i, 23) <> "" Then vAux(22) = vData(i, 23)
If vData(i, 24) <> "" Then vAux(23) = vData(i, 24)
If vData(i, 25) <> "" Then vAux(24) = vData(i, 25)
If vData(i, 26) <> "" Then vAux(25) = vData(i, 26)
dic(vData(i, 1)) = vAux
End If
Else
arr(1) = vData(i, 2)
arr(2) = vData(i, 3)
arr(3) = vData(i, 4)
arr(4) = vData(i, 5)
arr(5) = vData(i, 6)
arr(6) = vData(i, 7)
arr(7) = vData(i, 8)
arr(8) = vData(i, 9)
arr(9) = vData(i, 10)
arr(10) = vData(i, 11)
arr(11) = vData(i, 12)
arr(12) = vData(i, 13)
arr(13) = vData(i, 14)
arr(14) = vData(i, 15)
arr(15) = vData(i, 16)
arr(16) = vData(i, 17)
arr(17) = vData(i, 18)
arr(18) = vData(i, 19)
arr(19) = vData(i, 20)
arr(20) = vData(i, 21)
arr(21) = vData(i, 22)
arr(22) = vData(i, 23)
arr(23) = vData(i, 24)
arr(24) = vData(i, 25)
arr(25) = vData(i, 26)
dic(vData(i, 1)) = arr
End If
Next i
i = 0
.Range("AG1:BF1").Value = .Range("A1:Z1").Value
vResult = .Range("AG2").Resize(dic.Count, 26)
For Each vKey In dic.keys
i = i + 1
vResult(i, 1) = vKey
vResult(i, 2) = dic(vKey)(1)
vResult(i, 3) = dic(vKey)(2)
vResult(i, 4) = dic(vKey)(3)
vResult(i, 5) = dic(vKey)(4)
vResult(i, 6) = dic(vKey)(5)
vResult(i, 7) = dic(vKey)(6)
vResult(i, 8) = dic(vKey)(7)
vResult(i, 9) = dic(vKey)(8)
vResult(i, 10) = dic(vKey)(9)
vResult(i, 11) = dic(vKey)(10)
vResult(i, 12) = dic(vKey)(11)
vResult(i, 13) = dic(vKey)(12)
vResult(i, 14) = dic(vKey)(13)
vResult(i, 15) = dic(vKey)(14)
vResult(i, 16) = dic(vKey)(15)
vResult(i, 17) = dic(vKey)(16)
vResult(i, 18) = dic(vKey)(17)
vResult(i, 19) = dic(vKey)(18)
vResult(i, 20) = dic(vKey)(19)
vResult(i, 21) = dic(vKey)(20)
vResult(i, 22) = dic(vKey)(21)
vResult(i, 23) = dic(vKey)(22)
vResult(i, 24) = dic(vKey)(23)
vResult(i, 25) = dic(vKey)(24)
vResult(i, 26) = dic(vKey)(25)

Next vKey
.Range("AG2").Resize(dic.Count, 26) = vResult
' .Columns("H:K").AutoFit
End With
End Sub


The problem I'm facing is that I'm getting an Run-time Error 6 'Overflow', when run the line below:

vData = .Range("B2:I" & .Cells(.Rows.Count, "B").End(xlUp).Row)

I can't seem to get passed that to test the rest of the code.
Do you have a suggestions around how to solve that error?

Thanks a lot.



Try to understand the code above and adapt it for more columns.
The ID column is used as the Keys of the dictionary.
The values of other columns are stored in an array that is placed as the Item correspondent to each key.

M.
 
Upvote 0
Some adjustments

1. You have 27 columns, A from AA, being one column (B) reserved for the key then you need an array with 26 elements to store the values of the other columns
Dim arr(1 to 26)

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "A").End(xlUp).Row)

3. The key is column B so
If dic.exists(vData(i, 2)) Then

4. Store the other values in arr
arr(1) = vdata(i,1)
'skip ID column
arr(2) = vdata(i,3)
arr(3) = vdata(i,4)
....
arr(26)= vdata(i,27)

5. vResult must have 27 columns so
vResult = .Range("AG2").Resize(dic.Count, 27)

Hope this helps

M.
 
Upvote 0
Edit

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row)

M.
 
Upvote 0
Another adjustment

If vData(i, 27) > dic(vData(i, 1))(26) Then
vAux = dic(vData(i, 2))
vAux(26) = vData(i, 27)
If vData(i, 1) <> "" Then vAux(1) = vData(i, 1)
'skip ID column
If vData(i, 3) <> "" Then vAux(2) = vData(i, 3)
If vData(i, 4) <> "" Then vAux(3) = vData(i, 4)
...
If vData(i, 26) <> "" Then vAux(25) = vData(i, 26)

M.
 
Last edited:
Upvote 0
It worked. Thanks for going through it line by line. Truly appreciate it.
I'm running a few tests and depending on the number of rows or columns, I'm still getting a Error 6 Overflow message (line: vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row).

Is there any way or other method to avoid that?

Thanks.



Edit

2, vData should comprise all the columns so
vData = .Range("A2:AA" & .Cells(.Rows.Count, "B").End(xlUp).Row)

M.
 
Upvote 0

Forum statistics

Threads
1,223,885
Messages
6,175,186
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top