VBA Code to remove partial duplicate entries from data based on set criteria

fishep6

New Member
Joined
Feb 10, 2014
Messages
43
[TABLE="width: 855"]
<tbody>[TR]
[TD="colspan: 8"]Hi

What I am trying to achieve is a code that will remove semi duplicate rows of data from the attached spread sheet table, (Table 1) and then leave me the results as in (Table 2)
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 5"]I am not sure whether I need to concatenate as part of the formula or not in order to get the objective to work.

ORIGINAL DATA TABLE 1

[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]First Name
[/TD]
[TD]Surname
[/TD]
[TD]Date Of Birth
[/TD]
[TD]Date Of Assessment
[/TD]
[TD]Gender
[/TD]
[TD]Smoker
[/TD]
[TD]Health
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Dave
[/TD]
[TD]Smith
[/TD]
[TD]02/06/1977
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]17-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rose
[/TD]
[TD]Frank
[/TD]
[TD]12/05/1981
[/TD]
[TD]14-May
[/TD]
[TD]Female
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Ben
[/TD]
[TD]Jakes
[/TD]
[TD]06/07/1988
[/TD]
[TD]16-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Bad
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Josh
[/TD]
[TD]West
[/TD]
[TD]12/02/1965
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]22-May
[/TD]
[TD]Male
[/TD]
[TD]Unknown
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Shariq
[/TD]
[TD]Najeeb
[/TD]
[TD]04/11/1976
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]Columns (A,B and C) are the columns I want to check for duplicates, theColumn (F) will be the only real variable I am concerned with.
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]So in the example we can see that "Adam" and "Rita" each have a duplicate entry, however there they have differing values in column F which is what I am concerned with and they also have a differing value in column D which I am not concerned about.
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]What I want to do effectively, when a duplicate is listed is keep the first line that is entered and remove the duplicate line. However I also want the line that is remaining to have the following changes to the Smoker column.
[/TD]
[/TR]
[TR]
[TD="colspan: 8"]I want the smoker column to then state the word "ERROR", followed by the text that is in both smoker columns for the duplicate entries, so you will see in the second table this is how I want the results to appear

IDEAL OUTCOME TABLE 2
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]First Name
[/TD]
[TD]Surname
[/TD]
[TD]Date Of Birth
[/TD]
[TD]Date Of Assessment
[/TD]
[TD]Gender
[/TD]
[TD]Smoker
[/TD]
[TD]Health
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Dave
[/TD]
[TD]Smith
[/TD]
[TD]02/06/1977
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]ERROR "Yes","No"
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rose
[/TD]
[TD]Frank
[/TD]
[TD]12/05/1981
[/TD]
[TD]14-May
[/TD]
[TD]Female
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Ben
[/TD]
[TD]Jakes
[/TD]
[TD]06/07/1988
[/TD]
[TD]16-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Bad
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Josh
[/TD]
[TD]West
[/TD]
[TD]12/02/1965
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]ERROR "Yes","Unknown"
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Shariq
[/TD]
[TD]Najeeb
[/TD]
[TD]04/11/1976
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]So you will see my original table has reduced from 9 lines to 7

You will note for "Adam" there is now only one entry, colum (D) is using the date from the first entry in the original table, and then in column F is contains the text "ERROR" followed by the 2 duplicate entries from the previous table which were "Yes" and "No"
[/TD]
[/TR]
[TR]
[TD="colspan: 8"]You will note for "Rita" there is now only one entry, colum (D) is using the date from the first entry in the original table, and then in column F is contains the text "ERROR" followed by the 2 duplicate entries from the previous table which were "No" and "Unknown"
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]My data table is a lot larger than the sample but hopefully this is possible? I am happy to get the data I want in table 2 to open on a separate tab on my spreadsheet if that is an easier approach?
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 2"]Thank you in advance
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
 
Last edited:

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
Try this for results on sheet2:-
Code:
[COLOR="Navy"]Sub[/COLOR] MG22May03
[COLOR="Navy"]Dim[/COLOR] Ray [COLOR="Navy"]As[/COLOR] Variant, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Txt [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Ac [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant
Ray = Cells(1).CurrentRegion.Resize(, 7)
ReDim nRay(1 To UBound(Ray, 1), 1 To UBound(Ray, 2))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] n = 1 To UBound(Ray, 1)
Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3)
    [COLOR="Navy"]If[/COLOR] Not .Exists(Txt) [COLOR="Navy"]Then[/COLOR]
      c = c + 1
      [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
         nRay(c, Ac) = Ray(n, Ac)
      [COLOR="Navy"]Next[/COLOR] Ac
        .Add Txt, Array(Ray(n, 6), c)
    [COLOR="Navy"]Else[/COLOR]
       Q = .Item(Txt)
        Q(0) = Q(0) & ", """ & Ray(n, 6) & ""
       nRay(Q(1), 6) = "Error  """ & Q(0) & """"
    .Item(Txt) = Q
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]With[/COLOR] Sheets("Sheet2").Range("A1").Resize(c, 7)
  .Value = nRay
  .Columns.AutoFit
  .Borders.Weight = 2
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]With[/COLOR]
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Thank you that is amazing

I have a couple of tweaks I need to do if possible please?

1) Rather than columns A,B and C being the duplicates it is actually columns B,C and D, presumably I just change Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3) to Txt = Ray(n, 2) & Ray(n, 3) & Ray(n, 4)???

2) Is there a way to get the formula to ignore this if the value in column 6 (F) is "NULL"???

3) What would I need toad to the end of the code in order to get a pop up box saying that the code has run and is completed?

Many thanks
 
Upvote 0
Q(1) Basically Yes , but maybe a bit more required, I'll send some new code !!!
Q(2) Do you mean, Do NOT include this line in sheet2 results if column(6) has the actual string "Null" in it. ???
Q(3) No problem with this !!!!
 
Upvote 0
1) brilliant
2) - no if it has Null in the F column then the code just ignores it and takes the line over to the second sheet without trying to de duplicate it.
3) Thank you so much
 
Upvote 0
Try this for data starting column "B1" and results in sheet2 starting "A1".
Code:
[COLOR="Navy"]Sub[/COLOR] MG22May23
[COLOR="Navy"]Dim[/COLOR] Ray [COLOR="Navy"]As[/COLOR] Variant, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Txt [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Ac [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range
Ray = Cells(1).CurrentRegion.Offset(, 1).Resize(, 8)
ReDim nRay(1 To UBound(Ray, 1), 1 To UBound(Ray, 2))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] n = 1 To UBound(Ray, 1)
Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3)
    [COLOR="Navy"]If[/COLOR] Not .Exists(Txt) [COLOR="Navy"]Then[/COLOR]
      c = c + 1
      [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
         nRay(c, Ac) = Ray(n, Ac)
      [COLOR="Navy"]Next[/COLOR] Ac
        .Add Txt, Array(Ray(n, 6), c)
    [COLOR="Navy"]Else[/COLOR]
       Q = .Item(Txt)
        [COLOR="Navy"]If[/COLOR] Q(0) = "Null" [COLOR="Navy"]Then[/COLOR]
            c = c + 1
            [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
                nRay(c, Ac) = Ray(n, Ac)
            [COLOR="Navy"]Next[/COLOR] Ac
        [COLOR="Navy"]Else[/COLOR]
            Q(0) = Q(0) & ", """ & Ray(n, 6) & ""
            nRay(Q(1), 6) = "Error  """ & Q(0) & """"
        [COLOR="Navy"]End[/COLOR] If
    .Item(Txt) = Q
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]With[/COLOR] Sheets("Sheet2").Range("A1").Resize(c, 7)
  .Value = nRay
  .Columns.AutoFit
  .Borders.Weight = 2
[COLOR="Navy"]End[/COLOR] With
MsgBox "Code Run Complete!!"
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0

Forum statistics

Threads
1,223,907
Messages
6,175,301
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top