VBA Code to remove partial duplicate entries from data based on set criteria

fishep6 · May 22, 2017

[TABLE="width: 855"]
<tbody>[TR]
[TD="colspan: 8"]Hi

What I am trying to achieve is a code that will remove semi duplicate rows of data from the attached spread sheet table, (Table 1) and then leave me the results as in (Table 2)
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 5"]I am not sure whether I need to concatenate as part of the formula or not in order to get the objective to work.

ORIGINAL DATA TABLE 1
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]First Name
[/TD]
[TD]Surname
[/TD]
[TD]Date Of Birth
[/TD]
[TD]Date Of Assessment
[/TD]
[TD]Gender
[/TD]
[TD]Smoker
[/TD]
[TD]Health
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Dave
[/TD]
[TD]Smith
[/TD]
[TD]02/06/1977
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]17-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rose
[/TD]
[TD]Frank
[/TD]
[TD]12/05/1981
[/TD]
[TD]14-May
[/TD]
[TD]Female
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Ben
[/TD]
[TD]Jakes
[/TD]
[TD]06/07/1988
[/TD]
[TD]16-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Bad
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Josh
[/TD]
[TD]West
[/TD]
[TD]12/02/1965
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]22-May
[/TD]
[TD]Male
[/TD]
[TD]Unknown
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Shariq
[/TD]
[TD]Najeeb
[/TD]
[TD]04/11/1976
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]Columns (A,B and C) are the columns I want to check for duplicates, theColumn (F) will be the only real variable I am concerned with.
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]So in the example we can see that "Adam" and "Rita" each have a duplicate entry, however there they have differing values in column F which is what I am concerned with and they also have a differing value in column D which I am not concerned about.
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]What I want to do effectively, when a duplicate is listed is keep the first line that is entered and remove the duplicate line. However I also want the line that is remaining to have the following changes to the Smoker column.
[/TD]
[/TR]
[TR]
[TD="colspan: 8"]I want the smoker column to then state the word "ERROR", followed by the text that is in both smoker columns for the duplicate entries, so you will see in the second table this is how I want the results to appear

IDEAL OUTCOME TABLE 2
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]First Name
[/TD]
[TD]Surname
[/TD]
[TD]Date Of Birth
[/TD]
[TD]Date Of Assessment
[/TD]
[TD]Gender
[/TD]
[TD]Smoker
[/TD]
[TD]Health
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Dave
[/TD]
[TD]Smith
[/TD]
[TD]02/06/1977
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Adam
[/TD]
[TD]Jones
[/TD]
[TD]03/03/1960
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]ERROR "Yes","No"
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rose
[/TD]
[TD]Frank
[/TD]
[TD]12/05/1981
[/TD]
[TD]14-May
[/TD]
[TD]Female
[/TD]
[TD]No
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Ben
[/TD]
[TD]Jakes
[/TD]
[TD]06/07/1988
[/TD]
[TD]16-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Bad
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Josh
[/TD]
[TD]West
[/TD]
[TD]12/02/1965
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]No
[/TD]
[TD]Fair
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Rita
[/TD]
[TD]Copeman
[/TD]
[TD]25/06/1978
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]ERROR "Yes","Unknown"
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Shariq
[/TD]
[TD]Najeeb
[/TD]
[TD]04/11/1976
[/TD]
[TD]14-May
[/TD]
[TD]Male
[/TD]
[TD]Yes
[/TD]
[TD]Good
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]So you will see my original table has reduced from 9 lines to 7

You will note for "Adam" there is now only one entry, colum (D) is using the date from the first entry in the original table, and then in column F is contains the text "ERROR" followed by the 2 duplicate entries from the previous table which were "Yes" and "No"
[/TD]
[/TR]
[TR]
[TD="colspan: 8"]You will note for "Rita" there is now only one entry, colum (D) is using the date from the first entry in the original table, and then in column F is contains the text "ERROR" followed by the 2 duplicate entries from the previous table which were "No" and "Unknown"
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 8"]My data table is a lot larger than the sample but hopefully this is possible? I am happy to get the data I want in table 2 to open on a separate tab on my spreadsheet if that is an easier approach?
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="colspan: 2"]Thank you in advance
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

MickG · May 22, 2017

Try this for results on sheet2:-

Code:

[COLOR="Navy"]Sub[/COLOR] MG22May03
[COLOR="Navy"]Dim[/COLOR] Ray [COLOR="Navy"]As[/COLOR] Variant, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Txt [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Ac [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant
Ray = Cells(1).CurrentRegion.Resize(, 7)
ReDim nRay(1 To UBound(Ray, 1), 1 To UBound(Ray, 2))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] n = 1 To UBound(Ray, 1)
Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3)
    [COLOR="Navy"]If[/COLOR] Not .Exists(Txt) [COLOR="Navy"]Then[/COLOR]
      c = c + 1
      [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
         nRay(c, Ac) = Ray(n, Ac)
      [COLOR="Navy"]Next[/COLOR] Ac
        .Add Txt, Array(Ray(n, 6), c)
    [COLOR="Navy"]Else[/COLOR]
       Q = .Item(Txt)
        Q(0) = Q(0) & ", """ & Ray(n, 6) & ""
       nRay(Q(1), 6) = "Error  """ & Q(0) & """"
    .Item(Txt) = Q
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]With[/COLOR] Sheets("Sheet2").Range("A1").Resize(c, 7)
  .Value = nRay
  .Columns.AutoFit
  .Borders.Weight = 2
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]With[/COLOR]
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]

Regards Mick

fishep6 · May 22, 2017

Thank you that is amazing

I have a couple of tweaks I need to do if possible please?

1) Rather than columns A,B and C being the duplicates it is actually columns B,C and D, presumably I just change Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3) to Txt = Ray(n, 2) & Ray(n, 3) & Ray(n, 4)???

2) Is there a way to get the formula to ignore this if the value in column 6 (F) is "NULL"???

3) What would I need toad to the end of the code in order to get a pop up box saying that the code has run and is completed?

Many thanks

MickG · May 22, 2017

Q(1) Basically Yes , but maybe a bit more required, I'll send some new code !!!
Q(2) Do you mean, Do NOT include this line in sheet2 results if column(6) has the actual string "Null" in it. ???
Q(3) No problem with this !!!!

fishep6 · May 22, 2017

1) brilliant
2) - no if it has Null in the F column then the code just ignores it and takes the line over to the second sheet without trying to de duplicate it.
3) Thank you so much

MickG · May 22, 2017

Try this for data starting column "B1" and results in sheet2 starting "A1".

Code:

[COLOR="Navy"]Sub[/COLOR] MG22May23
[COLOR="Navy"]Dim[/COLOR] Ray [COLOR="Navy"]As[/COLOR] Variant, n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Txt [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]String,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Ac [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] Q [COLOR="Navy"]As[/COLOR] Variant
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range
Ray = Cells(1).CurrentRegion.Offset(, 1).Resize(, 8)
ReDim nRay(1 To UBound(Ray, 1), 1 To UBound(Ray, 2))
[COLOR="Navy"]With[/COLOR] CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] n = 1 To UBound(Ray, 1)
Txt = Ray(n, 1) & Ray(n, 2) & Ray(n, 3)
    [COLOR="Navy"]If[/COLOR] Not .Exists(Txt) [COLOR="Navy"]Then[/COLOR]
      c = c + 1
      [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
         nRay(c, Ac) = Ray(n, Ac)
      [COLOR="Navy"]Next[/COLOR] Ac
        .Add Txt, Array(Ray(n, 6), c)
    [COLOR="Navy"]Else[/COLOR]
       Q = .Item(Txt)
        [COLOR="Navy"]If[/COLOR] Q(0) = "Null" [COLOR="Navy"]Then[/COLOR]
            c = c + 1
            [COLOR="Navy"]For[/COLOR] Ac = 1 To UBound(Ray, 2)
                nRay(c, Ac) = Ray(n, Ac)
            [COLOR="Navy"]Next[/COLOR] Ac
        [COLOR="Navy"]Else[/COLOR]
            Q(0) = Q(0) & ", """ & Ray(n, 6) & ""
            nRay(Q(1), 6) = "Error  """ & Q(0) & """"
        [COLOR="Navy"]End[/COLOR] If
    .Item(Txt) = Q
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]End[/COLOR] With
[COLOR="Navy"]With[/COLOR] Sheets("Sheet2").Range("A1").Resize(c, 7)
  .Value = nRay
  .Columns.AutoFit
  .Borders.Weight = 2
[COLOR="Navy"]End[/COLOR] With
MsgBox "Code Run Complete!!"
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]

Regards Mick

VBA Code to remove partial duplicate entries from data based on set criteria

fishep6

New Member

Excel Facts

MickG

MrExcel MVP

fishep6

New Member

MickG

MrExcel MVP

fishep6

New Member

MickG

MrExcel MVP

Similar threads

Forum statistics

Share this page

VBA Code to remove partial duplicate entries from data based on set criteria

fishep6

New Member

Excel Facts

MickG

MrExcel MVP

fishep6

New Member

MickG

MrExcel MVP

fishep6

New Member

MickG

MrExcel MVP

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock