Finding potential duplicates

rexhvn

New Member
Joined
Jan 7, 2016
Messages
27
Hi all,

Hope you're well.

I am hoping for some assistance. I am trying to identify if there are any potential duplicate references in my spreadsheet and the system doesn't allow the same reference to be entered twice. So if there is a duplicate, it will somehow be slightly different. IE: 0001-1 or 0001/1 etc.

I have all references in Column A and want to know if there is a formula or a way to identify if there is a potential duplicate and reference it if possible.

Your help is appreciated.

Thanks,
 

Excel Facts

Difference between two dates
Secret function! Use =DATEDIF(A2,B2,"Y")&" years"&=DATEDIF(A2,B2,"YM")&" months"&=DATEDIF(A2,B2,"MD")&" days"
Hi all,

Hope you're well.

I am hoping for some assistance. I am trying to identify if there are any potential duplicate references in my spreadsheet and the system doesn't allow the same reference to be entered twice. So if there is a duplicate, it will somehow be slightly different. IE: 0001-1 or 0001/1 etc.

I have all references in Column A and want to know if there is a formula or a way to identify if there is a potential duplicate and reference it if possible.

Your help is appreciated.

Thanks,
Questions:
Is there a standard format for those references?
Are they supposedly auto-generated by the system or entered manually?
Assumedly all have the same structure but should vary in the last 1 or 2 characters?
How long is the Col A list?
 
Upvote 0
Of course, sorry. Please see responses below
Is there a standard format for those references? Unfortunately not. There is no format or consistency, however if there is a duplicate, there shouldn't be too much of a different.
Are they supposedly auto-generated by the system or entered manually? All are manually entered. The system that allows the data to be entered does not allow duplicates.
Assumedly all have the same structure but should vary in the last 1 or 2 characters? The potential duplicates would be 1 or 2 characters difference, however the format of references throughout are quite different. There is no consistency which makes it extra difficult.
How long is the Col A list? Approx 18,000 items.
 
Last edited:
Upvote 0
That is not what I wanted to hear but I assumed that would be the case.
I can’t think of a really efficient way to handle this.

I’m thinking to duplicate the data under another name and create another sheet called “duplicates”.
Now I would look at the first ID assumedly in A2, get a string that was 2 characters less than A2, and then scroll down Col A to find something that matched it. Having found one I’d copy the original ID and its row to “Duplicates” followed by the assumed duplicate with its ID and row. The “duplicate” would be removed from the list and the same procedure would be followed for A3, A4 …

With 18,000 rows I have no idea how long that sort would take.

If you are agreeable to that approach I’ll see what I can do, but hopefully there is someone in the meantime who can propose something more efficient.
 
Upvote 0
@rexhvn,
I've generated some random data for Col A, and done some "duplications" on some of it, ie changing one or more of the of the last three characters.
My macro as I described above is generally working but ... it has a bug. I'm not prepared to release what I have at the moment as I am about to go out, and to describe what I have actually done, and the error would take too long. Unfortunately I won't be able to get back to this until Saturday (Aussie time).

This however is the table from which I am working (Correlation indicated the row to which there is a match or at least a potential match:
[TABLE="width: 128"]
<colgroup><col style="width:48pt" width="64" span="2"> </colgroup><tbody>[TR]
[TD="width: 64"]ID[/TD]
[TD="width: 64"]Correlation[/TD]
[/TR]
[TR]
[TD]FS57[/TD]
[TD="align: right"]9[/TD]
[/TR]
[TR]
[TD]XCH59[/TD]
[TD="align: right"]11[/TD]
[/TR]
[TR]
[TD]AE79[/TD]
[TD="align: right"]12[/TD]
[/TR]
[TR]
[TD]HL07[/TD]
[TD="align: right"]13[/TD]
[/TR]
[TR]
[TD]DH98[/TD]
[TD="align: right"]14[/TD]
[/TR]
[TR]
[TD]IP47[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]RH90[/TD]
[TD="align: right"]16[/TD]
[/TR]
[TR]
[TD]HQ35[/TD]
[TD="align: right"]17[/TD]
[/TR]
[TR]
[TD]FS56[/TD]
[TD="align: right"]2[/TD]
[/TR]
[TR]
[TD]XCH59[/TD]
[TD="align: right"]3[/TD]
[/TR]
[TR]
[TD]AE80[/TD]
[TD="align: right"]4[/TD]
[/TR]
[TR]
[TD]HL77[/TD]
[TD="align: right"]5[/TD]
[/TR]
[TR]
[TD]DL98[/TD]
[TD="align: right"]6[/TD]
[/TR]
[TR]
[TD]IM47[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]RH66[/TD]
[TD="align: right"]8[/TD]
[/TR]
[TR]
[TD]HQ35[/TD]
[TD="align: right"]9[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
I got home earlier than I expected.

Upon reflection it is pointless just to identify what may be duplicates, one would have to keep manually cross-referencing. It would be logical to have at least some of the data carried across for checking.

I decided not to copy the Main sheet elsewhere and not delete from it. You are probably not anticipating that data might be in triplicate so it is time-saving not to delete (of course, if there is further replication that would be missed should my first strategy have been employed). Besides, it would probably take a similar time to seek a further replica as it would to delete a row under that strategy.

Since I have no idea of your content I’ve decided upon 3 categories, ID, Plant and Use; I figure three levels are probably enough for a comparison.

The table below shows:
First area is my Data on “Main”.

The second zone is how the macro paired the various data lemon and mango missed out because of the third last character (I count from right to left, as does the macro).

It is obvious that by virtue of the ‘Use” column some items just are not duplicates but then some with Plant and Use have different IDs (which is what you are seeking). Unlikely your system will find two identical IDs but this scenario offers just that.

The third zone is an ascending sort of the second table based on the IDs. That would be useful if you did suspect more than duplicates. You don’t need a macro for that, merely select the columns and use the Screen “Sort” option with the ID as your key.

Ok, table and macro:

Excel 2013/2016
ABCDEFGHIJKLM
1IDPlantUseRow fndIDPlantUseRow fndIDPlantUse
2FS57carrotroast2FS57carrotroast4AE79potatomashed
3XCH59pumpkinpie10FS56carrotspirals12AE80potatosalad
4AE79potatomashed3XCH59pumpkinpie14DH97plumjam
5HL07applesponge11XCH59pumpkinpie6DH98plumjam
6DH98plumjam4AE79potatomashed10FS56carrotspirals
7IP47lemonjuice12AE80potatosalad2FS57carrotroast
8RH90orangepeel5HL07applesponge5HL07applesponge
9HQ35lettuceiceberg13HL77applesponge13HL77applesponge
10FS56carrotspirals6DH98plumjam9HQ35lettuceiceberg
11XCH59pumpkinpie14DH97plumjam17HQ35lettuceiceberg
12AE80potatosalad8RH90orangepeel16RH66orangejuice
13HL77applesponge16RH66orangejuice8RH90orangepeel
14DH97plumjam9HQ35lettuceiceberg3XCH59pumpkinpie
15IM47mangoslices17HQ35lettuceiceberg11XCH59pumpkinpie
16RH66orangejuice
17HQ35lettuceicebergPairs foundIDs Ordered
Main


I trust this will run insid've tried to be as time-saving as I know how.
Code:
Sub Dupes()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Sheets("main").Select
Dim lRw As Long
Dim rw As Long
Dim dp1, dp2 As String
Dim a, b As Long
lRw = Cells(Rows.Count, "A").End(xlUp).Row
rw = 2
For a = 2 To lRw
Sheets("Main").Select
'Get first available item
dp1 = Left(Range("A" & a), Len(Range("A" & a)) - 2)
For b = a + 1 To lRw
Sheets("Main").Select
'Get next available item to test
dp2 = Left(Range("A" & b), Len(Range("A" & b)) - 2)

If dp1 = dp2 Then  ' When we have a match we transfer a copy
Range(Cells(a, 1), Cells(a, 3)).Copy

Sheets("Duplicates").Select
Cells(rw, 2).PasteSpecial
Cells(rw, 1).Value = a
Sheets("Main").Select
Range(Cells(b, 1), Cells(b, 3)).Copy

Sheets("Duplicates").Select
Cells(rw + 1, 2).PasteSpecial
Cells(rw + 1, 1).Value = b
rw = rw + 2

Exit For  'Get out, job done, saves time

End If

Next
Next
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub

I hope this does as you need.
 
Upvote 0
I'm checking back. You do not seem to have been at this Forum since my last reply. Have you since found a solution other than potentially offered, or just not gotten back?
 
Upvote 0
Hi !
@BrianJN1 I am running into a similar situation with an inventory list that has many duplicates. I have downloaded your example above and if by chance you are still listening let me know as I have a couple questions.
Thanks !
 
Upvote 0
if by chance you are still listening let me know as I have a couple questions.
Welcome to the MrExcel board!

There are many potential helpers in the forum so I suggest that you start a new thread of your own, give a small set of dummy sample data & the expected results (my signature block below has help with how to do that) and explain in words in relation to that sample data what you need. If you think it will help, you can provide a link back to this thread in your new one.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,170
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top