Compare 4 columns to find duplicate rows and copy one column data from all duplicate rows

pacchiee · Sep 1, 2016

[TABLE="class: grid, align: center"]
[TR][TD]ID[/TD][TD]First Name[/TD][TD]Last Name[/TD][TD]State[/TD][TD]Reference[/TD][/TR]
[TR][TD]1[/TD][TD]Mark[/TD][TD]Brown[/TD][TD]CA[/TD][TD]2257849635[/TD][/TR]
[TR][TD]2[/TD][TD]Shawn[/TD][TD]Jack[/TD][TD]CA[/TD][TD]2245787962[/TD][/TR]
[TR][TD]3[/TD][TD]Smith[/TD][TD]Black[/TD][TD]CA[/TD][TD]7789654123[/TD][/TR]
[TR][TD]4[/TD][TD]Mark[/TD][TD]Brown[/TD][TD]CA[/TD][TD]2257849635[/TD][/TR]
[TR][TD]5[/TD][TD]Smith[/TD][TD]Black[/TD][TD]CA[/TD][TD]7789654123[/TD][/TR]
[TR][TD]6[/TD][TD]Mark[/TD][TD]Brown[/TD][TD]CA[/TD][TD]2257849635[/TD][/TR]
[/TABLE]

I have a huge data set (1,000,000 rows) in the above format. I need to find all the IDs whose rows have duplicate content.

For example, rows 1, 4 and 6 have the same data; similarly, rows 3 and 5 are duplicates.

I would like a macro to do the following:
1. Delete the contents of all unique rows, except the ID.
2. Find duplicate rows and paste the IDs of the duplicates next to the Reference column of the first occurrence, e.g. "4,6" for ID 1.
3. Delete the contents of the duplicate rows, except the ID.

Finally, the sheet should look like this:
[TABLE="class: grid, align: center"]
[TR][TD]ID[/TD][TD]First Name[/TD][TD]Last Name[/TD][TD]State[/TD][TD]Reference[/TD][TD][/TD][/TR]
[TR][TD]1[/TD][TD]Mark[/TD][TD]Brown[/TD][TD]CA[/TD][TD]2257849635[/TD][TD]4,6[/TD][/TR]
[TR][TD]2[/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][/TR]
[TR][TD]3[/TD][TD]Smith[/TD][TD]Black[/TD][TD]CA[/TD][TD]7789654123[/TD][TD]5[/TD][/TR]
[TR][TD]4[/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][/TR]
[TR][TD]5[/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][/TR]
[TR][TD]6[/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][TD][/TD][/TR]
[/TABLE]

I tried other manual methods, but the file freezes because of the amount of data. I don't know enough about macros to write one. Please help me with this. Thank you.
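The three steps and the expected sheet described above can be sketched outside Excel as follows. This is not the requested macro, only a minimal Python illustration of the grouping logic, assuming the rows have already been read into a list of (ID, First Name, Last Name, State, Reference) tuples in sheet order. The function name `mark_duplicates` is hypothetical.

```python
from collections import defaultdict


def mark_duplicates(rows):
    """Return one six-cell row per input row.

    The first row of each duplicate group keeps its data and gets a
    comma-separated list of the other IDs in the group. Every other row
    (unique rows and later duplicates) keeps only its ID.
    """
    # Group IDs by the four compared columns, preserving sheet order.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[1:5])].append(row[0])

    result = []
    for row in rows:
        ids = groups[tuple(row[1:5])]
        if len(ids) > 1 and ids[0] == row[0]:
            # First occurrence of a duplicated row: keep data, list the rest.
            result.append([row[0], *row[1:5], ",".join(str(i) for i in ids[1:])])
        else:
            # Unique row or later duplicate: blank everything except the ID.
            result.append([row[0], "", "", "", "", ""])
    return result


rows = [
    (1, "Mark", "Brown", "CA", "2257849635"),
    (2, "Shawn", "Jack", "CA", "2245787962"),
    (3, "Smith", "Black", "CA", "7789654123"),
    (4, "Mark", "Brown", "CA", "2257849635"),
    (5, "Smith", "Black", "CA", "7789654123"),
    (6, "Mark", "Brown", "CA", "2257849635"),
]

for line in mark_duplicates(rows):
    print(line)
```

Because the rows are grouped through a dictionary, each row is visited only twice, which should scale to a million rows far better than comparing every row against every other row.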

BGY23 · Sep 1, 2016

Do you need to compare all 4 columns, or is the Reference unique to the person? If not, do you have another column with a unique reference per person?

pacchiee · Sep 1, 2016

The only unique data is the ID. The values in the other 4 columns can all repeat. Hence, we need to compare all 4 columns to find duplicate rows.

BGY23 · Sep 2, 2016

I can't write macros for you; that's not my thing.

Assume your data is in A1:E7:

Insert 2 new columns at A.
B2: =CONCATENATE(D2,E2,F2,G2)
A2: =COUNTIF(B$1:B4,B4)

It's important to note that the B$1 in the A2 formula is anchored with $.

Copy the formulas down, filter column A for anything that isn't 1, and clear the contents of cells D:G.

Good luck

BGY23 · Sep 2, 2016

The A2 formula should be =COUNTIF(B$1:B2,B2)
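For readers who want to see what the two helper columns compute, here is a small Python sketch of the same idea: a combined key per row (column B) and a running count of how often that key has appeared so far (column A). The function name `running_counts` and the `|` separator are assumptions added for illustration; the formula above concatenates without a separator, which can make different rows produce the same key (for example "ab" + "c" and "a" + "bc"). Unlike COUNTIF, this comparison is case-sensitive.

```python
def running_counts(rows, sep="|"):
    """Return (id, key, count) for each row.

    key   : the four compared columns joined with a separator
    count : how many times that key has appeared up to and including
            this row, like =COUNTIF(B$1:B2,B2) copied down
    """
    seen = {}
    out = []
    for row in rows:
        key = sep.join(str(cell) for cell in row[1:5])
        seen[key] = seen.get(key, 0) + 1
        out.append((row[0], key, seen[key]))
    return out


sample = [
    (1, "Mark", "Brown", "CA", "2257849635"),
    (2, "Shawn", "Jack", "CA", "2245787962"),
    (3, "Smith", "Black", "CA", "7789654123"),
    (4, "Mark", "Brown", "CA", "2257849635"),
    (5, "Smith", "Black", "CA", "7789654123"),
    (6, "Mark", "Brown", "CA", "2257849635"),
]

for row_id, key, count in running_counts(sample):
    print(row_id, key, count)
```

Rows with a count greater than 1 are the repeats, which is what the filter on column A selects.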
