How to find and remove duplicate data in entire workbook?

Aneeqe

New Member
Joined
Sep 5, 2012
Messages
6
Hello

My workbook consists of 12 worksheets containing thousands of complaints data.

Each worksheet has been segregated by nature of complaint; and

Each complaint has a unique id listed in a separate column.


Problem:


One single complaint is appearing in more than one worksheet and there could be thousands of such incidents. E,g; Complaint no. 01 may appear in sheet 01, 05 and 12 instead of just one relevant sheet, say sheet 05.


Requirement:

I need to find the duplicate entries in the entire workbook and then classify each individual sheet in the most appropriate sheet.

The routine options known to me only allow to find duplicate entries in one single worksheet instead of entire workbook.

Thanks
 

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
Are you looking for a purely automated approach or just any method of reasonably easy but manual steps that will work?

Finding the duplicates isn't too hard, but the snag on fully automating this is how do you know which sheet is most appropriate when duplicates are found? Is there a programmable logic that can be used or are you simply reading a complaint text field and making a judgement call?

Finding the duplicates:
- The most direct approach is to simply merge your data on a new worksheet then create a pivot table.
- Copy/Paste x12, assuming the fields are identical - though you really just need the Complaint ID field and ideally a complaint category field that will identify which worksheet the complaint came from.
- Create a pivot table on this data. Even a simple pivot counting IDs can be used to identify the duplicates. If you brought in a category field you can show which sheets the duplicates fall under.
-I recommend ID and Category both in the rows section (ID on top), with a count of IDs in the data section. Remove subtotals for ID, and set the pivot options to Classic view. Sort data Descending. Optionally hide counts = 1 to show just duplicate entries. This particular setup would allow you to flag duplicates you don't want easily, by simply entering a "x" or 1 in adjacent cells, such that another formula can be used to flag them on the other 12 worksheets.
 
Last edited:
Upvote 0
Hello

Thanks a lot.

I would like to have a fully automated procedure but I believe it would be beyond my expertise. Therefore; its better to stick to a simpler solution, It may take time but must give accurate result.

As u asked; actually our complaint system has a classification mechanism but user do not use either the correct main classification or its most relevant sub classification which distorts our entire analysis. Therefore; we have to work it out manually and by reading each individual complaint.

Since there are thousands of complaints and many of them contain common feature; therefore; it happens that one complaint is classified in more than one category.

I hope I have clarified ur query. Now can u suggest me any other more simple step by step manual procedure or the above one would be the most suitable one.

Thanks; Will wait for ur response.
 
Upvote 0
As far as identifying duplicates, we could also use a formula approach using a series of Countifs - but the fact that you have 12 sheets makes these formulas long without resorting to a UDF. For example this formula will at least give you a count of how many times an ID shows up on all 12 data sheets:

Code:
=COUNTIF('Category 1'!$A$2:$A$200,A2)+COUNTIF('Category 2'!$A$2:$A$200,A2)+COUNTIF('Category 3'!$A$2:$A$200,A2)+COUNTIF('Category 4'!$A$2:$A$200,A2)+COUNTIF('Category 5!$A$2:$A$200,A2)+COUNTIF('Category 6'!$A$2:$A$200,A2)+COUNTIF('Category 7'!$A$2:$A$200,A2)+COUNTIF('Category 8'!$A$2:$A$200,A2)+COUNTIF('Category 9'!$A$2:$A$200,A2)+COUNTIF('Category 10'!$A$2:$A$200,A2)+COUNTIF('Category 11!$A$2:$A$200,A2)+COUNTIF('Category 12'!$A$2:$A$200,A2)

Another formula approach is to display a letter coding that will point the sheet locations of each ID. For example:

Code:
=IF(COUNTIF('Category 1'!$A$2:$A$200,A2)>0,"a","")&IF(COUNTIF('Category 2'!$A$2:$A$200,A2)>0,"b","")&IF(COUNTIF('Category 3'!$A$2:$A$200,A2)>0,"c","")&IF(COUNTIF('Category 4'!$A$2:$A$200,A2)>0,"d","")&IF(COUNTIF('Category 5'!$A$2:$A$200,A2)>0,"e","")&IF(COUNTIF('Category 6'!$A$2:$A$200,A2)>0,"f","")&IF(COUNTIF('Category 7'!$A$2:$A$200,A2)>0,"g","")&IF(COUNTIF('Category 8'!$A$2:$A$200,A2)>0,"h","")&IF(COUNTIF('Category 9'!$A$2:$A$200,A2)>0,"i","")&IF(COUNTIF('Category 10'!$A$2:$A$200,A2)>0,"j","")&IF(COUNTIF('Category 12'!$A$2:$A$200,A2)>0,"k","")&IF(COUNTIF('Category 12'!$A$2:$A$200,A2)>0,"L","")

Copied down an adjacent column on each of the 12 sheets, this would be another way to map out the duplicates. Duplicate entries would look like: "ab f" for example, while IDs with no duplicates would only show a single letter.

Now what I like about the original approach (merging the data then creating a pivot table), is that it's fairly easy to reduce the data from all 12 sheets to a single view of all the duplicates. Presumably, once you hit this point you'll have to read these complaints to determine where they should fall. The next step is to flag the repeat entries you want to remove and then strip them. The advantage of the pivot approach, is you now have a single table where you can easily flag the ones you want to strip out or keep, giving you an easy range for a formula to find.

eg: =COUNTIFS('Pivot Tablesheet'!$A$2:$A$200,A2,'Pivot Tablesheet'!$B$2:$B$200,"Category 1",'Pivot Tablesheet'!$D$2:$D$200,"x")

The rather lang formulas early in this post puts you in a position of manually having to locate the repeat IDs and deleting them.
 
Upvote 0
Thanks a lot.

I m really grateful for the time u have spared for my problem. Will let u know which soultion proved to be more useful for me.

Take Care.
 
Upvote 0

Forum statistics

Threads
1,225,726
Messages
6,186,666
Members
453,368
Latest member
xxtanka

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top