Removing duplicates (both) only keeping unique data

PondWaters

New Member
Joined
Jun 26, 2012
Messages
36
Office Version
  1. 365
I was able to get some help from this forum earlier, and am hoping for the same.


I have files that come in twice a month with around 135k rows of data each, of the 270k that come in, I would say around 265 to 268k are complete duplicates. Is there a way for me to completely remove BOTH duplicates and only keep unique data? Also, I am matching on fields 2, 7 and 8

Sheet1

*ABCDEFGH
1Field1Field2Field3Field4Field5Field6Field7Field8
2123456789123456789-1SmithBob2015050120150531$12.00 A
3987654321963258741-1SmithCarol2015050120150531$14.00 A
4147258369878965412-1JonesFred20150501201505310B
53692581478795412-2JonesErica2015050120150531$8.00 C
63216549873251148-1JamesTom2015050120150531$25.00 C
735714715932584415-6JamesRobert2015050120150531$19.50 B

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:79px;"><col style="width:77px;"><col style="width:64px;"><col style="width:64px;"><col style="width:71px;"><col style="width:71px;"><col style="width:64px;"><col style="width:64px;"></colgroup><tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4

The second file later in the month would come in like this:

Sheet1

*ABCDEFGH
9Field1Field2Field3Field4Field5Field6Field7Field8
10123456789123456789-1SmithBob2015050120150531$12.00 A
11987654321963258741-1SmithCarol2015050120150531$14.00 A
12147258369878965412-1JonesFred2015050120150531$33.00 B
133692581478795412-2JonesErica2015050120150531$8.00 C
143216549873251148-1JamesTom2015050120150531$25.00 C
1535714715932584415-6JamesRobert2015050120150531$19.50 B
1632425724532584415-7BlueJames2015050120150531$23.00 A

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:79px;"><col style="width:77px;"><col style="width:64px;"><col style="width:64px;"><col style="width:71px;"><col style="width:71px;"><col style="width:64px;"><col style="width:64px;"></colgroup><tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4


I would want the following where the duplicates are completely removed:


I know how to remove just one duplicate so Access removes one of the two, but I need both lines removed. I have been searching but most everything i can find is for removing on single fields, I need fields 2, 7 and 8 to be exact matches.

Sheet1

*ABCDEFGH
19Field1Field2Field3Field4Field5Field6Field7Field8
20147258369878965412-1JonesFred20150501201505310B
21147258369878965412-1JonesFred2015050120150531$33.00 B
2232425724532584415-7BlueJames2015050120150531$23.00 A

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:79px;"><col style="width:77px;"><col style="width:64px;"><col style="width:64px;"><col style="width:71px;"><col style="width:71px;"><col style="width:64px;"><col style="width:64px;"></colgroup><tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4
 

Excel Facts

Pivot Table Drill Down
Double-click any number in a pivot table to create a new report showing all detail rows that make up that number
this may not be the most efficient way of doing it, but adding helper columns might help:

(1) I2 = B2&G2&H2 where B2 is Field2, G2 is Field7 and H2 is Field8
(2) J2 = COUNTIF($I$2:$I$14,$I2)

then filter column J for counts of 1

I was able to get some help from this forum earlier, and am hoping for the same.


I have files that come in twice a month with around 135k rows of data each, of the 270k that come in, I would say around 265 to 268k are complete duplicates. Is there a way for me to completely remove BOTH duplicates and only keep unique data? Also, I am matching on fields 2, 7 and 8

Sheet1

*ABCDEFGH
1Field1Field2Field3Field4Field5Field6Field7Field8
2123456789123456789-1SmithBob2015050120150531$12.00A
3987654321963258741-1SmithCarol2015050120150531$14.00A
4147258369878965412-1JonesFred20150501201505310B
53692581478795412-2JonesErica2015050120150531$8.00C
63216549873251148-1JamesTom2015050120150531$25.00C
735714715932584415-6JamesRobert2015050120150531$19.50B

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4

The second file later in the month would come in like this:

Sheet1

*ABCDEFGH
9Field1Field2Field3Field4Field5Field6Field7Field8
10123456789123456789-1SmithBob2015050120150531$12.00A
11987654321963258741-1SmithCarol2015050120150531$14.00A
12147258369878965412-1JonesFred2015050120150531$33.00B
133692581478795412-2JonesErica2015050120150531$8.00C
143216549873251148-1JamesTom2015050120150531$25.00C
1535714715932584415-6JamesRobert2015050120150531$19.50B
1632425724532584415-7BlueJames2015050120150531$23.00A

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4


I would want the following where the duplicates are completely removed:


I know how to remove just one duplicate so Access removes one of the two, but I need both lines removed. I have been searching but most everything i can find is for removing on single fields, I need fields 2, 7 and 8 to be exact matches.

Sheet1

*ABCDEFGH
19Field1Field2Field3Field4Field5Field6Field7Field8
20147258369878965412-1JonesFred20150501201505310B
21147258369878965412-1JonesFred2015050120150531$33.00B
2232425724532584415-7BlueJames2015050120150531$23.00A

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4
 
Upvote 0
It sounds like the solution you have is in Excel, I am trying to stay away from Excel due to the size of the files - once I put formulas in on large files Excel seems to slow down immensely, to the point of crashing. Is there a way in Access to do this?
 
Upvote 0
basically, the same logic should apply, concatenate all 3 fields to create an ID, then in another query, do a Group by of that field as well as count, filter for 1..


It sounds like the solution you have is in Excel, I am trying to stay away from Excel due to the size of the files - once I put formulas in on large files Excel seems to slow down immensely, to the point of crashing. Is there a way in Access to do this?
 
Upvote 0
I am running into another issue with this that I can;t figure out:

I am in design view:

I concatenated all of the fields and that works like a charm.
I have Expr2 as Count([Expr1])

I click totals at the top and have total as Expression. I am expecting that Expr2 will show some 2's since I have a bunch of duplicates. Everything shows as 1. What am I doing wrong? I have the same formula in another spreadsheet - the exception being the name of the document (I even checked SQL view), so I can't figure out why this is not working for me.
 
Upvote 0

Forum statistics

Threads
1,221,848
Messages
6,162,419
Members
451,765
Latest member
craigvan888

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top