grouping related records based on two variables

ddm

New Member
Joined
Jun 10, 2005
Messages
11
Multiple aggregations for Customers (example) 3.xls
ABCDEFGH
1Sorted by Name, PhoneIF(B4=B3,E3,IF(VLOOKUP(C4,$C$15:$E$23,3,FALSE)=0,max($E$3:$E$11)+1,VLOOKUP(C4,$C$15:$E$23,3,FALSE)))
2NameName AggPhone AggDesired Agg (related by Name and Phone)agg tryPhoneMethod
3abc co131144444444441. Under Tools>Options>Calculation, select Maximum iterations = 2
4abc company221133333333332. Sort first by Name agg then Phone agg
5abc company231144444444443. Create New agg field; populate first entry as 1
6def company312322222222224. Copy list beneath old one or to new worksheet
7fgi inc (same as abc)431144444444445. In new list set formula for New agg to link to the same reference in the old list (may need Sheet ref)
8fgi inc 2531144444444446. Sort new list by Phone agg then Name agg (make sure New agg sorts correctly)
9fgi inc 2541155555555557. Copy down given formula to remaining cells
10fgi inc 3631144444444448. Repeat as often as needed for aggregations
11vv ltd75349999999999
12* Note: Order of the above steps is important!
13Sorted by Phone, Name
14NameName AggPhone AggDesired Agg (related by Name and Phone)agg tryPhone
15def company31232222222222
16abc company22113333333333
17abc co13114444444444
18abc company23114444444444
19fgi inc (same as abc)43114444444444
20fgi inc 253114444444444
21fgi inc 363114444444444
22fgi inc 254115555555555
23vv ltd75349999999999
Sheet1



Hello,
This is my first post. I'm trying to create a procedure to clean up a large database with multiple related or duplicate clients. Related clients are defined (for example) as clients having either the same name or telephone number. I want to create an aggregation field with a unique number for each set of related clients.

In the file below I've started by creating a one-variable grouping for both variables by sorting and using an if formula to compare to the previous record. My attempt at consolidating the two aggregations is outlined in the file.

Note that this file uses multiple circular references with calculation interations set at max of 2. This is the first time I'm trying this, I imagine ideally loops might be better but I don't know visual basic.

{As a side note, I'm not sure why my Agg Try field has gaps in the consecutive numbering of the groups, ie 1,3,4 instead of 1,2,3. This didn't happen till I had to format to fit the html limit. Not a big deal, just hope it's not a sign of an unreliable procedure.)

Is the logic of this process reasonable? It seems to work on this small dataset, but I'm not sure if I've covered off all scenarios of sortations and groupings of client records, and if it would be efficient on a large file of 50,000 records. I've been staring at this for too long and need a fresh Any help or feedback would be appreciated. Thanks for your help!!!

Dee
 

Excel Facts

Control Word Wrap
Press Alt+Enter to move to a new row in a cell. Lets you control where the words wrap.
Hello DDM..

I'm trying to create a procedure to clean up a large database with multiple related or duplicate clients.

Are all you really trying to do is get rid of duplicates by looking at the phone number of company name ? If so then it can be done quickly with Excels advanced filter feature.

FOR EXAMPLE:
Lets say your...
- Comp Names are in Column A
- Ph # are in column B
- There is a header in A1 and B1
- Data starts in row 2

... then

1. Copy this formula into cell $D$2
=AND(COUNTIF($A$2:A2,A2)=1,COUNTIF(A$2:$B2,B2)=1)

2. Select all of column A and B

3. goto Tools... Data... Filter .. Advanced Filter

4. In advanced filter window enter the following
ACTION: select .. "Copy to another locaction"
LIST RANGE: $A:$B
CRITERIA RANGE: $D1:$D2
COPY TO: $F$1

5. CLICK OK


PS Here's some good info on advanced filters : http://www.contextures.com/xladvfilter01.html
 
Upvote 0
Thanks Nimrod. I tried the filters as you suggested to the file and came up with the result below, which I don't understand the logic of.

Name Phone
abc company 3333333333
abc company 4444444444
fgi inc (same as abc) 4444444444

Actually, I don't think a filter is what I'm looking for. I'm not actually wanting to delete any records from the list, just to group them so that abc co, abc company, fgi inc (really abc), fgi 2 and fgi 3 are all grouped as related businesses (because either the phone or company name was the same), designated by one number (ie 1), and def company and vv ltd would be presumably 2 and 3 respectively. And that will work for a much more complex dataset. Thanks.
 
Upvote 0
grouping records based on two variables - still need help!!

I still need help on this if possible...my previous procedure seems to work usually but I couldn't get it to work on the real file. Actually even duplicating the results of my sample file above using the same method does not necessarily appear to be be consistent. Is there an error in my logic? Or maybe circular formulas just weren't meant to be used like this, just not sure how to get around it... :-(

Anyway, here is the problem again, more concisely...Given just the Name and Phone fields below, how to determine the Related Groupings field (shows whether the companies are related either by Name *or* by Phone? (dots are spaces)

Name....Phone .Related Groupings
abc ...........33 .................1
abx ...........44 .................1
abc ...........44.................1
fg1 ...........44.................1
fg2 ............44.................1
fg3 .............44.................1
fg2 .............55.................1
def ............22.................2
vvv .............99.................3

Is there a workable solution to this problem? I'm not familiar with VB, but if that is the only way to solve this problem I want to learn. It would be nice to know if someone is trying it even if it takes a while to figure out. Thanks for any help!!
 
Upvote 0

Forum statistics

Threads
1,223,639
Messages
6,173,498
Members
452,516
Latest member
druck21

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top