You're an Expert? Check this out! (Weighted Distribution in Groups)

Hawajiko · Jul 22, 2022

Hello Everyone,

Sorry for the clickbait.

I have combed the internet to find a solution for this but I was unable to come up with something that addressing my requirements. They are as follows.

1. There is a list of users which can be even or odd in number (preferable in a separate sheet since the list grows and decreases on a daily basis based on attendance so changes would automatically be catered in distribution if rows are deleted)
2. Each user has a certain weightage value associated to it (1 to 10 or decimal value like 2.1 and 2.4).
3. At the time of running the users in the list will be distributed in two groups.
4. Each group should have as close to equal weightage distribution as possible.
5. All the user in the list should be accounted for in the group irrespective if one group has an additional user, the total weightage should be close to equal.

I have been able to find some websites that can do that like the following however, their weightage distribution is from 1 to 5 which is not enough to assess skill value since someone who is a 2.1 score would be equal to a 2.4 score rounded off.

Keamk - Create random and balanced teams

Looking for assistance regarding this. I also tried using the following excel file create and shared by "Nothing Left to Lose" name "Assign_Teams" from Microsoft forum (Redirecting) but it has some issues like if an equal/exact is not possible, it will not work, it will also create groups of equal number of members not one additional in one or the other team if the total count is an odd number.

Any help in this would be greatly appreciated.

Thanks,
Hawajiko

Hawajiko · Jul 24, 2022

My apologies for the delay. You did make it clear that I was supposed to post cross-site posting of a same situation.
I had to step out and I meant to update the post here with the new link.

Thanks for being kind and posting in here.
Just not thinking straight on this I guess. Please accept my deepest regrets. Moving on, I assure you, rules would be adhered in their entirety.

KRice · Jul 24, 2022

Almost two months I offered some help to solve a problem that involved finding a subset of a group of given numbers whose sum was closest to a target value. This type of problem falls within the field of combinatorial mathematics. In the earlier problem, there were no constraints on the size of the subset (i.e., the number of values). Your problem is similar except there is a constraint on the size of the subset because you want to form two groups from all attendees and each group should, to the extent possible, have the same number of attendees.

Often there is no direct way to obtain a solution, short of a brute-force approach to examine all relevant combinations. This can be done, but consider what would be required: you will want to compute the sum of weights for half of the number of attendees. For a case where there are 20 attendees and we want to examine all possible combinations where we take 10 attendees at a time to form a group, there are 184,756 combinations (=COMBIN(20,10)...unique combinations, regardless of the order selected). The scale of your team-assignment problem poses some challenges: consider that increasing the attendees to 30 taken 15 at a time involves 155,117,520 combinations, an increase of nearly three orders of magnitude.

While working on the earlier problem, I was intrigued by some approaches developed by Tony Dallimore in 2014. The link to his work and my commentary can be found here and throughout this thread:

Choose cells that come closest to adding up to a certain total

This is probably the strangest query I've put before you, and I think it's probably a dud. But, I'm hoping that someone can prove me wrong. I am in charge of our daily lunch here at work, and break that lunch down on my credit card according to how many people from our two buildings (A&D) were...

www.mrexcel.com

I haven't dug into Tony's code to understand how he improved on efficiency, but what he describes as his "3rd approach" seemed to offer significant advantages in the earlier problem. Later in the referenced thread, I created a solution that leaned heavily on his code. That same thread includes a link back to Tony's original post and code listing.

For your problem I've attempted a solution using that same code. If you're not familiar with VBA, you would right click on the sheet tab and select View Code. Then insert a "module" in the workbook/worksheet when the VBA editor opens up. Then paste the VBA code into the empty module window and exit by selecting File > Close and Return to Excel. The biggest issue with the code is that it is being used to do something that it was not quite designed to do. The code investigates all combinations that come close to producing a target sum, regardless of the number of values involved. This means that the output table of results will contain results that are of no interest to you because they will contain too many or too few values (weighting scores for attendees who would belong to the sampled group).

Therefore, some post-processing of the results is necessary. The code is set up to associate values with assigned letter-based KeyCodes, found in A2:B2 and down on sheet [Source]. This where the weighting scores for the users/attendees would go, and some scheme is needed to assign them a letter-based KeyCode (more on this in a moment). Then the target value for the sum of weighted values for all attendees in a group is placed in cell C2 or sheet [Source]. Then run the code (go to the Developer menu item, choose Macros from the Code group, and select the code named "Control3"). The results are overwritten onto the sheet named [Result]. There you will find potentially hundreds of combinations of values (and their associated KeyCodes) that produce sums close to the target value. The sums shown in these results seem to use display-rounding, but there is a decimal component to the number if you reformat the output column.

The code is set to output up to 400 combinations. I set the output rows to such a large number since many results will be ignored because they do not contain the correct number of values. It is my hope--but I'm not aware of any guarantees--that this approach will find the group whose sum is the closest. Imagine that such a combination would be listed as the 500th output and we are looking at the first 400...we could conceivably miss something important. It is fair to say that what is found is one of the closest combinations, if not the closest combination. Someone may be able to dig into the code and make some modifications to address some issues: fixing the display rounding, allowing for specifying the number of values to be considered in the groups, and allowing for a tolerance to be set as a way of expanding or shrinking the number of outputs would be the most obvious improvements.

In any case, I obtained encouraging results; and to analyze the code output, I focused on adding some formulas to perform post-processing. First, let's summarize the input side of the [Source] sheet. Your initial list of users and their weighting scores are shown in F5:G5 and down. In H5 and down, you would enter an "x" for any users who are attending an event. Formulas above these columns compute the count of users, the sum of weighting points across all users, the sum of weighting points across all attendees, and the count of the attendees. We use these last two to determine: 1) the target sum (we take one half of the sum of weights of the attendees)...which is inserted by formula into C2 for the code to use; and 2) the number of attendees in one group (=INT(N/2)), where N is the number of attendees. To the left of your input table are fixed KeyCodes (E5 and down)...these should remain as is.

After marking users in attendance (H5 and down), formulas will construct a consolidated list of attendees and their associated KeyCodes and Weighting Scores (I5:K5 and down). Select these results in J2:K2 and down (to include only the visible outputs) and perform a Paste>Paste Special>Values in A2:B2 and down for the code to use (I haven't investigated whether this can be done with formulas). Then run the code as described above. You should see the helper cells in O:Q update to show the sums of groups on the [Result] sheet that contain the correct number of values, and their corresponding sums and row indexes where they can be found. Additionally, another logic test is performed to determine if more than one group has exactly the same "optimal" sum. A formula in O2 determines the row index of the group associated with the sum that is closest to the target value. That particular group's set of KeyCodes is copied by formula into cell L1 and the cells in L2:M2 and down are automatically populated with users assigned to both groups, consistent with the best group of those identified.

It may be difficult to set this up from an XL2BB mini-sheet, so I'm making the file available for download here:

Dropbox

www.dropbox.com

MrExcel_20220723_CombinatoricsSum_Hawajiko.xlsm

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

1

Key

Value

Target

User Count

Sum of Scores

104.7894737

<-- Sum of wts of attendees

AE+AF+AI+AJ+AL+AM+AZ+BA+BB

<-- KeyCodes "optimum" group

70

<-- no. results

2

AA

1.00

<-- Number of attendees

<-- Sum Wts

<-- index of optimum

3

AB

2.00

Maintain users/wts

If yes, mark "X"

Paste values to A:B

9

10

<-- Count

Sum Wts for results with n=9

4

AD

3.00

KeyCodes

Users

Score

In Attendance

Attendees

KeyCode

Weight

Group 1

Group 2

Group total wt.

Index

"optimal"

5

AE

3.00

AA

User 1

1.000

x

6

AF

3.00

AB

User 2

2.000

x

7

AI

4.00

AC

User 3

2.000

x

8

AJ

4.00

AD

User 4

3.000

x

9

AK

4.74

AE

User 5

3.000

x

10

AL

5.00

AF

User 6

3.000

x

11

AM

5.00

AG

User 7

4.000

12

AT

7.00

AH

User 8

4.000

13

AX

8.00

AI

User 9

4.000

x

14

AY

8.11

AJ

User 10

4.000

x

15

AZ

9.37

AK

User 11

4.737

x

16

BA

9.37

AL

User 12

5.000

x

17

BB

9.68

AM

User 13

5.000

x

18

BC

9.78947

AN

User 14

5.000

19

AO

User 15

5.000

20

AP

User 16

5.474

21

AQ

User 17

6.000

22

AR

User 18

6.737

x

23

AS

User 19

7.000

Source

Cell Formulas
Range	Formula
H1	H1	=SUMIF(Table13[In Attendance],"x",Table13[Score])
L1	L1	=INDEX(INDIRECT("Result!$c$2:$c$401"),O2)
C2	C2	=H1/2
F2	F2	=COUNTA(Table13[Users])
G2	G2	=SUM(Table13[Score])
H2	H2	=COUNTIF(Table13[In Attendance],"x")
L2:M2	L2	=SUMPRODUCT($K$5:$K$30,--(L5:L30<>""))
L3:M3	L3	=SUMPRODUCT(--(L5:L30<>""))
Q2	Q2	=SMALL(ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),1)
O1	O1	=COUNT(O5:O100)
O2	O2	=INDEX(P5:P100,MATCH(SMALL(ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),1),ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),0))
O3	O3	="Sum Wts for results with n="&INT(H2/2)
I5:I23	I5	=IFERROR(INDEX($F$5:$F$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
J5:J23	J5	=IFERROR(INDEX($E$5:$E$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
K5:K23	K5	=IFERROR(INDEX($G$5:$G$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
L5:L23	L5	=IF(I5="","",IF(ISNUMBER(SEARCH(J5,$L$1)),I5,""))
M5:M23	M5	=IF(L5="",I5,"")
O5:O23	O5	=INDEX(INDIRECT("Result!$a$2:$a$401"),AGGREGATE(15,6,ROW($1:$400)/--((LEN(INDIRECT("Result!$C$2:$C$401"))-LEN(SUBSTITUTE(INDIRECT("Result!$C$2:$C$401"),"+","")) )=INT($H$2/2)-1),ROWS(O$5:O5)))
P5:P23	P5	=AGGREGATE(15,6,ROW($1:$400)/--((LEN(INDIRECT("Result!$C$2:$C$401"))-LEN(SUBSTITUTE(INDIRECT("Result!$C$2:$C$401"),"+","")) )=INT($H$2/2)-1),ROWS(O$5:O5))
Q5:Q23	Q5	=ABS(O5-$C$2)=$Q$2

Hawajiko · Jul 24, 2022

Holy...

That's some serious effort you put in this. First of all, I would like to thank you profusely for your time.

I was testing it out and I see some issues.

1. Changing the count breaks the result where the total distribution is not close to/equal.
2. With 20 or above count, it only populates group 2
3. The end result for other combination of users is providing a group population of 8 in Group 1 and 10 in Group 2. There can only be a single extra in any group.

How can one randomize this with different combination coming up with a click/refresh? Can you please try to see the file I mentioned in the link? It's a file named Assign_Teams by "Nothing Left to Lose". I am not sure if his VB code is available to view but what he does is something close to what I want however his logic breaks if the groups could not be divided into 'exact' equal score on each group and it cannot handle odd number of members. He does give an option to click a button and randomize the grouping on and on showing different combinations each time.

If I try to understand the logic behind the combination strategy how would the distribution work? Would it so,

1. Sum the total score of present users
2. Dividing the score by 2 to create the value that it needs to be close to.
3. Generate a random number and associate it with each present user
4. Create two groups filled with random people
5. Keep trying till a close match is found?

Hawajiko · Jul 25, 2022

My query has been answered on the Chandoo forum linked in the above post.

Thanks to everyone who looked at this and attempted to solve this.

Special thanks to Krice for his tremendous effort.

KRice · Jul 25, 2022

I'm confused by your comments:

Hawajiko said:
1. Changing the count breaks the result where the total distribution is not close to/equal.

You don't explicitly change the the counts anywhere. Your only actions on the [Source] sheet are to fill out the users/attendees block in F5:H34 (and add to the end of that table if you need more room for additional users). Formulas do all of the counting and determine whether there are an even or odd number of attendees, and hence whether the two groups to be formed are of equal size or whether their sizes differ by one.

Hawajiko said:
2. With 20 or above count, it only populates group 2

After adding/deleting x's to the "In Attendance" columns to reflect a new scenario, the worksheet will not automatically update. You still need to follow the instructions in J3:K3 and copy the new KeyCode/Weight table (that reflects the revised attendee list) and paste it into A2:B2 and down as Values only. This can be done by copying the range that covers the updated KeyCode/Weight table, selecting cell A2, and then Paste>Paste Special>Values...and then delete any leftover content below the freshly pasted information (which would be necessary if a shorter block of cells were pasted over a previous longer block). And then the Control3 macro needs to be re-run. Only then, if any relevant solutions are found will the helper columns and the Group1/2 columns populate with meaningful results.

Hawajiko said:
3. The end result for other combination of users is providing a group population of 8 in Group 1 and 10 in Group 2.

This isn't possible if the sheet is used as I described above. If one or more solutions are found with the correct number of values within the 400 rows returned to the [Result] sheet, the groups will be balanced in number to within 1, depending on whether there is an even or odd number of attendees.

About your description of the general approach here, I wouldn't say that the groups are filled with random people and that various combinations are generated randomly.
1. Sum the total score of present users.
2. Divide the total score of present users by 2 to create the target sum value.
3. Count the number of present users.
4. Divide the number of present users by 2 to determine the size of Group 1 and Group 2.
5. Assign each present user an easier to reference key code, not a random number, but a unique letter-based key code that serves as an abbreviated surrogate for their name to facilitate subsequent processing. The assignment of key codes to names is arbitrary.
6. For the subsequent processing, use an algorithm to systematically step through combinations of users (keycodes) and associated scores to determine which ones produce a sum "close" to the target determined in step 2. The algorithm used could vary. The one I used was developed by Tony Dallimore about 8 years ago for a different purpose and it delivers results to sheet [Result]. This class of problem is known as a "subset sum" problem, where one wishes to determine which subset of numbers in array "a" produce a sum "s" (or near a sum "s"). That's what the code does. Your problem imposes an additional constraint and could be described as a "subset sum of k elements" problem where each subset of values considered must consist of "k" elements (i.e., INT(n/2), where "n" is the number of attendees). In the results table, you see indications that the code was designed for the first description (subset sum) rather than the second description with the additional constraint, as most of the results involve subsets consisting of more or fewer values than required. The length of the results table is controlled by the RowRsltArrMax variable. The code initially populates the results table with the first RowRsltArrMax (whatever number that is set to) results, regardless of whether they are very close to the target sum. Then for each subsequent subset of numbers, their sum is compared to the worst solution in the results table. If the new subset is an improvement, it becomes part of the results table and the worst solution is pushed off the table. If the new subset is not an improvement over the worst solution in the table, the new subset is discarded. Then the next subset of values is considered, and so on. The algorithm followed is a variant of a Branch & Bound method that attempts to improve efficiency by avoiding further exploration down certain branches when logic dictates that considering certain larger or smaller values will not offer any improvement. I had set RowRsltArrMax=400, but for a larger number of attendees, it is conceivable that all 400 results returned will not include any that have the correct number of values required to form a group because many other shorter combinations may be encountered first that are deemed "close enough" before the algorithm ever begins adding in more values. In that case, you won't get a meaningful answer on the [Source] sheet. I set up a scenario with 24 attendees and edited the code to change RowRsltArrMax=6000 and edited the helper column formulas to look at those 6000 rows. This change comes at a cost, as the spreadsheet is noticeably slower, although that may be tolerable if you have time to wait (perhaps 1-3 minutes for the code to run, and considerably longer if RowsRsltArrMax is set to even higher values). For the sample problem with n=24, among the 6000 results returned by the code, only 150 involved 12 values. The target sum was 74.553. Of those 150 results with 12 values, their sums ranged from 74.316 to 76.316, and the best sum differed from the target by only 0.026...and that was obtained by 3 combinations of 12 attendees. Based on the requirement given in your first post...

Hawajiko said:
Each group should have as close to equal weightage distribution as possible.

...this is what I assumed you wanted--an approach that gives priority to obtaining the target sum.
But your question in post #13...

Hawajiko said:
How can one randomize this with different combination coming up with a click/refresh?

...suggests that you do not have a firm requirement for an optimized solution aimed at achieving sums equal to each other. You apparently have something else in mind involving distributing the attendees differently.

Since beginning this response, I see that you found a satisfactory solution elsewhere. Thank you for following up with that update, and I'm glad you have a solution that suits your purposes.

If you are interested in investigating this approach further, my latest file is found at the link below. After running the code, the user still needs to confirm that the helper columns O:S are populated. If they are not, then RowsRsltArrMax needs to be set even larger and three formulas need to be updated to correctly reference the results on the [Result] sheet. Then the user needs to select among any of potentially several equivalent solutions by using the dropdown selector in cell R2. That pulls a particular result from the [Result] sheet and shows what the two groups would look like in columns L:M.

MrExcel_20220723_CombinatoricsSum_Hawajiko_v2.xlsm

Shared with Dropbox

www.dropbox.com

You're an Expert? Check this out! (Weighted Distribution in Groups)

Hawajiko

New Member

Hawajiko

New Member

KRice

Well-known Member

Choose cells that come closest to adding up to a certain total

Dropbox

Hawajiko

New Member

Hawajiko

New Member

KRice

Well-known Member

MrExcel_20220723_CombinatoricsSum_Hawajiko_v2.xlsm

Similar threads

Share this page

You're an Expert? Check this out! (Weighted Distribution in Groups)

New Member

New Member

Well-known Member

New Member

New Member

Well-known Member

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock