You're an Expert? Check this out! (Weighted Distribution in Groups)

Hawajiko

New Member
Joined
Jul 22, 2022
Messages
8
Office Version
  1. 2019
Platform
  1. Windows
Hello Everyone,

Sorry for the clickbait.

I have combed the internet to find a solution for this but I was unable to come up with something that addressing my requirements. They are as follows.

1. There is a list of users which can be even or odd in number (preferable in a separate sheet since the list grows and decreases on a daily basis based on attendance so changes would automatically be catered in distribution if rows are deleted)
2. Each user has a certain weightage value associated to it (1 to 10 or decimal value like 2.1 and 2.4).
3. At the time of running the users in the list will be distributed in two groups.
4. Each group should have as close to equal weightage distribution as possible.
5. All the user in the list should be accounted for in the group irrespective if one group has an additional user, the total weightage should be close to equal.

I have been able to find some websites that can do that like the following however, their weightage distribution is from 1 to 5 which is not enough to assess skill value since someone who is a 2.1 score would be equal to a 2.4 score rounded off.

Keamk - Create random and balanced teams

Looking for assistance regarding this. I also tried using the following excel file create and shared by "Nothing Left to Lose" name "Assign_Teams" from Microsoft forum (Redirecting) but it has some issues like if an equal/exact is not possible, it will not work, it will also create groups of equal number of members not one additional in one or the other team if the total count is an odd number.

Any help in this would be greatly appreciated.

Thanks,
Hawajiko
 
My apologies for the delay. You did make it clear that I was supposed to post cross-site posting of a same situation.
I had to step out and I meant to update the post here with the new link.

Thanks for being kind and posting in here.
Just not thinking straight on this I guess. Please accept my deepest regrets. Moving on, I assure you, rules would be adhered in their entirety.
 
Upvote 0

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
Almost two months I offered some help to solve a problem that involved finding a subset of a group of given numbers whose sum was closest to a target value. This type of problem falls within the field of combinatorial mathematics. In the earlier problem, there were no constraints on the size of the subset (i.e., the number of values). Your problem is similar except there is a constraint on the size of the subset because you want to form two groups from all attendees and each group should, to the extent possible, have the same number of attendees.

Often there is no direct way to obtain a solution, short of a brute-force approach to examine all relevant combinations. This can be done, but consider what would be required: you will want to compute the sum of weights for half of the number of attendees. For a case where there are 20 attendees and we want to examine all possible combinations where we take 10 attendees at a time to form a group, there are 184,756 combinations (=COMBIN(20,10)...unique combinations, regardless of the order selected). The scale of your team-assignment problem poses some challenges: consider that increasing the attendees to 30 taken 15 at a time involves 155,117,520 combinations, an increase of nearly three orders of magnitude.

While working on the earlier problem, I was intrigued by some approaches developed by Tony Dallimore in 2014. The link to his work and my commentary can be found here and throughout this thread:
I haven't dug into Tony's code to understand how he improved on efficiency, but what he describes as his "3rd approach" seemed to offer significant advantages in the earlier problem. Later in the referenced thread, I created a solution that leaned heavily on his code. That same thread includes a link back to Tony's original post and code listing.

For your problem I've attempted a solution using that same code. If you're not familiar with VBA, you would right click on the sheet tab and select View Code. Then insert a "module" in the workbook/worksheet when the VBA editor opens up. Then paste the VBA code into the empty module window and exit by selecting File > Close and Return to Excel. The biggest issue with the code is that it is being used to do something that it was not quite designed to do. The code investigates all combinations that come close to producing a target sum, regardless of the number of values involved. This means that the output table of results will contain results that are of no interest to you because they will contain too many or too few values (weighting scores for attendees who would belong to the sampled group).

Therefore, some post-processing of the results is necessary. The code is set up to associate values with assigned letter-based KeyCodes, found in A2:B2 and down on sheet [Source]. This where the weighting scores for the users/attendees would go, and some scheme is needed to assign them a letter-based KeyCode (more on this in a moment). Then the target value for the sum of weighted values for all attendees in a group is placed in cell C2 or sheet [Source]. Then run the code (go to the Developer menu item, choose Macros from the Code group, and select the code named "Control3"). The results are overwritten onto the sheet named [Result]. There you will find potentially hundreds of combinations of values (and their associated KeyCodes) that produce sums close to the target value. The sums shown in these results seem to use display-rounding, but there is a decimal component to the number if you reformat the output column.

The code is set to output up to 400 combinations. I set the output rows to such a large number since many results will be ignored because they do not contain the correct number of values. It is my hope--but I'm not aware of any guarantees--that this approach will find the group whose sum is the closest. Imagine that such a combination would be listed as the 500th output and we are looking at the first 400...we could conceivably miss something important. It is fair to say that what is found is one of the closest combinations, if not the closest combination. Someone may be able to dig into the code and make some modifications to address some issues: fixing the display rounding, allowing for specifying the number of values to be considered in the groups, and allowing for a tolerance to be set as a way of expanding or shrinking the number of outputs would be the most obvious improvements.

In any case, I obtained encouraging results; and to analyze the code output, I focused on adding some formulas to perform post-processing. First, let's summarize the input side of the [Source] sheet. Your initial list of users and their weighting scores are shown in F5:G5 and down. In H5 and down, you would enter an "x" for any users who are attending an event. Formulas above these columns compute the count of users, the sum of weighting points across all users, the sum of weighting points across all attendees, and the count of the attendees. We use these last two to determine: 1) the target sum (we take one half of the sum of weights of the attendees)...which is inserted by formula into C2 for the code to use; and 2) the number of attendees in one group (=INT(N/2)), where N is the number of attendees. To the left of your input table are fixed KeyCodes (E5 and down)...these should remain as is.

After marking users in attendance (H5 and down), formulas will construct a consolidated list of attendees and their associated KeyCodes and Weighting Scores (I5:K5 and down). Select these results in J2:K2 and down (to include only the visible outputs) and perform a Paste>Paste Special>Values in A2:B2 and down for the code to use (I haven't investigated whether this can be done with formulas). Then run the code as described above. You should see the helper cells in O:Q update to show the sums of groups on the [Result] sheet that contain the correct number of values, and their corresponding sums and row indexes where they can be found. Additionally, another logic test is performed to determine if more than one group has exactly the same "optimal" sum. A formula in O2 determines the row index of the group associated with the sum that is closest to the target value. That particular group's set of KeyCodes is copied by formula into cell L1 and the cells in L2:M2 and down are automatically populated with users assigned to both groups, consistent with the best group of those identified.

It may be difficult to set this up from an XL2BB mini-sheet, so I'm making the file available for download here:

MrExcel_20220723_CombinatoricsSum_Hawajiko.xlsm
ABCDEFGHIJKLMNOPQ
1KeyValueTargetUser CountSum of Scores104.7894737<-- Sum of wts of attendeesAE+AF+AI+AJ+AL+AM+AZ+BA+BB<-- KeyCodes "optimum" group70<-- no. results
2AA1.0052.393017419<-- Number of attendees52.4210526352.36842105<-- Sum Wts145<-- index of optimum0.02631579
3AB2.00Maintain users/wtsIf yes, mark "X"Paste values to A:B910<-- CountSum Wts for results with n=9
4AD3.00KeyCodesUsersScoreIn AttendanceAttendeesKeyCodeWeightGroup 1Group 2Group total wt.Index"optimal"
5AE3.00AAUser 11.000xUser 1AA1.000 User 151.21052632124FALSE
6AF3.00ABUser 22.000xUser 2AB2.000 User 251.31578947125FALSE
7AI4.00ACUser 32.000xUser 3AC2.000 User 352.15789474126FALSE
8AJ4.00ADUser 43.000xUser 4AD3.000 User 452.26315789128FALSE
9AK4.74AEUser 53.000xUser 5AE3.000User 5 51.21052632133FALSE
10AL5.00AFUser 63.000xUser 6AF3.000User 6 51.31578947134FALSE
11AM5.00AGUser 74.000User 9AI4.000User 9 52.15789474135FALSE
12AT7.00AHUser 84.000User 10AJ4.000User 10 52.26315789136FALSE
13AX8.00AIUser 94.000xUser 11AK4.737 User 1151.52631579137FALSE
14AY8.11AJUser 104.000xUser 12AL5.000User 12 51.63157895138FALSE
15AZ9.37AKUser 114.737xUser 13AM5.000User 13 52.47368421139FALSE
16BA9.37ALUser 125.000xUser 18AR6.737 User 1852.57894737142FALSE
17BB9.68AMUser 135.000xUser 20AT7.000 User 2051.47368421143FALSE
18BC9.78947ANUser 145.000User 24AX8.000 User 2451.57894737144FALSE
19AOUser 155.000User 25AY8.105 User 2552.42105263145TRUE
20APUser 165.474User 26AZ9.368User 26 52.52631579146FALSE
21AQUser 176.000User 27BA9.368User 27 51.78947368147FALSE
22ARUser 186.737xUser 28BB9.684User 28 51.89473684149FALSE
23ASUser 197.000User 29BC9.789 User 2952.73684211150FALSE
Source
Cell Formulas
RangeFormula
H1H1=SUMIF(Table13[In Attendance],"x",Table13[Score])
L1L1=INDEX(INDIRECT("Result!$c$2:$c$401"),O2)
C2C2=H1/2
F2F2=COUNTA(Table13[Users])
G2G2=SUM(Table13[Score])
H2H2=COUNTIF(Table13[In Attendance],"x")
L2:M2L2=SUMPRODUCT($K$5:$K$30,--(L5:L30<>""))
L3:M3L3=SUMPRODUCT(--(L5:L30<>""))
Q2Q2=SMALL(ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),1)
O1O1=COUNT(O5:O100)
O2O2=INDEX(P5:P100,MATCH(SMALL(ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),1),ABS(O$5:INDEX(O$5:O$100,$O$1)-$C$2),0))
O3O3="Sum Wts for results with n="&INT(H2/2)
I5:I23I5=IFERROR(INDEX($F$5:$F$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
J5:J23J5=IFERROR(INDEX($E$5:$E$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
K5:K23K5=IFERROR(INDEX($G$5:$G$43,AGGREGATE(15,6,(ROW($F$5:$F$43)-ROW($F$5)+1)/($H$5:$H$43="x"),ROWS($I$5:$I5))),"")
L5:L23L5=IF(I5="","",IF(ISNUMBER(SEARCH(J5,$L$1)),I5,""))
M5:M23M5=IF(L5="",I5,"")
O5:O23O5=INDEX(INDIRECT("Result!$a$2:$a$401"),AGGREGATE(15,6,ROW($1:$400)/--((LEN(INDIRECT("Result!$C$2:$C$401"))-LEN(SUBSTITUTE(INDIRECT("Result!$C$2:$C$401"),"+","")) )=INT($H$2/2)-1),ROWS(O$5:O5)))
P5:P23P5=AGGREGATE(15,6,ROW($1:$400)/--((LEN(INDIRECT("Result!$C$2:$C$401"))-LEN(SUBSTITUTE(INDIRECT("Result!$C$2:$C$401"),"+","")) )=INT($H$2/2)-1),ROWS(O$5:O5))
Q5:Q23Q5=ABS(O5-$C$2)=$Q$2
 
Upvote 0
Holy...

That's some serious effort you put in this. First of all, I would like to thank you profusely for your time.

I was testing it out and I see some issues.

1. Changing the count breaks the result where the total distribution is not close to/equal.
2. With 20 or above count, it only populates group 2
3. The end result for other combination of users is providing a group population of 8 in Group 1 and 10 in Group 2. There can only be a single extra in any group.

How can one randomize this with different combination coming up with a click/refresh? Can you please try to see the file I mentioned in the link? It's a file named Assign_Teams by "Nothing Left to Lose". I am not sure if his VB code is available to view but what he does is something close to what I want however his logic breaks if the groups could not be divided into 'exact' equal score on each group and it cannot handle odd number of members. He does give an option to click a button and randomize the grouping on and on showing different combinations each time.

If I try to understand the logic behind the combination strategy how would the distribution work? Would it so,

1. Sum the total score of present users
2. Dividing the score by 2 to create the value that it needs to be close to.
3. Generate a random number and associate it with each present user
4. Create two groups filled with random people
5. Keep trying till a close match is found?
 
Upvote 0
My query has been answered on the Chandoo forum linked in the above post.

Thanks to everyone who looked at this and attempted to solve this.

Special thanks to Krice for his tremendous effort.
 
Upvote 0
I'm confused by your comments:
1. Changing the count breaks the result where the total distribution is not close to/equal.
You don't explicitly change the the counts anywhere. Your only actions on the [Source] sheet are to fill out the users/attendees block in F5:H34 (and add to the end of that table if you need more room for additional users). Formulas do all of the counting and determine whether there are an even or odd number of attendees, and hence whether the two groups to be formed are of equal size or whether their sizes differ by one.
2. With 20 or above count, it only populates group 2
After adding/deleting x's to the "In Attendance" columns to reflect a new scenario, the worksheet will not automatically update. You still need to follow the instructions in J3:K3 and copy the new KeyCode/Weight table (that reflects the revised attendee list) and paste it into A2:B2 and down as Values only. This can be done by copying the range that covers the updated KeyCode/Weight table, selecting cell A2, and then Paste>Paste Special>Values...and then delete any leftover content below the freshly pasted information (which would be necessary if a shorter block of cells were pasted over a previous longer block). And then the Control3 macro needs to be re-run. Only then, if any relevant solutions are found will the helper columns and the Group1/2 columns populate with meaningful results.
3. The end result for other combination of users is providing a group population of 8 in Group 1 and 10 in Group 2.
This isn't possible if the sheet is used as I described above. If one or more solutions are found with the correct number of values within the 400 rows returned to the [Result] sheet, the groups will be balanced in number to within 1, depending on whether there is an even or odd number of attendees.

About your description of the general approach here, I wouldn't say that the groups are filled with random people and that various combinations are generated randomly.
1. Sum the total score of present users.
2. Divide the total score of present users by 2 to create the target sum value.
3. Count the number of present users.
4. Divide the number of present users by 2 to determine the size of Group 1 and Group 2.
5. Assign each present user an easier to reference key code, not a random number, but a unique letter-based key code that serves as an abbreviated surrogate for their name to facilitate subsequent processing. The assignment of key codes to names is arbitrary.
6. For the subsequent processing, use an algorithm to systematically step through combinations of users (keycodes) and associated scores to determine which ones produce a sum "close" to the target determined in step 2. The algorithm used could vary. The one I used was developed by Tony Dallimore about 8 years ago for a different purpose and it delivers results to sheet [Result]. This class of problem is known as a "subset sum" problem, where one wishes to determine which subset of numbers in array "a" produce a sum "s" (or near a sum "s"). That's what the code does. Your problem imposes an additional constraint and could be described as a "subset sum of k elements" problem where each subset of values considered must consist of "k" elements (i.e., INT(n/2), where "n" is the number of attendees). In the results table, you see indications that the code was designed for the first description (subset sum) rather than the second description with the additional constraint, as most of the results involve subsets consisting of more or fewer values than required. The length of the results table is controlled by the RowRsltArrMax variable. The code initially populates the results table with the first RowRsltArrMax (whatever number that is set to) results, regardless of whether they are very close to the target sum. Then for each subsequent subset of numbers, their sum is compared to the worst solution in the results table. If the new subset is an improvement, it becomes part of the results table and the worst solution is pushed off the table. If the new subset is not an improvement over the worst solution in the table, the new subset is discarded. Then the next subset of values is considered, and so on. The algorithm followed is a variant of a Branch & Bound method that attempts to improve efficiency by avoiding further exploration down certain branches when logic dictates that considering certain larger or smaller values will not offer any improvement. I had set RowRsltArrMax=400, but for a larger number of attendees, it is conceivable that all 400 results returned will not include any that have the correct number of values required to form a group because many other shorter combinations may be encountered first that are deemed "close enough" before the algorithm ever begins adding in more values. In that case, you won't get a meaningful answer on the [Source] sheet. I set up a scenario with 24 attendees and edited the code to change RowRsltArrMax=6000 and edited the helper column formulas to look at those 6000 rows. This change comes at a cost, as the spreadsheet is noticeably slower, although that may be tolerable if you have time to wait (perhaps 1-3 minutes for the code to run, and considerably longer if RowsRsltArrMax is set to even higher values). For the sample problem with n=24, among the 6000 results returned by the code, only 150 involved 12 values. The target sum was 74.553. Of those 150 results with 12 values, their sums ranged from 74.316 to 76.316, and the best sum differed from the target by only 0.026...and that was obtained by 3 combinations of 12 attendees. Based on the requirement given in your first post...
Each group should have as close to equal weightage distribution as possible.
...this is what I assumed you wanted--an approach that gives priority to obtaining the target sum.
But your question in post #13...
How can one randomize this with different combination coming up with a click/refresh?
...suggests that you do not have a firm requirement for an optimized solution aimed at achieving sums equal to each other. You apparently have something else in mind involving distributing the attendees differently.

Since beginning this response, I see that you found a satisfactory solution elsewhere. Thank you for following up with that update, and I'm glad you have a solution that suits your purposes.

If you are interested in investigating this approach further, my latest file is found at the link below. After running the code, the user still needs to confirm that the helper columns O:S are populated. If they are not, then RowsRsltArrMax needs to be set even larger and three formulas need to be updated to correctly reference the results on the [Result] sheet. Then the user needs to select among any of potentially several equivalent solutions by using the dropdown selector in cell R2. That pulls a particular result from the [Result] sheet and shows what the two groups would look like in columns L:M.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top