Randbetween no duplicates with unequal probabilities (draw simulation)

mrfahaji

New Member
Joined
Oct 25, 2012
Messages
34
Hi,

I have created a list of 136 clubs, and I want to select 40 of them at random. However, they do not have the same probability of being selected (they are five ‘tiers’, with the better clubs more likely to be chosen). The process should simulate a draw, where if the first number revealed belongs to Club A, then Club A go in that slot and cannot be selected again.

In order to simulate this, I have mapped ranges to single numbers, where the best clubs are +20 each time, the next tier +10, then +5, +2 and +1. I then use a VLOOKUP with a 1 as the [range lookup]. The RANDBETWEEN function is therefore set at (0,631) and it maps to 1-136 different clubs. Hopefully that's clear...

I run into problems because I want the first 40 non-duplicate numbers to be used. The RANK.EQ/COUNTIF does not seem to work because it ranks the entire list, and therefore the first drawn number gets altered due to the ranking process (e.g. if number 2 is drawn, then that should be 'Club B', but if there are seven “1”s further down the list, this 2 becomes an 8).

This is a shorted sample of how I have mapped the values to the clubs (note the changing intervals to reflect the different probability). Club names are illustrative only, so no trickery on the text strings:

ValueIDClub
01Club A
202Club B
403Club C
504Club D
605Club E
706Club F
757Club G
808Club H
829Club I
8310Club J
…..…..…..
631136Club ZZZ



Then my current attempt at solving this is:

RANDBETWEEN(0,631) / VLOOKUP(rand,table above,2,1) / RANK.EQ+COUNTIF

Which does give me a unique list, just not the list I want.

The main issue is that the clubs have different probability of being selected, and I need to figure a way to represent this while also not having duplicate results. Any suggestions welcome!

I don’t really want to use VBA and I’m not using 365 Apps (so no RANDARRAY, which might not solve the problem anyway).
 

Excel Facts

Select a hidden cell
Somehide hide payroll data in column G? Press F5. Type G1. Enter. Look in formula bar while you arrow down through G.
If I'm following you correctly, would the following work? Use your intervals of 20, 10, etc. to establish the lower value of some arbitrary scale (column A). The end of that scale needs to be specified (see A12 where I've essentially declared that Club J also has an interval of 1). So the entire scale runs from [0,84) in this case. We compute the Interval between each subsequent club's lower range (see column D). The range [0,84) is then mapped to [0,1) (see column E) so that a uniformly distributed random number (generated with RAND()) can be used. RAND() produces numbers on the interval [0,1) and it is highly unlikely that any two will be exactly the same. We then generate uniformly distributed random numbers for however many slots we'd like to fill...say 5 in this case (see column H) and use VLOOKUP to find the corresponding club whose interval the random number falls within.
MrExcel_20220808.xlsx
ABCDEFGHI
1Lower ValueIDClubIntervalRAND lower thresholdRAND1st 5
201Club A200Club A0.443997Club B
3202Club B200.238095238Club B0.991945Club J
4403Club C100.476190476Club C0.602342Club D
5504Club D100.595238095Club D0.058422Club A
6605Club E100.714285714Club E0.664705Club D
7706Club F50.833333333Club F
8757Club G50.892857143Club G
9808Club H20.952380952Club H
10829Club I10.976190476Club I
118310Club J10.988095238Club J
12841
Sheet6
Cell Formulas
RangeFormula
H2:H6H2=RAND()
I2:I6I2=VLOOKUP(H2,$E$2:$F$11,2)
D2:D11D2=A3-A2
F2:F11F2=C2
E2:E12E2=A2/$A$12
 
Upvote 0
Thanks for looking at this. Currently on phone so need to check properly tomorrow. Only question - your list of 5 is returning two Club Ds, so not sure how that is avoiding the duplicates?
 
Upvote 0
Oh…sorry, I misread your post. I have no scheme in place to prevent duplicate clubs. I’ll give this some more thought.
 
Upvote 0
For an older version of Excel and no VBA, you are probably going to need a solution with some helper columns. The approach I used begins with your input Source Data (columns A:D). Be sure to add the last value to this source table (shown in yellow) to indicate what would be the next lower value if another item were added to the list. This ensures that the correct interval (see orange) can be calculated for the last club shown.

Next, a lookup table (cols F:G) is created to show how the range of [0,1) is divided into subranges that correspond to the interval weighting indicated by column D.

Next, a long list of uniformly distributed random numbers between [0,1) is created (col K). This list is volatile and will recompute as the worksheet is refreshed. To avoid an ever-changing set of random numbers, I would recommend forcing a recalculation (hit F9), then copy the list of values in col K and paste them as values into column L. This list of random numbers needs to be very long compared to the number of clubs you want in your final list.

Next, VLOOKUP is used to assign whichever club corresponds to the static random numbers (see col M). Then a logic test is performed to determine if the teams shown in column M have already been selected above (see col N).

Finally, the selection order with no repeats shown in column N is consolidated in column I, where the final results are shown. I've pulled the formula down until an error appears. The error means that the bottom of column N has been reached and there are no more teams to show. You will see that not all of the teams are listed in col I, but if the random number lists in cols K:L were extended further, and the formulas in cols M:N also extended, then eventually all teams would appear in col I as the col I formula is pulled down.

To keep this mini-sheet of manageable size, I haven't shown the entire extent of cols K:N...
MrExcel_20220808.xlsx
ABCDEFGHIJKLMN
1Source DataLookup for weighted RAND
2Lower ValueIDClubIntervalRAND lower thresholdWeighted Selectionvolatile RAND()static RAND()Corresponding ClubSelection Order, No Repeats
301Club A200Club AClub C0.4702007180.561150983Club CClub C
4202Club B200.224719101Club BClub A0.3149985550.096074926Club AClub A
5403Club C100.449438202Club CClub D0.936956890.218719519Club A 
6504Club D100.561797753Club DClub I0.7366025160.551494472Club C 
7605Club E100.674157303Club EClub B0.5799677870.66598016Club DClub D
8706Club F50.786516854Club FClub E0.9809226060.515384199Club C 
9757Club G50.842696629Club GClub F0.5039076220.655208907Club D 
10808Club H20.898876404Club HClub G0.0172464850.929281476Club IClub I
11829Club I10.921348315Club IClub H0.0530486740.237126244Club BClub B
128310Club J10.93258427Club JClub L0.1591567170.459393571Club C 
138411Club K10.943820225Club KClub J0.712362960.686114853Club EClub E
148512Club L10.95505618Club L#NUM!0.730599470.832354293Club FClub F
158613Club M10.966292135Club M#NUM!0.1025374910.015826487Club A 
168714Club N10.97752809Club N0.6009612540.062909421Club A 
178815Club O10.988764045Club O0.98503980.661208834Club D 
188910.1400677670.894328505Club GClub G
190.0285364730.712825756Club E 
200.2502601930.90831293Club HClub H
210.5305861750.265428971Club B 
Sheet6
Cell Formulas
RangeFormula
M3:M21M3=VLOOKUP(L3,$F$3:$G$17,2)
N3:N21N3=IF(COUNTIF(N$2:N2,M3)>0,"",M3)
I3:I15I3=INDEX($M$3:$M$100,AGGREGATE(15,6,(ROW($N$3:$N$100)-ROW($N$3)+1)/($N$3:$N$100<>""),ROWS(I$3:I3)))
D3:D17D3=A4-A3
G3:G17G3=C3
F3:F18F3=A3/$A$18
K3:K21K3=RAND()
 
Upvote 0
Solution
Helper columns are fine! Once again I’m reading this from a phone And won’t be able to try it until later in the week, but the illustration you’ve presented looks very promising, so thank you! Will report back once tested.
 
Upvote 0
Let me know if it works okay. If you want to avoid the extra step of copying the volatile random numbers to static values (the columns K:L step described in my last post), then change the M3 (and down) formula to:
Excel Formula:
=VLOOKUP(K3,$F$3:$G$17,2)
I did that in some tests and it seems to behave okay, but the list will change with every refresh of the worksheet. In other words, if you use the quicker version that directly utilizes the volatile random numbers, the resulting list of teams in column I can be examined, but if any operations are performed, the list will change. You can copy, say, the first 6 teams in column I and paste them as values elsewhere (to preserve them), but the very act of "paste as values" will cause the dynamic list in column I to change. You can try out the two variants to see which works best for your application, as it involves changing only one reference in the column M formula.
 
Upvote 0
This is amazing, thank you so much, it seems to work a treat! I've extended the K:N section down to 300 rows to (almost) guarantee that a unique list of 40 teams will always be generated. What do the '15,6' values in the aggregate function represent? I assumed 15 was to do with the number of clubs to select, but when I changed it to 40 it stopped working, and it seems to be fine with 15 so have left it as it is!

Thanks again, hopefully it was an enjoyable/satisfying problem to solve - I feel somewhat vindicated in asking, because it WAS solvable but clearly not straightforward!
 
Upvote 0
I’m glad to hear it’s working for you. The 15 tells AGGREGATE to return the nth smallest value in the array. The “n” is the 4th argument of AGGREGATE, consisting of the ROWS term…so as the formula is pulled down, ROWS returns values of 1 then 2 then 3 and so on. The “array” that AGGREGATE operates on is the 3rd argument and is an array formed by dividing one array by another array. The numerator array contains row indexes {1;2;3;…}, and the denominator array contains the logic to consider only those rows that are not blank. This strategy is used to isolate row indexes of interest while producing “division by 0” errors everywhere else (wherever blanks appear, the logic test results in FALSE, which is coerced to a 0 during the division operation, producing an error). We want these errors because they can be easily ignored by using the “6” code as the 2nd argument of AGGREGATE. The result is a consolidated list of row indexes that are wanted, with no blanks between them. Those row indexes are then fed into the INDEX function to return the team name.

Did you use the version that uses the static RAND values or the dynamic/volatile values? If you’re using the volatile values, the static value helper column can be deleted.
 
Upvote 0

Forum statistics

Threads
1,224,818
Messages
6,181,152
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top