Generate unique subsets with even distribution.

danomaniac

New Member
Joined
Feb 11, 2025
Messages
2
Office Version
  1. 2003 or older
Platform
  1. Windows
I am trying to figure out how to generate a list of unique subsets from a larger set of numbers where each number in the larger set is evenly distributed across all subsets.

Ideally, each subset would be sorted ascending left to right.

For example, if the set is 12 numbers is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and I wanted to make 20 subsets of 7 numbers each, then the subsets would be something like:
subset 1: 1, 2, 3, 4, 5, 6, 7
subset 2: 1,2,8,9,10,11,12
subset 3: 3,4,5,6,7,8,9
subset 4: 2,3,4,8,9,10,11
subset 5: 1,2,5,6,7,8,9
etc. to
subset 20: 6,7,8,9,10,11,12

I know there are thousands of permutations that will be omitted with a small number of subsets, but the important thing is no duplicates and even distribution, or as close to even as possible. In this case, 20 sets of 7 numbers is 140, divided by 12 numbers is 11.667 times each, so 8 of the numbers would be used 12 times and 4 would only be used 11 times (8x12=96, + 4x11=44, 96+44=140)

Also, I'd like to be able to change the size of the sets. For example: 20 sets of 6 numbers from a set of 10, or 40 sets of 9 numbers from a set of 16. All while maintaining even distribution.
 
One of propositions could be such VBA code:

VBA Code:
Sub test()
Dim setsize As Long, subsetsize As Long, subsetsnumber As Long
Dim allnumbers As Variant, repetitions() As Long, maxallowed As Long, minallowed As Long
Dim i As Long, j As Long, proposition As Long, counter As Long, result As Variant

setsize = Range("B1").Value
subsetsize = Range("B2").Value
subsetsnumber = Range("B3").Value

ReDim allnumbers(1 To subsetsize * subsetsnumber)
ReDim repetitions(1 To setsize)
maxallowed = WorksheetFunction.RoundUp(subsetsize * subsetsnumber / setsize, 0)
minallowed = WorksheetFunction.RoundDown(subsetsize * subsetsnumber / setsize, 0)
'first fill a table of minallowed of each element of set
For i = 1 To minallowed
  For j = 1 To setsize
    repetitions(j) = repetitions(j) + 1
    allnumbers((i - 1) * setsize + j) = j
  Next j
Next i
' then fill the rest with not-repeating randomly selected elements
counter = 0
For i = minallowed * setsize + 1 To subsetsize * subsetsnumber
  Do
    proposition = WorksheetFunction.RandBetween(1, setsize)
    counter = counter + 1
  Loop Until repetitions(proposition) < maxallowed Or counter > subsetsize * subsetsnumber
Debug.Print counter; subsetsize * subsetsnumber
  If repetitions(proposition) < maxallowed Then
    allnumbers(i) = proposition
    repetitions(proposition) = repetitions(proposition) + 1
    counter = 0
  Else 'pretty unlikely, but (really marginally) possible
    MsgBox "Could not generate input table. Sorry", vbCritical
    Exit Sub
  End If
Next i
'sorting is easier in worksheet than in a code
With Range("D1").Resize(subsetsize * subsetsnumber, 1)
  .Value = Application.Transpose(allnumbers)
  .Sort Key1:=Range("D1"), Header:=xlNo, Order1:=xlAscending, Orientation:=xlSortColumns
  allnumbers = .Value
End With
Range("D1").Resize(Rows.Count, Columns.Count - 4).ClearContents
ReDim result(1 To subsetsnumber, 1 To subsetsize)
For i = 1 To subsetsnumber
  For j = 1 To subsetsize
    result(i, j) = allnumbers((j - 1) * subsetsnumber + i, 1)
  Next j
Next i
Range("D1").Resize(subsetsnumber, subsetsize) = result
End Sub

PS. The code before "'sorting is easier in worksheet than in a code" prepares a nice entry point for different dividing all generated random values with even distribution (140 - depending on subset size and subsets number) into separate subsets. I used an ordered version which seems appealing to me (fitting all your requirements) . And today have no time for other divisions. But if they look too ordered for you - it is a good starting point (I think) for other solutions.


Your input values in B1:B3, See below:

Book1
ABCDEFGHIJK
1set size1212467911
2subset size712467911
3subsets number2012467911
4input values above12468911
5and run macro13468911
613468911
713468911
8135681011
9135681011
10135681012
11135681012
12135781012
13235781012
14235781012
15235781012
16235791012
17245791012
18245791012
19245791112
20246791112
21
Sheet1
 
Last edited:
Upvote 0
Try.
The codes will randomly distribute numbers, not evenly.

VBA Code:
Sub GenerateRandomSubsets()
    Dim N As Integer, X As Integer, Y As Integer
    Dim Numbers() As Integer
    Dim i As Integer, j As Integer, k As Integer
    Dim Subset As Collection
    Dim Results As Collection
    Dim rng As Range
    
    ' Get the values of N, X, and Y from the input dialog boxes
    N = InputBox("Please enter the total number of numbers (N):")
    X = InputBox("Please enter the number of numbers in each subset (X):")
    Y = InputBox("Please enter the number of subsets to generate (Y):")
    
    ' Initialize the array of numbers
    ReDim Numbers(1 To N)
    For i = 1 To N
        Numbers(i) = i
    Next i
    
    ' Initialize the collection of results
    Set Results = New Collection
    
    ' Generate Y subsets
    For i = 1 To Y
        Set Subset = New Collection
        Do While Subset.Count < X
            j = Int(Rnd * N) + 1
            On Error Resume Next ' If the number is already in the subset, skip the error
            Subset.Add Numbers(j), CStr(Numbers(j))
            On Error GoTo 0
        Loop
        
        ' Sort the subset
        Call SortCollection(Subset)
        
        Results.Add Subset
    Next i
    
    ' Output the results to A1 cell and extend to the right and down
    Set rng = Range("A1")
    For i = 1 To Results.Count
        Set Subset = Results(i)
        For j = 1 To Subset.Count
            rng.Offset(i - 1, j - 1).Value = Subset(j)
        Next j
    Next i
End Sub

' Helper function to sort the collection
Sub SortCollection(coll As Collection)
    Dim i As Integer, j As Integer
    Dim temp As Variant
    For i = 1 To coll.Count - 1
        For j = i + 1 To coll.Count
            If coll(i) > coll(j) Then
                ' Swap the elements
                temp = coll(i)
                coll.Add coll(j), Before:=i
                coll.Add temp, Before:=j
                coll.Remove i + 1
                coll.Remove j + 1
            End If
        Next j
    Next i
End Sub
 
Upvote 0
the important thing is no duplicates and even distribution, or as close to even as possible.
@danomaniac, welcome to the Forum!

Just in case you're interested (it's probably beyond the scope of what you're trying to do here), there is a bit of maths required to generate truly balanced samples.

If we look at @Kaper's example in Post #2, the individual numbers have the required frequencies, but pair frequencies vary widely from 2 (e.g. for the pair 4|10) to 11 (e.g. 1|6).
Similarly, for triples the frequencies vary from 0 (e.g. 1|2|3) to 9 (e.g. 1|6|11) etc.

One way to reduce these variations is to use Sobol sequences. There is some analysis here of a Sobol sequence approach vs random sampling. How to generate balanced combinations?

It may be possible to reduce some of the variation by looping through various random samples and optimising the results. But I doubt you'd put much of a dent into the problem this way. Choosing 7 from 12 numbers is only 792 combinations, but choosing 20 samples from 792 provides an astronomical number of possibilities.
 
Upvote 0
Thank you for the replies. You are correct, what you are suggesting is beyond the scope of what I am trying to do. I am trying to generate no more than 20 unique subsets from a set of 12 to 16 numbers with even distribution. Unfortunately, and I apologize, but I need to put this back on a shelf for now so I don't know when I'll get back to it.
 
Upvote 0
Just in case - it waits here till when you will be back :-)

in visual basic editor in references mark Solver. and then with input values in B1:B3 like in post #2 run sucjh macro:

First part gives similar results to these by @HongRu (uses somewhat different approach, so is a bit more compact), but produces for purpose a surplus of results. Then the second uses solver to pick only needed number of subsets but having in mind balanced distribution of elements (see formulas in column B)

VBA Code:
Sub test2()
Dim setsize As Long, subsetsize As Long, subsetsnumber As Long
Dim i As Long, j As Long, proposition As Long
Dim coll As Collection, list As Object

' initialize
setsize = Range("B1").Value
subsetsize = Range("B2").Value
subsetsnumber = Range("B3").Value
Range("D1").Resize(Rows.Count, Columns.Count - 4).ClearContents
Range("A19").Resize(Rows.Count - 19, 2).ClearContents

'First generate 100 subsets
Range("D2:D101") = 0
Range("D2:D" & subsetsnumber + 1) = 1
For j = 1 To 100
  Set coll = New Collection
  For i = 1 To setsize
    coll.Add i
  Next i
  Set list = CreateObject("System.Collections.ArrayList")
  For i = 1 To subsetsize
    proposition = WorksheetFunction.RandBetween(1, setsize + 1 - i)
    list.Add coll(proposition)
    coll.Remove (proposition)
  Next i
  list.Sort
  Cells(j + 1, "E").Resize(1, subsetsize).Value = list.ToArray
Next j

' then use Solver to select as many as you need, having in mind (a goal for solver) as equal distribution of elements as possible  

For i = 1 To setsize
  Range("A20").Offset(i, 0).Value = i
  Range("B20").Offset(i, 0).FormulaR1C1 = "=SUMPRODUCT(R2C[2]:R101C[2]*(R2C[3]:R101C[" & subsetsize + 2 & "]=RC[-1]))"
Next i
Range("B20").FormulaR1C1 = "=STDEV(R[1]C:R[" & setsize & "]C)"
Range("B19").FormulaR1C1 = "=COUNTIF(R[2]C:R[" & setsize + 1 & "]C,""<"" & ROUNDDOWN(R2C2*R3C2/R1C2,0))"
Range("A19").FormulaR1C1 = "=COUNTIF(R2C[3]:R101C[3],1)"
SolverReset
SolverOk SetCell:="$B$20", MaxMinVal:=2, ValueOf:=0, ByChange:="$D$2:$D$101", _
   Engine:=3, EngineDesc:="Evolutionary"
SolverAdd CellRef:="$A$19", Relation:=2, FormulaText:="$B$3"
SolverAdd CellRef:="$D$2:$D$101", Relation:=5, FormulaText:="binary"
'SolverOk SetCell:="$B$20", MaxMinVal:=2, ValueOf:=0, ByChange:="$D$2:$D$101", _
   Engine:=3, EngineDesc:="Evolutionary"
SolverSolve
Range("D1").Formula = "=IF(B19=0,""you may filter only values 1 in column D"",""Try to run Solver again and check for 0 in B19, or may be run whole macro again"")"
Range("D2").Resize(100, subsetsize + 1).Sort key1:=Range("D1"), Header:=xlNo, Order1:=xlDescending, Orientation:=xlSortColumns
End Sub

generator.xlsm
ABCDEFGHIJKL
1set size12you may filter only values 1 in column D
2subset size71245681011
3subsets number201136781012
4input values above1123591112
5and run macro113456912
6112356910
71234671011
812356101112
9112345911
101234691011
1112578101112
121456781112
131124891011
14114578910
151123471012
161146781011
171167891112
181347891012
19200112358912
200,492366123567912
211121157891112
222120234681112
23312012356712
24411023568910
25512013568912
266110245891011
277110345791011
288110346891012
29912012367912
3010120345681012
3111120247891112
32121201345689
33012345611
34more lines012578912
35below012348911
Sheet2
Cell Formulas
RangeFormula
D1D1=IF(B19=0,"you may filter only values 1 in column D","Try to run Solver again and check for 0 in B19, or may be run whole macro again")
A19A19=COUNTIF(D$2:D$101,1)
B19B19=COUNTIF(B21:B32,"<" & ROUNDDOWN($B$2*$B$3/$B$1,0))
B20B20=STDEV(B21:B32)
B21:B32B21=SUMPRODUCT(D$2:D$101*(E$2:K$101=A21))
Named Ranges
NameRefers ToCells
solver_adj=Sheet2!$D$2:$D$101A19, B21:B32
solver_lhs2=Sheet2!$D$2:$D$101A19, B21:B32
solver_lhs3=Sheet2!$D$2:$D$101A19, B21:B32
solver_rhs1=Sheet2!$B$3B19
solver_rhs4=Sheet2!$B$3B19
solver_rhs5=Sheet2!$B$3B19
solver_rhs7=Sheet2!$B$3B19
 
Last edited:
Upvote 0
Just in case you are not familiar with Solver - see: Define and solve a problem by using Solver - Microsoft Support and Load the Solver Add-in in Excel - Microsoft Support

And if the part about references is not clear:

1739533979304.jpeg
 
Upvote 0

Forum statistics

Threads
1,226,771
Messages
6,192,918
Members
453,766
Latest member
Gskier

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top