So, I have a pretty specific thing to do that is rather hard to explain in the title but I hope it gives a general idea. Now, on to what I actually would like to be able to do and I'm not sure what is the best way to tackle it:
* have a set of data tables that are something like:
A - 100 different values
B - 6 different values
C - C1 - 4 different values
- C2 - 5 different values
D - 6 different values
* Based on these 'data sets' I need to generate 20 samples randomly but also keep in mind if one value was used so to not use it again until there are no more 'unique values' to use. As an additional thing, C1 and C2 values have a special relation and order of
C1 - C2: 1-1, 1-2, 1-3, 1-4, 1-5, 2-1, 2-2, 2-3 etc. 4-4, 4-5 in this order
* There would be 20 tables generated from the whole data set that is based on the order and relation between C1 and C2 (4*5 = 20 samples).
* So, since there will be only 20 values used from table A, next time I generate new 20 samples it would have to use the 'remaining' values only (100-20=80 unused values).
A set of 20 samples would be something like:
Sample 1 - A[23], B[3], C1[1], C2[1], D[2]
Sample 2 - A[76], B[5], C1[1], C2[2], D[1]
Sample 3 - A[15], B[6], C1[1], C2[3], D[6]
Sample 4 - A[6], B[2], C1[1], C2[4], D[5]
Sample 5 - A[64], B[1], C1[1], C2[5], D[4]
Sample 6 - A[55], B[4], C1[2], C2[1], D[3]
Sample 7 - A[29], B[4], C1[2], C2[2], D[2]
Sample 8 - A[1], B[2], C1[2], C2[3], D[6]
Sample 9 - A[87], B[3], C1[2], C2[4], D[5]
Sample 10 - A[2], B[6], C1[2], C2[5], D[4]
Sample 11 - A[10], B[1], C1[3], C2[1], D[3]
Sample 12 - A[33], B[5], C1[3], C2[2], D[1]
Sample 13 - A[65], B[4], C1[3], C2[3], D[2]
Sample 14 - A[64], B[3], C1[3], C2[4], D[3]
Sample 15 - A[63], B[1], C1[3], C2[5], D[5]
Sample 16 - A[98], B[2], C1[4], C2[1], D[1]
Sample 17 - A[66], B[6], C1[4], C2[2], D[6]
Sample 18 - A[43], B[5], C1[4], C2[3], D[4]
Sample 19 - A[44], B[6], C1[4], C2[4], D[2]
Sample 20 - A[92], B[4], C1[4], C2[5], D[2]
At the next batch of 20 samples, the A values should never repeat (so, values used in the first 20 samples are 'removed' from the data set), B values don't repeat until all 6 of them are used in consecutive samples, C1 and C2 values respect a clear and fixed order, and D has the same property as B.
The colors are there to show that there is no duplicate value of B and D between 6 consecutive samples.
In the end, with 5 sets of 20 such samples, the whole 100 A values should be covered.
Hope this explains well enough what I need but if there are any other questions don't hesitate to ask. I am thinking of using Pivot tables but I must admit I haven't used them at all and I am not sure what their limitations are to be able to clearly know I should be able to do it in the end.
LE: forgot to add one important detail, the values are not limited to 100/6/6 unique values, these are the initial parameters that might change in time. The idea behind this is to be able to generate randomly 20 samples based on the relation between C1 and C2, not have duplicate values until all A, B and D values are used on previous samples.
* have a set of data tables that are something like:
A - 100 different values
B - 6 different values
C - C1 - 4 different values
- C2 - 5 different values
D - 6 different values
* Based on these 'data sets' I need to generate 20 samples randomly but also keep in mind if one value was used so to not use it again until there are no more 'unique values' to use. As an additional thing, C1 and C2 values have a special relation and order of
C1 - C2: 1-1, 1-2, 1-3, 1-4, 1-5, 2-1, 2-2, 2-3 etc. 4-4, 4-5 in this order
* There would be 20 tables generated from the whole data set that is based on the order and relation between C1 and C2 (4*5 = 20 samples).
* So, since there will be only 20 values used from table A, next time I generate new 20 samples it would have to use the 'remaining' values only (100-20=80 unused values).
A set of 20 samples would be something like:
Sample 1 - A[23], B[3], C1[1], C2[1], D[2]
Sample 2 - A[76], B[5], C1[1], C2[2], D[1]
Sample 3 - A[15], B[6], C1[1], C2[3], D[6]
Sample 4 - A[6], B[2], C1[1], C2[4], D[5]
Sample 5 - A[64], B[1], C1[1], C2[5], D[4]
Sample 6 - A[55], B[4], C1[2], C2[1], D[3]
Sample 7 - A[29], B[4], C1[2], C2[2], D[2]
Sample 8 - A[1], B[2], C1[2], C2[3], D[6]
Sample 9 - A[87], B[3], C1[2], C2[4], D[5]
Sample 10 - A[2], B[6], C1[2], C2[5], D[4]
Sample 11 - A[10], B[1], C1[3], C2[1], D[3]
Sample 12 - A[33], B[5], C1[3], C2[2], D[1]
Sample 13 - A[65], B[4], C1[3], C2[3], D[2]
Sample 14 - A[64], B[3], C1[3], C2[4], D[3]
Sample 15 - A[63], B[1], C1[3], C2[5], D[5]
Sample 16 - A[98], B[2], C1[4], C2[1], D[1]
Sample 17 - A[66], B[6], C1[4], C2[2], D[6]
Sample 18 - A[43], B[5], C1[4], C2[3], D[4]
Sample 19 - A[44], B[6], C1[4], C2[4], D[2]
Sample 20 - A[92], B[4], C1[4], C2[5], D[2]
At the next batch of 20 samples, the A values should never repeat (so, values used in the first 20 samples are 'removed' from the data set), B values don't repeat until all 6 of them are used in consecutive samples, C1 and C2 values respect a clear and fixed order, and D has the same property as B.
The colors are there to show that there is no duplicate value of B and D between 6 consecutive samples.
In the end, with 5 sets of 20 such samples, the whole 100 A values should be covered.
Hope this explains well enough what I need but if there are any other questions don't hesitate to ask. I am thinking of using Pivot tables but I must admit I haven't used them at all and I am not sure what their limitations are to be able to clearly know I should be able to do it in the end.
LE: forgot to add one important detail, the values are not limited to 100/6/6 unique values, these are the initial parameters that might change in time. The idea behind this is to be able to generate randomly 20 samples based on the relation between C1 and C2, not have duplicate values until all A, B and D values are used on previous samples.
Last edited: