Have Excel grab (somewhat) random rows

FrumpyJones

New Member
Joined
Feb 11, 2008
Messages
43
Hi Everyone. So I have data (say 500 rows). The data is broken up by persons, and what I want excel to do is ultimately grab a random sample of rows from each person (Let's say 5) and that the samples for each person must be at least two away from the previous one. In the real world there will be a random number of rows (But greater than 10) for each person. annnnnd copy all of those random selections into a new tab (or delete the rows not selected... whichever is easier). Thanks in advance for you help on this one. Data kinda looks like this (Name will always be in column A):

[TABLE="class: grid, width: 500, align: left"]
<tbody>[TR]
[TD]Name[/TD]
[TD]Data blah blah[/TD]
[TD]more data blah blah[/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD]stuff 1[/TD]
[TD]more stuff 1[/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD]stuff 2[/TD]
[TD]more stuff 2[/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD]etc[/TD]
[TD]etc[/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Charlie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Debbie[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
 
Last edited:

Excel Facts

Shade all formula cells
To shade all formula cells: Home, Find & Select, Formulas to select all formulas. Then apply a light fill color.
I have some issues with this question, purely in terms of randomness

Ordinarily I'd use =RAND() on every row to create random numbers, then use something like RANK or SMALL to extract the smallest values which are now in a random order. These can easily be combined with IF as an array formula if needed. My problem here is the requirement to be "at least two rows away from the previous one". [Side question, do you mean two rows between or one row between? I read this as row 3 is OK if row 1 has been selected]. Say you have 10 items in your data set and the first one selected randomly is item 3. You can now have only 3,5,7,9 and 11, but 11 doesn't exist, and this isn't randomness either - the other items have to be entirely correlated, which in turn makes it difficult to calculate your answer using randomness

So what's the priority here? Randomness or distance between items selected?
 
Upvote 0
Thanks for replying @baitmaster. Sorry I wasn't clear in the original post.

The way the employees work may sometimes have them creating data for the same customer right after the first one (it would be very rare to have three at a time). To get a random sampling of how Charlie is dealing with customers, I wouldn't want to get both row's 3 and 4. I would at least want it to be 3 and 5, but could be 3 and 12. Hence SOMEWHAT random :). To put it into an example:

Charlie has 17 rows of data today.
Debbie has 23 rows of data today
Frank has 31 rows of data (Yes, I just realized I randomly picked three prime numbers).

I want to get 5 rows each from Charlie, Debbie and Frank, but I don't want any two of Charlies's 5 to be sequential, and none of Debbie's to be sequential, and none of Franks to be sequential. (I'm fine if the last row of Charlie and the first row of Debbie's follow each other. (In reality I'm looking at 15 or more individual names).

Every day is going to have a random number of total rows per person, but it's always greater than 10 per.

Hope I was able to make it clearer :)
 
Last edited:
Upvote 0
Interesting. So your requirement isn't to have a gap between the rows selected, it's to have different clients. I don't suppose for a moment there's a client reference / identifier shown on each row? If you can in some way filter your data for uniqueness then this complexity goes away
 
Upvote 0
Not in the way it pulls it now... But I think I can alter the report a bit. How about if I could guarantee a pull of 100+ lines per person ( I could just make it a week instead of a day)... Then you can just randomize based off the employee name with random between 3 and let's say 15. That way we will not get any consecutive and we can still make the count of 5 easily before moving onto the next employee?
 
Upvote 0
OK, how about differentiating into 2 groups based on odd/even row numbers?

column A = a series of unique references for demo purposes. I've just used a, b, c...
B1, copied down: =IF(ROW()/2=INT(ROW()/2),RAND(),1+RAND())

D1 down are ranking values 1, 2, 3....
E1 copied down are the smallest values in ranked order =SMALL($B:$B,D1)
F1 copied down demonstrates the uniqueness =INDEX(A:A,MATCH(E1,B:B,0))

This way you get two unique data subsets where all members are disconnected from their original neighbour
 
Upvote 0
Hhi @baitmaster,

Now we get to the part where I'm not a programmer -- at all -- , but have a good Boolean sense about me.

I can putz around with formulas to get things to happen, but once we hit VBA and macros, I start to assume a fetal position and suck my thumb a lot.

From a drunken bastard-love-baby of a three-way hookup of boolean, horrible-batch-programming and C declarations mindset (Which also shows my age (MS-DOS 3.2 "4-ev-ah"!)), this is what I was seeing in my NOT A PROGRAMMER BRAIN after my last reply:
int name = blank
int counter = 0
int row = 1

#BeginSearch
get text from "a",row
if text from "a",row != name then name = text from "a",row
if counter !=5 then counter = counter + 1, else voodoo magic to search for a new row with a new text for name, set counter = 0, row = new row that we found new name, then goto BeginSearch .
Copy row, paste row into new tab/sheet
Row = row + Random number generator between 3 and 15
goto BeginSearch.

repeat until no new names found.​

Hell that is super ugly... it's why I'm here begging for help :)
 
Last edited:
Upvote 0
My first attempt was a non-VBA approach as it's easier to create and understand

Coding-wise, there's neater ways to do it, but I need to see what your data looks like. Ideally it will start at Cell A1 and be in a single rectangular block, with nothing else on the worksheet outside of that set of rows and columns. I need to know the exact name of the worksheet and each main column header. Is that possible?
 
Upvote 0
Hmm... I really have to be careful due to company security and all.

Allow me to simplify a bit.

The worksheet is just sheet1.

It has 7 columns. columns c - g are filled in AFTER we finished putting in the data in from Columns A and B (Which is where we've started from).

Column A is the name of the employee. The header for Column A is NAME. The names are alphabetical Firstname Lastname. So there will be xx rows of charlie before we get to xx rows of debbie. And then we get xx rows of Ethan, and so on. There is a column B with a header of TICKET #. It has a ticket number and each one is unique (I.E. it's the ticket made by the employee - A2 Charlie made B2 Ticket 123456 ).

Once I have the random sample list parsed, Then I will go about filling in columns C-G based on the research I get from looking at the TICKET NUMBER made by NAME
 
Upvote 0
This code works for me on a basic set of data. You will need to create a reference [VB Editor > Tools > References >] to Microsoft Scripting Runtime

Code:
Option Explicit


' change this to adjust the number of samples of each item
Const iSamples As Integer = 21




Sub chooseRandom()


' create a scripting dictionary to hold concatenated results for each unique ID
Dim dictCodes As Scripting.Dictionary: Set dictCodes = New Scripting.Dictionary


' create another dictionary to hold arrays of unique random numbers
Dim dictRands As Scripting.Dictionary


' create array to hold chosen items
Dim arrResults: ReDim arrResults(1 To 2, 1 To 1)
Dim iResCount As Integer


' create other variables required
Dim strID As String


' pull unconnected data into dictionary by choosing only even rows (also ignores header row)
Dim cl As Range
For Each cl In Sheet1.Columns(1).SpecialCells(xlCellTypeConstants)
    If cl.Row / 2 = cl.Row \ 2 Then
        strID = cl.Value
        dictCodes(strID) = dictCodes(strID) & cl.Offset(0, 1) & "|"
    End If
Next cl


' process dictionary
Dim k1, k2, s
For Each k1 In dictCodes.Keys
    
    ' split dictionary item into constituent parts
    s = Split(dictCodes(k1), "|")
    
    ' get integers that reflect position in dictionary
    Set dictRands = dictUniqueRands(UBound(s))
    
    ' pass relevant item to results array
    For Each k2 In dictRands.Keys
        iResCount = iResCount + 1
        ReDim Preserve arrResults(1 To 2, 1 To iResCount)   ' only last dimension of array can be resized when preserving data. Must transpose later
        arrResults(1, iResCount) = k1
        arrResults(2, iResCount) = s(k2)
    Next k2
    
Next k1


' pass results array to new workbook. Can change to any other location and avoid creating new workbook
Dim wb As Workbook
Set wb = Workbooks.Add
wb.Sheets(1).Range("A1").Resize(UBound(arrResults, 2) - LBound(arrResults, 2) + 1, UBound(arrResults, 1) - LBound(arrResults, 1) + 1).Value = Application.Transpose(arrResults)


End Sub




Function dictUniqueRands(iMax As Integer) As Scripting.Dictionary
' creates a set of numbers chosen at random but not repeated. Because this is to work with a base-0 dictionary we include 0 but exclude the dictionary size
Dim d As Scripting.Dictionary: Set d = New Scripting.Dictionary
Dim i As Integer


Do
    i = WorksheetFunction.RandBetween(0, iMax - 1)
    If Not d.Exists(i) Then d(i) = i
Loop Until d.Count = WorksheetFunction.Min(iMax, iSamples)  ' needs to consider case where less data items than samples requested


Set dictUniqueRands = d


End Function
 
Upvote 0
Solution

Forum statistics

Threads
1,225,738
Messages
6,186,725
Members
453,368
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top