5% Random Sample with Two Variables

shou0028

New Member
Joined
Sep 3, 2015
Messages
1
I need to do an audit of 5% of each of my coworkers complaint files. I pulled the data so that Column A contains the complaint file number, and Column B contains the complaint owner. There are about 7 different owners, but every complaint file number is different.

I need to pull out a random sample of 5% of each owner's complaints. Each owner does not have the same number of complaints, so 5% of one person's complaints might mean 10 complaints, by 5% of another person's complaints might mean 25.

I will be doing this audit monthly, so I'm looking to write a program or something so I just update my source data each month and it will generate a random sample of complaint file numbers from Column A.

Please let me know if I'm not explaining this well! Trying my best :)
 

Excel Facts

VLOOKUP to Left?
Use =VLOOKUP(A2,CHOOSE({1,2},$Z$1:$Z$99,$Y$1:$Y$99),2,False) to lookup Y values to left of Z values.
I think this is very plausible in excel.

Assumptions:
Column A contains file numbers. (A1 = "File Number", A2:A1000 contains file number data - for example purposes)
Column B contains complaint owners (B1 = "Owner", B2:B1000 contains owner data).

In C1 enter a column name (ex "Rand")
In C2 enter a formula: =RAND().
In D1 enter a column name (ex "Sample flag")
In D2 enter a formula: =IF(COUNTIF($B$2:B2,B2)/COUNTIF($B$2:$B$1000,B2)<5%,1,0)

Copy both formulas down.
Do a 2 layer sort on this new table: Sort by Owner then by Rand.

Column D will contain 0s and 1s, the 1s are a random 5% sample per owner. Just do a filter on that column for 1 and copy paste to your audit sheet.

If you want to generate a new sample, just hit F9 to recalculate the random numbers and repeat your sort and filter.
 
Upvote 0
I took a different approach. This worksheet contains 1200 records of sample data, Column A being a Unique Id (like a file number) and Column B being an Owner's name. The formulas in Row4 Columns F through L look for matches of the Owner and report the top specified number of records (5% of the count of each Owner) in the unique ID column. The formulas are looking at columns A and B in their entirety but you can specify a range, a named range or Table name as desired; entire rows are slow to calculate, so keep that in mind. You'll have to copy the formulas down to exceed the number of rows you predict the sample count will be; in my example of 1200 records, each of the seven owner's names seemed to appear under 12 times so I went down 15 rows.

The sample data is randomly created with random numbers letters and names. You can copy and paste all the sample data as values to stop the randomisation.



Excel 2012
ABCDEFGHIJKL
Unique IDOwnerSampleOwnerAndrewBartholomewChristopherDavidEthanFranciscoGregory
U64433Ethan
E23759David
Y23581Andrew

<tbody>
[TD="align: center"]1[/TD]

[TD="align: right"][/TD]

[TD="align: center"]2[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"]1200[/TD]
[TD="bgcolor: #E2EFDA, align: right"]179[/TD]
[TD="bgcolor: #E2EFDA, align: right"]158[/TD]
[TD="bgcolor: #E2EFDA, align: right"]175[/TD]
[TD="bgcolor: #E2EFDA, align: right"]166[/TD]
[TD="bgcolor: #E2EFDA, align: right"]164[/TD]
[TD="bgcolor: #E2EFDA, align: right"]178[/TD]
[TD="bgcolor: #E2EFDA, align: right"]180[/TD]

[TD="align: center"]3[/TD]

[TD="align: right"][/TD]
[TD="align: right"]5%[/TD]
[TD="align: right"]60[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]
[TD="bgcolor: #E2EFDA, align: right"]8[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]
[TD="bgcolor: #E2EFDA, align: right"]9[/TD]

[TD="align: center"]4[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="bgcolor: #C6E0B4"]Y23581[/TD]
[TD="bgcolor: #C6E0B4"]H68222[/TD]
[TD="bgcolor: #C6E0B4"]D71591[/TD]
[TD="bgcolor: #C6E0B4"]E23759[/TD]
[TD="bgcolor: #C6E0B4"]U64433[/TD]
[TD="bgcolor: #C6E0B4"]D77242[/TD]
[TD="bgcolor: #C6E0B4"]D16739[/TD]

</tbody>
Sheet17

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Worksheet Formulas[TABLE="width: 100%"]
<tbody>[TR="bgcolor: #DAE7F5"]
[TH="width: 10"]Cell[/TH]
[TH="align: left"]Formula[/TH]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]A2[/TH]
[TD="align: left"]=CHAR(RANDBETWEEN(65,90))&RANDBETWEEN(10000,100000)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]B2[/TH]
[TD="align: left"]=LOOKUP(RANDBETWEEN(1,7),{1,2,3,4,5,6,7},{"Andrew","Bartholomew","Christopher","David","Ethan","Francisco","Gregory"})[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]A3[/TH]
[TD="align: left"]=CHAR(RANDBETWEEN(65,90))&RANDBETWEEN(10000,100000)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]B3[/TH]
[TD="align: left"]=LOOKUP(RANDBETWEEN(1,7),{1,2,3,4,5,6,7},{"Andrew","Bartholomew","Christopher","David","Ethan","Francisco","Gregory"})[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]A4[/TH]
[TD="align: left"]=CHAR(RANDBETWEEN(65,90))&RANDBETWEEN(10000,100000)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]B4[/TH]
[TD="align: left"]=LOOKUP(RANDBETWEEN(1,7),{1,2,3,4,5,6,7},{"Andrew","Bartholomew","Christopher","David","Ethan","Francisco","Gregory"})[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]E2[/TH]
[TD="align: left"]=SUM(F2:L2)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]F2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,F1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]G2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,G1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]H2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,H1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]I2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,I1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]J2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,J1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]K2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,K1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]L2[/TH]
[TD="align: left"]=COUNTIFS($B:$B,L1)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]E3[/TH]
[TD="align: left"]=ROUNDUP($D$3*E2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]F3[/TH]
[TD="align: left"]=ROUNDUP($D$3*F2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]G3[/TH]
[TD="align: left"]=ROUNDUP($D$3*G2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]H3[/TH]
[TD="align: left"]=ROUNDUP($D$3*H2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]I3[/TH]
[TD="align: left"]=ROUNDUP($D$3*I2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]J3[/TH]
[TD="align: left"]=ROUNDUP($D$3*J2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]K3[/TH]
[TD="align: left"]=ROUNDUP($D$3*K2,0)[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]L3[/TH]
[TD="align: left"]=ROUNDUP($D$3*L2,0)[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Array Formulas[TABLE="width: 100%"]
<tbody>[TR="bgcolor: #DAE7F5"]
[TH="width: 10"]Cell[/TH]
[TH="align: left"]Formula[/TH]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]F4[/TH]
[TD="align: left"]{=IF(ROW()<=(F$3+3),IFERROR(INDEX($A:$A, SMALL(IF((F$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]G4[/TH]
[TD="align: left"]{=IF(ROW()<=(G$3+3),IFERROR(INDEX($A:$A, SMALL(IF((G$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]H4[/TH]
[TD="align: left"]{=IF(ROW()<=(H$3+3),IFERROR(INDEX($A:$A, SMALL(IF((H$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]I4[/TH]
[TD="align: left"]{=IF(ROW()<=(I$3+3),IFERROR(INDEX($A:$A, SMALL(IF((I$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]J4[/TH]
[TD="align: left"]{=IF(ROW()<=(J$3+3),IFERROR(INDEX($A:$A, SMALL(IF((J$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]K4[/TH]
[TD="align: left"]{=IF(ROW()<=(K$3+3),IFERROR(INDEX($A:$A, SMALL(IF((K$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
[TR]
[TH="width: 10, bgcolor: #DAE7F5"]L4[/TH]
[TD="align: left"]{=IF(ROW()<=(L$3+3),IFERROR(INDEX($A:$A, SMALL(IF((L$1=$B:$B=TRUE),ROW($B:$B)),ROW()-ROW($B$1)-2)),""),"")}[/TD]
[/TR]
</tbody>[/TABLE]
Entered with Ctrl+Shift+Enter. If entered correctly, Excel will surround with curly braces {}.
Note: Do not try and enter the {} manually yourself[/TD]
[/TR]
</tbody>[/TABLE]
 
Last edited:
Upvote 0
In a Module, a VBA method might be like this below. You can then Filter by Column C's value of x or not where x marks the 5% random selection.
Code:
Sub Main()
  Dim f As Range, r5 As Long, u() As Variant, uu As Variant
  Dim a() As Variant, v() As Variant, vv As Variant
  Dim c As Range, col As String
  
  col = "C" 'Column to show an "x" for the 5% random pick
  
  u() = UniqueValues(Range("B2", Range("B" & Rows.Count).End(xlUp)))
  For Each uu In u()
    Set f = FoundRanges(Range("B2", Range("A" & Rows.Count).End(xlUp)), CStr(uu))
    If f Is Nothing Then GoTo NextV
    If f.Cells.Count <= 9 Then GoTo NextV
    '5% count, rounded.  9=0, 10=1, 20=1, 30=2, 40=2 etc.
    r5 = WorksheetFunction.Round(f.Cells.Count * 0.05, 0)
    v() = RndIntPick(1, f.Cells.Count, r5)
    For Each vv In v
      Set c = f(vv)
      Range(col & c.Row).Value2 = "x"
    Next vv
NextV:
  Next uu
End Sub



Function FoundRanges(fRange As Range, fStr As String) As Range
    Dim objFind As Range
    Dim rFound As Range, FirstAddress As String
     
    With fRange
        Set objFind = .Find(What:=fStr, After:=fRange.Cells((fRange.Rows.Count), fRange.Columns.Count), _
        LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, _
        SearchDirection:=xlPrevious, MatchCase:=True)
        If Not objFind Is Nothing Then
            Set rFound = objFind
            FirstAddress = objFind.Address
            Do
                Set objFind = .FindNext(objFind)
                If Not objFind Is Nothing Then Set rFound = Union(objFind, rFound)
            Loop While Not objFind Is Nothing And objFind.Address <> FirstAddress
        End If
    End With
    Set FoundRanges = rFound
End Function


Public Function UniqueValues(theRange As Range) As Variant
  Dim colUniques As New VBA.Collection
  Dim vArr As Variant
  Dim vCell As Variant
  Dim vLcell As Variant
  Dim oRng As Excel.Range
  Dim i As Long
  Dim vUnique As Variant
  Set oRng = Intersect(theRange, theRange.Parent.UsedRange)
  vArr = oRng
  On Error Resume Next
  For Each vCell In vArr
    If vCell <> vLcell Then
        If Len(CStr(vCell)) > 0 Then
             colUniques.Add vCell, CStr(vCell)
        End If
    End If
    vLcell = vCell
  Next vCell
  On Error GoTo 0
  'MsgBox colUniques.Count
  ReDim vUnique(1 To colUniques.Count)
  For i = LBound(vUnique) To UBound(vUnique)
    vUnique(i) = colUniques(i)
  Next i
  UniqueValues = vUnique
End Function

Function RndIntPick(first As Long, last As Long, _
  noPick As Long, Optional bSort As Boolean = False) As Variant
  Dim i As Long, r As Long, temp As Long, k As Long
  ReDim iArr(first To last) As Variant
  Dim a() As Variant
  
  For i = first To last
    iArr(i) = i
  Next i
  
  Randomize
  For i = 1 To noPick
      r = Int(Rnd() * (last - first + 1 - (i - 1))) + (first + (i - 1))
      temp = iArr(r)
      iArr(r) = iArr(first + i - 1)
      iArr(first + i - 1) = temp
  Next i
  
  ReDim Preserve iArr(first To first + noPick - 1)
  ReDim a(1 To noPick)
  For r = 1 To noPick
    a(r) = iArr(LBound(iArr) + r - 1)
  Next r
  
  If bSort = True Then
    RndIntPick = ArrayListSort(a())
    Else
    RndIntPick = a()
  End If
End Function

Function ArrayListSort(sn As Variant, Optional bAscending As Boolean = True)
  With CreateObject("System.Collections.ArrayList")
    Dim cl As Variant
    For Each cl In sn
        .Add cl
    Next
     
    .Sort 'Sort ascendending
    If bAscending = False Then .Reverse 'Sort and then Reverse to sort descending
    ArrayListSort = .toarray()
  End With
End Function
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,716
Messages
6,174,069
Members
452,542
Latest member
Bricklin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top