Hi,
Does anyone know how to get Excel to pull a stratified random sample?
For example, let's say my dataset contains records/measurements from sixteen groups of buffalo, representing nine entirely different populations across the US. The data is organised into 4 columns ("Population", "Family size", "Adult weight", & "Density") and 16 rows.
Now, what I'd want to do is to generate a subsample by randomly pulling ONE record per population (=9). Then I'd want to repeat that 99 times to get 100 randomly chosen subsamples, each containing just 9 (not 16!) records.
Now I suppose I could use labels to identify the various populations but I worry that it'd get really complicated once I apply the macro to my real (& much more complex) dataset, especially since I'd be wanting each subsample to contain data from ALL the columns (i.e. each of the 9 records per subsample would still include "Fam Size", Weight" & "Density" estimates).
Understand what I'm needing to do? Unfortunately, the usual answers (simple random sampling between X & Y, or using a random number generator) won't work b/c I need everything to be stratified by population. I'd also prefer to avoid having to run random sampling functions/macros for EACH population individually.
I know this can be done via S+ & R, but I'd really rather stick to excel if at all possible.
Any suggestions you can give will be enthusiastically & gratefully appreciated!! ray:
Thanks!!
Does anyone know how to get Excel to pull a stratified random sample?
For example, let's say my dataset contains records/measurements from sixteen groups of buffalo, representing nine entirely different populations across the US. The data is organised into 4 columns ("Population", "Family size", "Adult weight", & "Density") and 16 rows.
Now, what I'd want to do is to generate a subsample by randomly pulling ONE record per population (=9). Then I'd want to repeat that 99 times to get 100 randomly chosen subsamples, each containing just 9 (not 16!) records.
Now I suppose I could use labels to identify the various populations but I worry that it'd get really complicated once I apply the macro to my real (& much more complex) dataset, especially since I'd be wanting each subsample to contain data from ALL the columns (i.e. each of the 9 records per subsample would still include "Fam Size", Weight" & "Density" estimates).
Understand what I'm needing to do? Unfortunately, the usual answers (simple random sampling between X & Y, or using a random number generator) won't work b/c I need everything to be stratified by population. I'd also prefer to avoid having to run random sampling functions/macros for EACH population individually.
I know this can be done via S+ & R, but I'd really rather stick to excel if at all possible.
Any suggestions you can give will be enthusiastically & gratefully appreciated!! ray:
Thanks!!