Selecting 3% of random numbers

srdavisgb

Board Regular
Joined
Nov 5, 2011
Messages
51
I am attempting to build an employee attrition model to better understand the impact on an organization. In my worksheet are the following data: column A: employee name, column B: job title, column C: random number and column D: identification of the attrited employees. In column C, I have added the formula =randbetween(1,100). There are 200 employees in the file. In cell D2, I have added the following formula =if(C2<=6,"Attrited", "On Roll").

I was hoping the randbetween function would include all numbers between 1 and 100 twice to capture the six employees I could expect to leave the organization. However, I discovered the randbetween function doesn't use all the numbers between 1 and 100. In one case, I found the lowest number was 4 and it was used 4 times. Needless to say, I'm not able to utilize the random numbers to ensure I have a random selection of employees identified for attrition. I plan to repeat the process in columns E, F, G and H to see the impact of a 3% attrition rate over a five year period. I also plan to write a macro with a loop to repeat the 5-year attrition process 20 times (e.g., monte carlo simulation) to calculate the overall average impact in payroll as well as the loss of jobs.

Three questions:

1) Is this the right approach for the modeling attrition?
2) If yes, how can I preserve the randomness of job selection (in column C) along with only identifying 3% of the population?
3) Is there a way to create this model using scenario manager? Again, how do you only select 3% of the population for exclusion in scenario manager.

Thank you in advance for your assistance.
 

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
I have a feeling that your random selection might be ignoring underlying factors that result in attrition. If you were dealing with a large group of staff who have much the same job, then my concern might be invalid.

To get a random 3% of the population generate random numbers 1 to 100,000 and take the lowest 3%. It is unlikely that you will get multiple with the same result (like you had 4 instances of 4). You need to document your intention to take the lowest 3%, to avoid any suggestion that you avoided taking the highest ( or the mid range) 3% but chose the lowest because of some selection bias.
 
Upvote 0
I discovered the randbetween function doesn't use all the numbers between 1 and 100.

Sure it does. But you need to understand the nature of randomness. Each time you call RANDBETWEEN, there is a 1/100 chance that the result is 4, for example. Consequently, there can be duplicates in a set of 200 calls. So there is no guarantee that you will see all possible numbers.

There are ways to ensure that you see all numbers 1 through 100 exactly twice. But that is not necessary for your application.

I am attempting to build an employee attrition model to better understand the impact on an organization. In my worksheet are the following data: column A: employee name, column B: job title, column C: random number and column D: identification of the attrited employees. In column C, I have added the formula =randbetween(1,100). There are 200 employees in the file. In cell D2, I have added the following formula =if(C2<=6,"Attrited", "On Roll").

If you want exacty 3% (6 of 200) to be attrited, one approach is to enter =RAND() into column C, then sort your data based on column C, and assign "Attrited" to the first 3% (6).

Note: The RAND column will change when you sort unless you copy-and-paste-special-value to overwrite the RAND column. There is no harm except a degradation of performance, which might or might not be noticeable.

If you want to a simulate a probability of 3% (sometimes more or less), enter =RAND() into column C, and enter the following formula into column D: =IF(C2<=3%,"Attrited","On Roll").

Note: We really don't need column C. The formula in column D could be: =IF(RAND()<=3%,"Attrited","On Roll"). But putting the random numbers into column C allows for using copy-and-paste-special-value to "freeze" the random numbers, at least for testing purposes.

I also plan to write a macro with a loop to repeat the 5-year attrition process 20 times (e.g., monte carlo simulation) to calculate the overall average impact in payroll as well as the loss of jobs.

In that case, I would use the "probability of 3%" approach.
 
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top