Build a Model to Predict Sales Based on Multiple Regression
May 20, 2022 - by Bill Jelen
Problem: I run a Gelato stand. After 10 days of sales, I discovered that each day, I would either make a lot of money or nearly go broke. As I analyzed sales, I began to feel that temperature and rain might be two important determining factors in how much money I make. On rainy or cool days, fewer people buy gelato.
I set up the table below, which shows each day’s sales, temperature, and whether it rained.
Based on the data I’ve collected, how can I determine the relationship between sales, temperature, and rainfall?
Strategy: You need to do a multiple regression. After a multiple regression, you will have a formula that predicts sales like this:
Y = m1x1 + m2x2 + b
Sales = Temperature x M1 + Rain x M2 + b
The LINEST
function can return the values M1, M2, and b that best describe your sales model. Here’s what you do:
1.
LINEST
is going to return three values, so select a range of three cells that are side by side. The first argument is the range of known sales figures. The second argument is the range of temperatures and rainfall.2. Press Ctrl+Shift+Enter to calculate the array formula.
3. Enter a prediction formula in column D to see how well the regression calculation describes sales. The results are so-so. The prediction in D6 is right on the mark. The predictions in D11 and D12 are off by $20 each—an error of 10%.
4. To get the additional statistics that
LINEST
can return to show how well the results match reality, add a fourth argument:TRUE
. Be sure to enter the function in a five-row range.
5. Press
Ctrl+Shift+Enter
. You will get the results shown here.
I only somewhat paid attention in statistics class, but I know that a key statistical indicator is the R-squared value. It ranges from 0 to 1, where 1 is a perfect match, and 0 is a horrible match. The 0.88 value here confirms that the prediction model is pretty good but not perfect.
Additional Details: Regression models try to force actual results into a straight-line formula. The fact is that life may not fit in a straight-line formula. Because I created the spreadsheet used here, I know that the actual data in the gelato model uses the formula (Temperature - 50) x $2 if raining and (Temperature - 50) x $6 if not raining. In this example, Sally was correct that ice cream sales are dependent on rain and temperature, but even a powerful regression engine could not predict the absolutely correct formula.
Alternate Strategy: The Analysis ToolPak still offers tools to do Regression, as well as testing correlation, exponential smoothing, create histograms, generate random numbers, create samples, and more. You have to enable the add-in first. Type Alt+T followed by I. Add a checkmark next to Analysis ToolPak and click OK.
You will now have a Data Analysis icon on the right side of the Data ribbon tab. Click the icon and Excel offers a list of tools. Although some of these tools offer older dialog boxes that really need updating, they can often produce far more detailed results. The Regression tool creates charts of the residuals, Anova analysis, and tables of statistics about the regression.
Gotcha: The results from the ToolPak are not live formulas! They are a one-time snapshot. If you change the underlying data, you will have to run the analysis tool again.
This article is an excerpt from Power Excel With MrExcel