Multiple regression: duplicate values in ratio type data - problem?

BigOlBuick

New Member
Joined
Dec 22, 2016
Messages
9
Hello, I have a beginner's question about multiple regression, so a stats question really. I've only recently learned the basics of linear regression and I still have the following nagging doubt.

I'd like to analyse some sales data for the purpose of forecasting future performance. My dependent variable (Y) is 'profit/loss', which simply represents a sales figure for individual retail items. This is the variable I would like to forecast. There are certain quantifiable conditions for each attempted sale of an item (item sales rank, duration of item availability, advertising expenditure, etc.) and these are my independent variables. My question stems from the fact that the historical values I have for Y are either a positive number (ranging from 0 to 1000) or a fixed negative value of -100. An item may be sold for any amount of profit but the wholesale price to the seller of each item is the same, hence the same fixed loss amount for any unsold items. A sample of the data for Y might look like this (note the fixed negative value of -100 in a few instances):

23
55
201
-100
13
-100
321
124
57
-100
33

It's my understanding that a multiple regression model here would produce varying negative (and positive) values for Y, and this is not my issue. What I'd like to know is, are there any other implications of using this sort of input in a regression model? Or can it be treated in the same way as any ratio type data? Perhaps it sounds silly but I'm wondering whether the fixed negative values might somehow distort a regression model's output. I'm not trying to replicate the fixed -100 value for the losses, only trying to get to true averages such that I may accurately predict the profitability of an item's listing for sale given the pre-sale conditions (and avoid unprofitable listings). Hope this all makes sense. Thank you.
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
I would start by looking at whether there is a relationship between your dependent variable and the independent variables individually before embarking on multiple regression e.g. if you create a scatter plot of sales vs advertising expenditure can you see a pattern? You can create a regression equation direct from the chart. If you include independent variables that have no relationship with your dependent variable, multiple regression will result in an equation with less predictive power than a simple regression.
If you include the fixed -100 values and then exclude them you will see that they will reduce the amount of variability in the dependent variable that can be explained by variations in the independent variable - the R Squared statistic. You might benefit by analysing your unsold items separately. I would be careful about including items such as rank in a multiple regression - If your best seller is ranked 1 and your next best seller is ranked 2 regressing on rank would imply that you were expecting the rank 2 item to have 1/2 the sales of the rank 1 item.
 
Upvote 0
Ben, thank you very much for your reply.

I would start by looking at whether there is a relationship between your dependent variable and the independent variables individually before embarking on multiple regression e.g. if you create a scatter plot of sales vs advertising expenditure can you see a pattern? You can create a regression equation direct from the chart. If you include independent variables that have no relationship with your dependent variable, multiple regression will result in an equation with less predictive power than a simple regression.

Yes, thanks for the sound advice, I'm pretty sure that each of my independent variables have a significant correlation with Y but I'll check to be sure before running a multiple analysis.

I would be careful about including items such as rank in a multiple regression - If your best seller is ranked 1 and your next best seller is ranked 2 regressing on rank would imply that you were expecting the rank 2 item to have 1/2 the sales of the rank 1 item.

Yes, I'm aware that the distance between the values used must be constant (the ranking data I have is actually 'units sold', which I believe is a suitable indicator of an item's popularity). Again, thank you.

If you include the fixed -100 values and then exclude them you will see that they will reduce the amount of variability in the dependent variable that can be explained by variations in the independent variable - the R Squared statistic. You might benefit by analysing your unsold items separately.

This is what I'm most interested to learn about. The idea of separate analyses hadn't occurred to me but I'm confused about how it would work. As all the negative values are the same, are you suggesting that I try to somehow convert them into a usable, variable form? Also, how might I go about combining the results of these two analyses in order to get to a final regression equation?

One of the main concerns I have here is that, because the -100 value is the only one I have for a negative result, there are many instances where the values for X1, X2, X3, Xi... are similar but where the corresponding values for Y vary greatly.
 
Upvote 0
I think if you analyse the losses separately you should be able to adjust the constant in the regression equation to take them into account.
That way your regression equation will be calculated using the sales data only but when you use it as a predictor it will be taking the unsold items into account as part of the constant.
Hope this makes sense!
 
Upvote 0
That seems to ring true to me but I'm afraid my limited skills in this field prevent me from fleshing it all out! Certainly worth hearing about though, I'll see where I can go with it - cheers.
 
Upvote 0

Forum statistics

Threads
1,223,234
Messages
6,170,891
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top