How to Create a Trendline for Data Points of Different Weights?

dmmx3

New Member
Joined
Mar 2, 2012
Messages
9
I have a scatter plot of X and Y data points. Each point is made up of some number of items. How can I weight the points so that the trendline doesn't count a point that is made up of 10 items the same as it counts a point that is made up of 100 items?

An example of my data:

X .... Y .... (Number of items making up the data point)
1 .... .459 .... (45)
2 .... .509 .... (118)
3 .... .551 .... (231)
4 .... .589 .... (218)
5 .... .609 .... (124)
6 .... .698 .... (54)

Thanks in advance for any advice.

(Also, I should note I played with doing this manually by simply copying each data point the number of times in the parentheses, and then using the much larger data set. I think this works, but I have far too much data to do this with everything. I assume there is an easier way?)
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
After some more trial and error, and googling, I think what I want to do is run a logistic regression.

Anyone have any tips on doing this in Excel? I'm not sure Excel can do this?

I have variables Y1, Y2, and X (binary).

Any help would be greatly appreciated.
 
Upvote 0
So what does your data look like now? Or do you have it all sorted out?
 
Upvote 0
So what does your data look like now? Or do you have it all sorted out?

Thanks for your interest. I do not have this figured out. I can manipulate the data various ways. Let me describe what I am trying to do - maybe that will make this clearer.

I have about 16000 data points. Each point represents one college basketball game. Each point has the following information:

1) Spread (The line determined by the sportsbook)
2) Total (The total score determined by the sportsbook)
3) Win/Loss (Did the home team win?)

So, a sample of data would look like this:


-3 ... 143 ... 1
-1 ... 127 ... 1
+3 ... 136 ... 1
-4 ... 149 ... 0

in order, those 4 data points represent the following information:

home team favored by 3, predicted total 143, home team wins
home team favored by 1, predicted total 127, home team wins
home team underdog by 3, predicted total 136, home team wins
home team favored by 4, predicted total 149, home team loses

As I said, I have about 16K of these points. I want to run a logistic regression to determine an equation that would give me an expected winning percentage for the home team, given the inputs of the spread and the total.

Again, thanks for your interest, and thanks in advance for any help.

(By the way, I can do this in Excel using a logarithmic or polynomial regression, but I need to do a logistic one.)
 
Upvote 0
Solver does that a lot! It can also claim to find solutions which violate constraints so you need to examine the solution.

I suspect you have way too much data for the cheap solver that you get free with Excel. If you don't have access to a better solver maybe you could try it with a random sample of your 16,000 points? A thousand should be plenty for this so long as they were randomly selected from your population but the solver shipped with Excel may be challenged even by this as it is really only a toy/marketing tool to get you to spend a lot of money (a thousand dollars minimum) on the real thing. There is a free industrial-grade MIP solver called LPSolve which has an Excel/VBA implementation which seems to be quite good (I have used it a bit) but to use it you have to write your own LP formulation (which Solver does for you behind the scenes) which is a bit of a pain.

Alternatively you could download free trial versions of the "grown up" solvers from Frontline (WWW.solver.com) if you only want to do this once.
 
Upvote 0
So:

F(z) = 1 / (1 + exp(-z))

z = Beta0 + Beta1 * Spread + Beta2 * Total

Use Solver to minimize RMS error (E7) by changing Beta0, Beta1, and Beta2, with a constraint that Beta0 + Beta1 * Spread + Beta2 * Total < 700 to prevent overflow.

Code:
       --B--- --C-- --D-- --E--- -----------------------------F------------------------------
   2                Beta0  1.007 via Solver                                                  
   3                Beta1 65.885 via Solver                                                  
   4                Beta2  0.694 via Solver                                                  
   5                                                                                         
   6                      Error                                                              
   7                       0.000 E7: {=SQRT(AVERAGE((D9:D27-E9:E27)^2))}                     
   8   Spread Total  W/L   Fit                                                               
   9        3   115     1  1.000 E9 and down: =1 / (1 + 1 / EXP(Beta0 + Beta1*B9 + Beta2*C9))
  10        0   110     1  1.000                                                             
  11       -2   145     0  0.000                                                             
  12       -1   134     1  1.000                                                             
  13        3   146     1  1.000                                                             
  14       -4   139     0  0.000                                                             
  15        1   135     1  1.000                                                             
  16        2   127     1  1.000                                                             
  17       -4   150     0  0.000                                                             
  18        0   135     1  1.000                                                             
  19       -1   148     1  1.000                                                             
  20       -1   144     1  1.000                                                             
  21       -2   133     0  0.000                                                             
  22        3   121     1  1.000                                                             
  23        3   143     1  1.000                                                             
  24        3   134     1  1.000                                                             
  25       -1   142     1  1.000                                                             
  26       -4   144     0  0.000                                                             
  27        2   133     1  1.000
 
Upvote 0

Forum statistics

Threads
1,223,227
Messages
6,170,849
Members
452,361
Latest member
d3ad3y3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top