I’m concerned about an apparent incorrect calculation of the Determination Coefficient r squared using the Excel LINEST Function. I became concerned after reading the article by Michael Middleton entitled “Better Exponential Curve Fitting Using Excel”, available at the website http://www.mikemiddleton.com/Excel-Exponential-Curve-Fit-2010.pdf . Middleton indicates that Excel can report incorrect values of r squared for curve fitting using exponential and power equations. I decided to confirm Middleton’s conclusions by copying the data from his example and performing my own calculations of the r squared value. First, I used the procedure as described in the Help window for the LINEST Function. This procedure calculates r squared as the regression sum of squares divided by the total sum of squares. I then calculated r squared using a very standard equation of 1 minus the ratio of the residual sum of squares to the total sum of squares, which is mathematically equivalent to the equation used in LINEST. Both calculations yielded an r squared value of 0.8018, which agrees with the value reported in the Middleton paper. However, an r squared value of 0.9828 is reported on both the Excel output array for the LINEST Function as well as the TRENDLINE output displayed on the plotted chart. Middleton reported this value for comparison to the calculated values. By the way, to use the linear regression LINEST Function for calculating the r squared for an exponential regression equation, I used the y values transformed to LN by entering the function as LINEST(LN(y range), x range, TRUE, TRUE).
As a further check, I determined the r squared value using the LINEST calculation procedure with the y values transformed to LN, along with the equation coefficients indicated in the LINEST output. The resulting r squared value was 0.9873, which is very close to the LINEST and TRENDLINE values. From this set of calculations, I computed and confirmed the value of 0.4672 for the transformed residual sum of squares indicated in the LINEST output. However, the value of 36.2790 I calculated for the transformed regression sum of squares is considerably different from the value of 26.7447 indicated in the LINEST output array.
I’m aware that Excel, at least in some situations, does not actually use the method of least squares for regression curve fitting. The modified procedure is described at the Microsoft Knowledge Base website, https://support.microsoft.com/en-us/kb/828533, in the technical paper KB828533, "Description of the LINEST function in Excel 2003 and in later versions of Excel". However, I wouldn’t think that the modified procedure for calculating the regression constants should affect the calculation of r-squared. Also, the Middleton example does not fall into the well-documented category of incorrect r squared values reported by Excel when the intercept is set to zero.
I’m certainly curious if anyone has some inside knowledge on exactly how Excel actually calculates the r squared value for logarithmic, exponential, and power regression equations. At the moment, I’m leery of using the LINEST and TRENDLINE values of r squared without actually calculating the value using the basic definition equation.
As a further check, I determined the r squared value using the LINEST calculation procedure with the y values transformed to LN, along with the equation coefficients indicated in the LINEST output. The resulting r squared value was 0.9873, which is very close to the LINEST and TRENDLINE values. From this set of calculations, I computed and confirmed the value of 0.4672 for the transformed residual sum of squares indicated in the LINEST output. However, the value of 36.2790 I calculated for the transformed regression sum of squares is considerably different from the value of 26.7447 indicated in the LINEST output array.
I’m aware that Excel, at least in some situations, does not actually use the method of least squares for regression curve fitting. The modified procedure is described at the Microsoft Knowledge Base website, https://support.microsoft.com/en-us/kb/828533, in the technical paper KB828533, "Description of the LINEST function in Excel 2003 and in later versions of Excel". However, I wouldn’t think that the modified procedure for calculating the regression constants should affect the calculation of r-squared. Also, the Middleton example does not fall into the well-documented category of incorrect r squared values reported by Excel when the intercept is set to zero.
I’m certainly curious if anyone has some inside knowledge on exactly how Excel actually calculates the r squared value for logarithmic, exponential, and power regression equations. At the moment, I’m leery of using the LINEST and TRENDLINE values of r squared without actually calculating the value using the basic definition equation.