how to run regression on data including non-numeric data?

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Guest
Input range contains non-numeric data such as "#N/A". How can I run regression without deleting these non-numeric data one by one?
 
> Just did a small check: TREND produces the same seq of vals on a series that is scrambled.

That's not my finding. If after sorting the #N/A errors to bottom I substitute RAND() for ROW()+A1*0 I get different TREND() results.

JUST to make sure: My small checking was with a fully numeric series (no #N/A's).

>Why not substitute the mean of the series for the #N/A's

It's not the same. Try it! Bottom line: the order and magnitude of these values affect the regression line.

WELL, I didn't claim the substitution would not affect the regression line. Of course it will. The question is whether it be sound to make substitutions (missing value --> subst mean of the series governed by the 5% rule). I believe it is not unsound and statistically defensible.

IF interpreting your #N/A's as missing values is sound and you can live the 5% rule, I'd suggest using this missing value transformation.

Aladin
 
Upvote 0

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
On 2002-02-25 17:02, Mark W. wrote:
That was me above. My logon must have timed out...

I also wanted to point out that AVERAGE(A1:A5) where A1:A5 contains an #N/A error will produce #N/A.

This must be your first one. I already got a few anonymous postings... New trouble.

Who needs AVERAGE :wink:

Lets say that we have your example sample

{10;20;30;#N/A;40}

I create in B1:B5 by using (20% missing vals!)

=IF(ISNUMBER(A1),A1,SUMIF($A$1:$A$5,"<>#N/A")/COUNTIF($A$1:$A$5,"<>#N/A"))

so I get in B:

{10;20;30;25;40}

Applying

{=TREND(B1:B5)}

in C1:C5, I get:

{12;18.5;25;31.5;38}

This doesn't look too bad!

PS. This thread will be confusing to others, I'm afraid.
 
Upvote 0
I'm afraid we'll have to agree to disagree on this one. I stand by my original recommendation:

You could sort the #N/A errors to the bottom of your list and have the TREND() function ignore them. For example, suppose that A1:A5 contained {10;30;20;#N/A;40}. {=TREND(A1:A5)} entered into D1:D5 produces #VALUE! errors. Instead, enter the array formula...

{=TREND(A1:OFFSET(A1,SUM(ISNUMBER(A1:A5)+0)-1,))}

...into D1:D5. Next, enter the formula, =ROW()+A1*0, into B1 and fill down to B5. Now, sort on column B. Voila!! The #N/A errors are sorted to the bottom of the list and ignored by TREND().
This message was edited by Mark W. on 2002-02-26 06:33
 
Upvote 0

Forum statistics

Threads
1,223,317
Messages
6,171,419
Members
452,402
Latest member
siduslevis

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top