luckycharm said:
So (as usual) everything begins with Aladin.
Sould've guessed.
Hopefully he reads in on this one and elaborates, since I
cannot say I follow/agree with your explanation.
Anyhow, to me it looks as though the second part (countif)
returns number of occurences for the "non distinct".
Waiting on Aladin.
I did have two Invalid_Session in trying to respond to this one.
Well, the formula Fairwinds posted is due to Harlan Grove, although I did play the role of an evaluator regarding the earlier public versions Harlan posted at the
worksheet.functions.
Harlan eventually succeeded to create a SumProduct version of David Hager's formula, which is:
{=SUM(1/COUNTIF(Range,Range))}
The kernel idea of this formula is:
1/Tokens(Type)
Tha is, if Bob is a distinct name (a type) and Bob occurs say 3 times, we can assign a weight to each of its occurrences (tokens), thus:
1/3, 1/3, 1/3
Summing the individual weights we get 1 back. That is: dividing 1 by the tokens, we get a distinct/unique/type count.
Two things to note about this formula:
If Range has any empty cell, we get #DIV/0! for COUNTIF(EmptyCell,EmptyCell) == 0.
If Range has any cell housing a formula (like =IF(X1,1,"") that returns a blank (sometimes referred to as null string), COUNTIF(CellWithFormulaBlank,CellWithFormulaBlank), we get a count of 1 for such a cell.
It's somewhat theoretical whether an empty cell should be considered a distinct type. The same holds for a formula-blank. Supposing that they are not distinct types, Hager's formula needs some modification before it can be applied to a Range housing empty cells or formula blanks:
{=SUM(IF(LEN(Range),1/COUNTIF(Range,Range))}
or
{=SUM(IF(Range<>"",1/COUNTIF(Range,Range))}
It's obvious that using the latter (with lesser function call) is better (for robustness).
My contribution (along with Juan) consists of just this modification.
How does this formula works using an example?
Let A2:A8 house the following sample:
{"Bob";"Bob";"Bob";"Jane";"Jane";EmptyCell;"Thomas"}
where EmptyCell stands for an empty cell (not for a distinct type).
The COUNTIF(A2:A8,A2:A8) bit gives:
{3;3;3;2;2;
0;1}
The 1/COUNTIF(A2:A8,A2:A8) gives:
{0.333333333333333;0.333333333333333;0.333333333333333;0.5;0.5;#DIV/0!;1}
Note #DIV/0!. Clearly, there are 3 types/distinct/unique items if SUM could ignore the error values.
With
{=SUM(IF(A2:A8<>"",1/COUNTIF(A2:A8,A2:A8)))}
we have successively:
=SUM(IF({TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE},{0.333333333333333;0.333333333333333;0.333333333333333;0.5;0.5;#DIV/0!;1}))
When IF does effect the filtering:
=SUM({0.333333333333333;0.333333333333333;0.333333333333333;0.5;0.5;FALSE;1})
Since SUM ignores logical values, we get: 3.
It is quite easy to turn Hager's original formula
{=SUM(1/COUNTIF(Range,Range))}
into a SumProduct formula:
=SUMPRODUCT(1/COUNTIF(Range,Range))
but it isn't regarding:
{=SUM(IF(Range<>"",1/COUNTIF(Range,Range)))}
Whenever IF is needed when computing with array objects, a control+shift+entered formula is almost always a necessity.
Harlan eventually arrived at:
=SUMPRODUCT((A2:A8<>"")/COUNTIF(A2:A8,A2:A8&""))
which is harder to understand, but becomes intelligible if one knows the following about CountIf (as touched upon at the beginning of this post)...
UitlegHagerGrove.xls |
---|
|
---|
| A | B | C | D |
---|
1 | | | | |
---|
2 | | 0 | | |
---|
3 | | 1 | | |
---|
4 | | 0 | | |
---|
5 | | 1 | | |
---|
6 | | | | |
---|
|
---|
C2:
=COUNTIF(A2,A2)
C5:
=COUNTIF(A5,A5&"")
Using our original example...
The (A2/A8<>"") bit gives:
{TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE}
The COUNTIF(A2:A8,A2:A8&"") bit gives [ see the foregoing about the behavior of CountIf regarding "" ]...
{3;3;3;2;2;1;1}
The (A2:A8<>"")/COUNTIF(A2:A8,A2:A8&"") bit gives:
{TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE}/{3;3;3;2;2;1;1}
The pairwise division (with coercion of logical values into numbers) gives:
{0.333333333333333;0.333333333333333;0.333333333333333;0.5;0.5;0;1}
which gets summed, producing as result: 3.
Concatenating Range with "" recurs also in other forms in other formulas where filtering If can be circumvented like in:
http://www.mrexcel.com/board2/viewtopic.php?t=70140&highlight=left