# Text to Number Conversion - Which functions should be used?



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## Scott Huish (May 17, 2013)

> If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
> because SUMPRODUCT should also be avoided and there are other alternatives
> (this is again another issue I can talk about if you wish)



Without using any of the new functions in XL2007 or higher or VBA, please show a sample solution to this problem:

I want to sum the values in Column F, where Column A equals Yes and Column D equals NO.

You are also making an assumption that coercion is only needed when values are "wrongly formatted"


----------



## aaalviny (May 17, 2013)

Scott Huish said:


> Without using any of the new functions in XL2007 or higher or VBA, please show a sample solution to this problem:
> 
> I want to sum the values in Column F, where Column A equals Yes and Column D equals NO


Do you mean to sum values in Column F if Column A is "Yes" and Column D is "No"?

I mean if there are large amount of coercion needed, it is probably due to the wrongly formatted raw data


----------



## Scott Huish (May 17, 2013)

Yes


----------



## aaalviny (May 17, 2013)




----------



## Scott Huish (May 17, 2013)

Why would you use 2 SUMPRODUCT? You only need 1. You have used 41 formulas to avoid 1.


----------



## Scott Huish (May 17, 2013)

You have also used 13 regular math functions which you claim are volatile, which contradicts your statement that these formulas should be avoided.


----------



## aaalviny (May 17, 2013)

Scott Huish said:


> Why would you use 2 SUMPRODUCT? You only need 1. You have used 41 formulas to avoid 1.


I am saying If there are another request to sum those Not "Yes" or Not "No"
then there will be totally two sumproduct
that 41 formulas don't matter, as long as they are easy to read and maintain
(of course I can combine Column C and D to column B, to reduce the number of formulas required, but

imagine, when there are now five values of interest:
1. sum of col F if "Yes" and "No"
2. sum of col F if Not "Yes" or Not "No"
3. sum of col F if "Yes"
4. sum of col F if "No"
5. sum of col F if Not "Yes" and Not "No"

If now, the data changes, "Yes" is no longer "Yes" and becomes "Y"
if you are using sumproduct, you have to change five times in the formula (one in each sumproduct)
but if my suggestion is used, only one-time change is needed then formulas are copied down

this is because of the DRY (Don't repeat yourself) principle is violated in the sumproduct cells
the "Yes" column or "No" column are needed to be thoroughly checked for each sumproduct cells, which are repeated
and that's why it's slow because excel needs to check the same array for multiple times

this is about the future maintenance of the worksheet (future is always unpredictable)


Scott Huish said:


> You have also used 13 regular math functions which you claim are volatile, which contradicts your statement that these formulas should be avoided.


I am sorry but what are the volatile functions you are referring to?


----------



## Macropod (May 17, 2013)

aaalviny said:


> that 41 formulas don't matter, as long as they are easy to read and maintain


What a load of crap!!!

You clutter the worksheet with loads of completely unnecessary formulae that increase:
 (a) maintenance overheads;
 (b) the risk of maintenance error;
 (c) file size; and
 (d) time required for a recalc by adding an extra output that wasn't asked for,
then have the gall to suggest it doesn't matter??? What planet are you from???


----------



## Jon von der Heyden (May 18, 2013)

Well I said I wouldn't debate this; but for the sake of the readers:



			
				aaalviny said:
			
		

> 1. -- or INT() or VALUE() matches the aim of the formula?


Indeed. Well actually, VALUE() does. INT() is intended to truncate a decimal value to a whole number. But I suggest that that a meaningless mathematical operation using an arithmetic operator matches the aim of the formula too. We know that, in Excel, whenever an expression includes an arithmetic operator (--;+;-;*;/;%;^) Excel attempts to resolve each operand as a numeric value (i.e. coercion is invoked). Therefore it is by design and it "matches the aim of the formula". Understanding this holds no further complexity of understanding the purpose of the VALUE() function.



			
				aaalviny said:
			
		

> 2. -- is simpler than VALUE()


I would say on par. Any user unaware of coercion has to learn it. Learning what VALUE() does is no simpler nor more complex than understanding that use of arithmetic operator also invokes a coercion attempt.  Besides it's not rocket science and it's really easy to read up on and to understand. Yes we occasionally see people ask what is the purpose of -- but I believe what is being asked is the context (after all everyone knows --1=1).  People ask why it is necessary to invoke coercion.  If we start substituting -- for VALUE, I believe we will have equally as many questions asking why it is necessary.




			
				aaalviny said:
			
		

> 3. Convention is to use --?


Convention is important. It ensures consistency in how we work. It means that when you evaluate a complex formula and see -- you know immediately a deliberate attempt to coerce a text value to a numeric value.  Indeed there are some conventions that aren't as good as they should be.  This isn't one of them!



			
				aaalviny said:
			
		

> 4. VALUE() requires one more parenthesis than --


Yes it's a trivial point but most of our conventions are trivial. It is exactly all these trivial points though that when adhered to in a complex model makes all the difference. I for one don't want the extra parenthesis even if only for formula evaluation. I'm a big fan of analysing formulae with F9 over certain portions. Extra parenthesis only makes that process that little more tricky. But yes, this is a trivial point.



			
				aaalviny said:
			
		

> 5. -- is faster than VALUE?


Oh yes it is! The double negation operation invokes VERY little overhead, while the VALUE() function call by comparison to -- involves considerably more work. Is it noticeable?  Not unless you scale it up.  But unnecessary function calls is a significant culprit behind slow calcing models.  Understanding coercion and boolean logic and avoiding unnecessary function calls can make a noticeable difference in the calc speed of your workbooks.  What we are talking about here is micro-optimization; but micro-optimization is indeed very important. Use of all the micro-optimization best practises is what makes the difference between a 10MB file and a 1MB file. I once inherited a spreadsheet that was over 20MB.  By the time I removed some duplicate calculations, substituted linear lookups for binary lookups, applied boolean logic using digital values I reduced the file size down below 1MB.  All the improvements were micro-optimization techniques. So use of -- (or any arithmetic operator) over a function is important.



			
				aaalviny said:
			
		

> 6. -- is shorter in terms of length than VALUE()?


Well I suppose. But that's not our motivation to use it. Not sure where you picked that argument up from Alvin.



			
				aaalviny said:
			
		

> *Conclusion*
> *As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice*



Please don't! You will be breaking away from convention and you will add more overhead to your formulae. This is a very bad suggestion. I agree that there are some stupid conventions. This is not one of them!

Interestingly one may ask why -- is the preferred (or most common) method? I believe it is because double negation comes 1st in {arithmetic} order precedence relations. I recall reading a suggestion how this is a teeny tiny bit quicker than using an operator lower down the list of order precedence; but I cannot remember the logic to that claim.  Still, everyone seems to use --, it strikes me as being a very well-considered convention so I'm going to stick with it. Be good to yourself and do the same.


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## Jon von der Heyden (May 18, 2013)

Further to the above the following table illustrates errors in using VALUE(), INT() and N() for coercion:


Excel 2010ABCDEF1ValueType--VALUE()INT()N()21string111031numeric11114-1numeric-1-1-1-15-1string-1-1-106TRUEboolean1#VALUE!117FALSEboolean0#VALUE!00806:14:23string0.2599884260.25998842600906:14:23numeric0.2599884260.25998842600.26Sheet1Cell FormulasRangeFormulaC2=--A2

VALUE() will not coerce boolean values to numeric values.
INT() will not yield the correct decimal value for time.
N() will not coerce time values captured as string.

Conclusion is that there are no functions (that I am aware of) that will work consistently and reliably as using good ol' plain arithmetic!


----------



## aaalviny (May 18, 2013)

Macropod said:


> What a load of crap!!!
> 
> You clutter the worksheet with loads of completely unnecessary formulae that increase:
> (a) maintenance overheads;
> ...


It is easier to maintain, those formulae are easy to read and take instant time to understand, right?
maintenance is not only to work on the existing task, but also to prepare for future changes of objectives and requirement. Therefore if boss asks for more info in the future, what will happen? Do the user find it easy to maintain and update?
that's why I illustrate adding an extra output, and it's very important in the changing world

I once come across a spreadsheet with 100 long sum product formulae, and when I hope to edit the things, it's just very hard and the calculation speed is very slow. And, of course, I eventually redesigned it. (And can you imagine, for that spreadsheet, at the beginning there are only a few sum product formulae, it's the change to the objectives and requirements that lead to a complicated spreadsheet which is hard to maintain.

I would say the maintenance risk is lower than sumproduct (at least it hides the intermediate steps of the calculation)
for file size, yes it's a concern, but to weigh the file size and maintenance effort, I would choose maintenance effort, it's more important to get things done with accuracy and efficiency, as well as others finding it easy to read.


----------



## Jon von der Heyden (May 18, 2013)

As an amendment to the initial table I posted:


Excel 2010ABCDEFG1valuedata type--VALUE()INT()N()Remark2-1string-1-1-10N() fails to coerce value to the correct
numerical value3TRUEboolean1#VALUE!11VALUE() fails coercion4FALSEboolean0#VALUE!00VALUE() fails coercion501-May-2013string4139541395413950N() fails to coerce value to the correct
numerical value606:04:59string0.2534610.25346100N() and INT() fails to coerce value to
the correct numerical valueSheet1Cell FormulasRangeFormulaC2=--A2C3=--A3C4=--A4C5=--A5C6=--A6D2=VALUE(A2)D3=VALUE(A3)D4=VALUE(A4)D5=VALUE(A5)D6=VALUE(A6)E2=INT(A2)E3=INT(A3)E4=INT(A4)E5=INT(A5)E6=INT(A6)F2=N(A2)F3=N(A3)F4=N(A4)F5=N(A5)F6=N(A6)A2=TEXT(-1,0)A3=TRUEA4=FALSEA5="01-May-2013"A6="06:04:59"


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> Further to the above the following table illustrates errors in using VALUE(), INT() and N() for coercion:
> 
> *Excel 2010*
> ABCDEF1ValueType--VALUE()INT()N()21string111031numeric11114-1numeric-1-1-1-15-1string-1-1-106TRUEboolean1#VALUE!117FALSEboolean0#VALUE!00806:14:23string0.2599884260.25998842600906:14:23numeric0.2599884260.25998842600.26
> ...


just a reply to this post first

when you choose which formulas to use, we have to be clear what the input, output and the aim is.
when designing, it's important to work on the desired process but not have any extra process embedded.
eg when in Vba there is a known error and you want to use resume next, you should not simply put it there to apply to other undiscovered errors.

1. About input structure, for a consistent data structure, only 1 aim is required, I.e. only converting strings to numbers, or converting Boolean to numbers.
if you are using the seemingly reliable --, you cannot discover any potentials about the data formats of raw data, probably there may be 1 wrong data in the whole database. That's to recall the principle of doing only what is needed but not extra.
To demonstrate, imagine there is one Boolean data out of 10000 numbers formatted as text.
if -- is used, the error cannot be found. This is not what reliable means.

And if the data structure consists of all the things you listed above, it is clearly not a database of qualified data, isn't it?

again, do what it is supposed, there are already different functions as you listed above to cater for different needs
Arithmetic negative of a Boolean variable has no meaning. You found the meaning only because excel' special design. It is not a valid logic to have arithmetic negative of a Boolean in human sense, right?


----------



## Jon von der Heyden (May 18, 2013)

aaalviny said:


>



The SUMIF may be faster, but the addition of the helper columns will invariably slow the model.  I'm a big fan of using helper columns to make complex formulae easier to understand; and yes sometimes they do add performance benefits.  This however is not an example of this.

Firstly, A3 should state "Total (Not Yes and Not No)".  Or logic is somewhat different.
Secondly, if I were to use helper columns I would avoid 26 unnecessary IF function calls and rather use e.g. in D6: =--(H6="no")
Finally; my preference to this problem would be two formulae, chosen bother for performance and also because I find most people would find it alot easier to get to grips with than reviewing a series of helper columns:
   In B2: =SUMPRODUCT(--(E6:E18="yes"),--(H6:H18="no"),J6:J18)
   In B3: =SUM(J6:J18)-B2


----------



## Jon von der Heyden (May 18, 2013)

aaalviny said:


> It is not a valid logic to have arithmetic negative of a Boolean in human sense, right?


I found good use of doing exactly this before, yes!

Alvin I have provided good reason to favour to -- over a function, both addressing each point you made in your initial post.  Further I have shown instances where the functions do not provide the correct coerced value.  I favour a single method that works reliably across all data types.  If I need to discriminate then I may deviate or just test the type.  Alvin, regardless of what I or anybody says you are going to argue back, with each point you make stepping one step closer to absurd!  Let's allow the reader to decide now.


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> Well I said I wouldn't debate this; but for the sake of the readers:
> 
> 
> Indeed. Well actually, VALUE() does. INT() is intended to truncate a decimal value to a whole number. But I suggest that that a meaningless mathematical operation using an arithmetic operator matches the aim of the formula too. We know that, in Excel, whenever an expression includes an arithmetic operator (--;+;-;*;/;%;^) Excel attempts to resolve each operand as a numeric value (i.e. coercion is invoked). Therefore it is by design and it "matches the aim of the formula". Understanding this holds no further complexity of understanding the purpose of the VALUE() function.



Glad to see the discussion, I hope to learn something as I am not an encyclopedia

As mentioned above, arithmetic negation to a Boolean does not have a meaning, it's by excels design to make it possible yet it is out of logic. If users continue to adapt to the wrong logic and find out their conventions, its correct if you are involved to excel's design and think its correct. But staying outside from it, this logic is just nonhuman. And given the wrong logic, the aim is clearlynot reached.
recall the if A is false, the there is no meaning to discuss whether B is true or false.
   note : negation of Boolean true is false, as not(true)




Jon von der Heyden said:


> I would say on par. Any user unaware of coercion has to learn it. Learning what VALUE() does is no simpler nor more complex than understanding that use of arithmetic operator also invokes a coercion attempt.  Besides it's not rocket science and it's really easy to read up on and to understand. Yes we occasionally see people ask what is the purpose of -- but I believe what is being asked is the context (after all everyone knows --1=1).  People ask why it is necessary to invoke coercion.  If we start substituting -- for VALUE, I believe we will have equally as many questions asking why it is necessary.



I have learnt it due to excels design. To any human, seeing --"3" does not give the meaning to covert formats, right?. It only makes sense when you are involved to the whole excel environment (please stay out of the excel world to think about the question I would say)
I don't believe many people will ask what value means, because the function name itself is already giving a clue to the meaning of the functions, and its easy to search.
yet -- gives no clueto the meaning. It only does twice arithmetic negation.

well it seems not to be an objective discussion to the point whether people can understand.
but from my experience, I remember I had to spend a second to figure out what it means, as -- doesn't give any clues to the underlying meaning





Jon von der Heyden said:


> Convention is important. It ensures consistency in how we work. It means that when you evaluate a complex formula and see -- you know immediately a deliberate attempt to coerce a text value to a numeric value.  Indeed there are some conventions that aren't as good as they should be.  This isn't one of them!
> 
> 
> Yes it's a trivial point but most of our conventions are trivial. It is exactly all these trivial points though that when adhered to in a complex model makes all the difference. I for one don't want the extra parenthesis even if only for formula evaluation. I'm a big fan of analysing formulae with F9 over certain portions. Extra parenthesis only makes that process that little more tricky. But yes, this is a trivial point.



To the above two, as mentioned in my first article, I don't have much to supplement. Extra parenthesis is not a problem given it is a well designed spreadsheet.
its easy to understand value as converting texts to numbers but thoughts are always required to think about the use of --, does it mean the true arithmetic negation, or the conversion of formats?




Jon von der Heyden said:


> Oh yes it is! The double negation operation invokes VERY little overhead, while the VALUE() function call by comparison to -- involves considerably more work. Is it noticeable?  Not unless you scale it up.  But unnecessary function calls is a significant culprit behind slow calcing models.  Understanding coercion and boolean logic and avoiding unnecessary function calls can make a noticeable difference in the calc speed of your workbooks.  What we are talking about here is micro-optimization; but micro-optimization is indeed very important. Use of all the micro-optimization best practises is what makes the difference between a 10MB file and a 1MB file. I once inherited a spreadsheet that was over 20MB.  By the time I removed some duplicate calculations, substituted linear lookups for binary lookups, applied boolean logic using digital values I reduced the file size down below 1MB.  All the improvements were micro-optimization techniques. So use of -- (or any arithmetic operator) over a function is important.



Well in my environment I am always dealing with models of over 80 MB, and I have tried to deal with those over 200MB
its required, to the complexity of the finance environment and the uncertainty of future,also with the different basis of projection and sensitivity testing.

When I was not aware of the importance of design, the 200 MB spreadsheet takes me possibly 30mins to have a full calculation and I always had to use manual calc mode.
after a year when I reworked, I used auto calculation mode to deal with it with no volatile formulas and using binary lookups and no DRY violation.
it just takes little time for the calculation.
(this 200MB thing is about data processing, but not the same type of thing for the 80MB model. 80MB things are all about modeling, not because of lots of same type of data as database)




Jon von der Heyden said:


> Well I suppose. But that's not our motivation to use it. Not sure where you picked that argument up from Alvin.



Just in case anyone cares about the readability of longer characters 





Jon von der Heyden said:


> Please don't! You will be breaking away from convention and you will add more overhead to your formulae. This is a very bad suggestion. I agree that there are some stupid conventions. This is not one of them!
> 
> Interestingly one may ask why -- is the preferred (or most common) method? I believe it is because double negation comes 1st in {arithmetic} order precedence relations. I recall reading a suggestion how this is a teeny tiny bit quicker than using an operator lower down the list of order precedence; but I cannot remember the logic to that claim.  Still, everyone seems to use --, it strikes me as being a very well-considered convention so I'm going to stick with it. Be good to yourself and do the same.


The order of precedence is designed with basis
 By taking it to a higher place, do you mean you have to put other things in that formula to a lower priority? All in all they are all to be calculated, I don't feel there should be any special treatments to the order of precedence?

And yes I nearly forgot about N() in the discussion, thanks for the reminder


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> The SUMIF may be faster, but the addition of the helper columns will invariably slow the model.  I'm a big fan of using helper columns to make complex formulae easier to understand; and yes sometimes they do add performance benefits.  This however is not an example of this.
> 
> Firstly, A3 should state "Total (Not Yes and Not No)".  Or logic is somewhat different.
> Secondly, if I were to use helper columns I would avoid 26 unnecessary IF function calls and rather use e.g. in D6: =--(H6="no")
> ...


In B2: =SUMPRODUCT(--(E6:E18="yes"),--(H6:H18="no"),J6:J18)
To me it's already a complicated formula, seems the level of simple differs among us 
What will happen if there are more conditions to add?
and if more values of interest where simple subtraction does not work?
and then how about when you have to change all "yes" to "y" in all sum products formulas?
and it is "not yes or not no", did you misread it?

for my test of 4 columns withe 56000 data, and 5 values of interest 
sumif uses 0.41s and sum product uses 0.61s


----------



## Jon von der Heyden (May 18, 2013)

It is very difficult to debate this with you as you are actually making very few accurate statements.  Your statement that you deal with 200MB (even 80MB) only reinforces to me and suggests that you are not in possession of enough knowledge yet in terms of Excel development and spreadsheet design.  As an active Excel Developer, Trainer and Consultant I have been exposed to many monstrosities.  200MB is very rare, 80MB I see often enough.  Almost always the workbooks could be massively reduced.  The culprit doesn't tend to be the data but rather poor choice of formulae and bad design.  And I am convinced that you have yet to learn design, given your earlier example.

What's more you're making it up as you go.  You had no idea that your suggested functions don't work on certain data types.  Had I asked you one what circumstances you would recommend VALUE() I'm sure you would have suggested all.  After all you never suggested a mixture of arithmetic and VALUE() in your initial post.  This is what I mean when I say you are determined to argue.

Further you seem to lack an understanding of the basics of boolean logic; which is not unique to the Excel environment only.


aaalviny said:


> note : negation of Boolean true is false, as not(true)



Excel 2010ABC1NegationAs boolean2TRUE-1TRUE3FALSE0FALSESheet1Cell FormulasRangeFormulaB2=-A2B3=-A3C2=AND(B2)C3=AND(B3)

On these grounds I'm afraid you make an appauling case to support your convention.


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> It is very difficult to debate this with you as you are actually making very few accurate statements.  Your statement that you deal with 200MB (even 80MB) only reinforces to me and suggests that you are not in possession of enough knowledge yet in terms of Excel development and spreadsheet design.  As an active Excel Developer, Trainer and Consultant I have been exposed to many monstrosities.  200MB is very rare, 80MB I see often enough.  Almost always the workbooks could be massively reduced.  The culprit doesn't tend to be the data but rather poor choice of formulae and bad design.  And I am convinced that you have yet to learn design, given your earlier example.
> 
> What's more you're making it up as you go.  You had no idea that your suggested functions don't work on certain data types.  Had I asked you one what circumstances you would recommend VALUE() I'm sure you would have suggested all.  After all you never suggested a mixture of arithmetic and VALUE() in your initial post.  This is what I mean when I say you are determined to argue.
> 
> ...


those 80MB is not initially developed by me, in fact I planned to redevelop it but it requires time and simply I don't have enough time to work on it yet.
i am not sure whether you have touch spreadsheets in my industry
if there is really much data and objectives required, it can grow much large and large, especially the continuing development of methodology and requirements in my industry are just never ending and only become more and more complicated. (Yes it is really very much)

for 200MB it's because I am working on hundreds thousands of records (I personally have worked by database approach in fox pro, to check the result, and of course this is the correct approach) but others just want to have an excel checking since they are not technologically advanced enough to understand the nested queries
And basically given only the data it's just already 100MB+

and I have to emphasize, choice of functions should only be used to cater what required to be done, but not less nor extra.
if I need to convert text to numbers, use value, if I have to convert blanks and numbers to all numbers, use N, etc

well Boolean is Boolean, never a number, don't be trapped by excel's logic. Only few support the convention false = 0 and true = 1 (correct me if wrong, excel, Vba, c support this only)


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## aaalviny (May 18, 2013)

To supplement, during my work, I have redesigned worksheets to combine original 6 worksheets of different products each about 10MB to one with 5MB of clearly clean and readable formulas, in case you have question about my redesigning.


----------



## Peter_SSs (May 18, 2013)

I'm going a fair way back in the thread:



aaalviny said:


> that 41 formulas don't matter...
> .. and that's why it's (SUMPRODUCT) slow


I'd say that very much depends on what you are measuring and how often and how much of the data is likely to change.

For my testing I repeated the data you showed in that screen shot 84 times so the data extended from row 6 to row 1097 - so that there was bit more to do for my timer to measure.

Using your setup on Sheet1 and the single formula =SUMPRODUCT(--(E6:E1097="Yes"),--(H6:H1097="No"),J6:J1097) on Sheet2, a Sheet2 recalculation took 30% of the time for a Sheet1 recalculation. By that measure I wouldn't call SUMPRODUCT slow.





aaalviny said:


> If now, the data changes, "Yes" is no longer "Yes" and becomes "Y"
> if you are using sumproduct, you have to change five times in the formula (one in each sumproduct)


Do you? I would have thought you could set it up so that the formula(s) wouldn't need changing at all.
If you are worried about ongoing maintenance, why not use cell references - surely that is simpler and easier to maintain than changing a formula AND copying it down a whole column!

Excel WorkbookBCDEHJ1Value of interest:YN231260456Y27N3849N510YN611712N813YN9141015N1116N12171318Y1419Y220N31092YN91093101094N111095N121096131097Y14test 3





aaalviny said:


> that 41 formulas don't matter...


No, I'm sure the user won't mind an extra 3 columns of data cluttering their sheet when they want to print it!!  





aaalviny said:


> that 41 formulas don't matter, as long as they are easy to read and maintain


Let's talk more about ease of maintenance. Remember "future is always unpredictable" so perhaps some more data will be added over time (I currently have data down to row 1097). To cover for this possible extra data, I set up my SUMPRODUCT formula:

=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1),J6:J2000)

What maintenance do I have to do when new data is added below row 1097? None
What maintenance do you have to do when new data is added below row 1097? Copy 3 columns of formulas down to cover the new data (and possibly adjust your B2 formula, although this could be set up with a larger range at the start)

What maintenance do I have to do when new rows of data are inserted in the middle of the existing data? None
What maintenance do you have to do when new rows of data are inserted in the middle of the existing data? You have to copy 3 columns of formulas down to fill the newly inserted rows.

Hmmm, which set-up is easiest to maintain?



Talk about repeating yourself?  With a sheet set up as you have shown, but with data to row 1097 as I have, there are 3 columns of formulas repeated over 1,000 times!
With my sheet I have a single formula - nothing repeated at all.

Hmmm, who's repeating themselves?





aaalviny said:


> ... those formulae are easy to read and take instant time to understand


If you want your formulas to be easy to read and understand, why clutter your simple 13 row sample file formulas with 60 $ signs, not one of which is required?


----------



## Jon von der Heyden (May 18, 2013)

aaalviny said:


> And if the data structure consists of all the things you listed above, it is clearly not a database of qualified data, isn't it?


But Alvin Excel is NOT a database.   That's one of the beauties of it.  I can mix up up the data types.  If I want a database then I will use a database.  Excel offers us countless abilities to tap into databases so if it is important then we can always use both.



aaalviny said:


> well Boolean is Boolean, never a number, don't be trapped by excel's logic. Only few support the convention false = 0 and true = 1 (correct me if wrong, excel, Vba, c support this only)


This is an Excel forum.  We are discussing best practise Excel methods.  -TRUE equals -1, which if treated as a logical is TRUE yet again.  -FALSE equals 0, which if treated as a logical is FALSE yet again.  To invert, using arithmetic, one would use 1-Logical_Value, i.e:
1-TRUE =  1-1 = 0 = FALSE
1-FALSE = 1-0 = 1 = TRUE

You have also made suggestion that using arithmetic should be used to complete arithmetic operation and that coercion is not by design.  Had that been true then all of the following statements would yield #VALUE! error:

=TEXT(1,0)+0 = 1
="01-Jan-2013" + 1 = 41276 (which is 02-Jan-2013)
=1-TRUE = 0

So clearly the fact that Excel coerces each operand to it's numeric equivalent, it IS by design.  So both VALUE() and arithmetic are by design; except that artithmetic works on all data types whilst VALUE() does not.  I wouldn't either choose VALUE() coercion in order to discriminate a single data type (since VALUE() isn't particular to one data type anyway).  No no, one must be more explicit.  So had I needed to test the type typically using one of the IS functions (or whatever!).

I agree that there are instances where it is good to discriminate.  A fine example would be to test whether or not a value is #N/A, specifically missing from a lookup table or array.  In such an instance IF(ISNA() construct is a better candidate that IFERROR(), even though it involves an extra function call...  For this context IFERROR is too indiscriminate.

So far Alvin you have described one reasonable advantage of using VALUE() and that is the fact that a read would be inclined to better understand it.  My feeling is that this reason though is of very little significance because (a) the topic is easy to understand and (b) it shan't take a user long to figure it out either using the formula evaluation tool or forum.  Besides, I am paid to build solutions using the skills I have acquired over the years.  I am not paid to build a solution using methods only known to novices - otherwise why would you hire me?  That would surely p1ss-off my employers.  

So of all your other reasons none stand true so far.  We have many more better reasons to use -- than VALUE(), described now exhaustively in this thread:
 -- is by design
 -- is more efficient
 -- works on all data types (use IS functions to test data types, it is what the are intended for)
 -- adheres to an existing convention (best practise)
 -- is neater (shorter) and doesn't increase parenthesis

Weighing it all up -- makes a lot more sense.


----------



## aaalviny (May 18, 2013)

Peter_SSs said:


> I'm going a fair way back in the thread:
> 
> I'd say that very much depends on what you are measuring and how often and how much of the data is likely to change.


as I mentioned before, no one knows what will happen in the future
no one knows the amount of conditions to be added and no one knows the number of values of interest
we cannot decide, based on existing condition, that how likely the data is to be changed
that's why maintenance is important


Peter_SSs said:


> For my testing I repeated the data you showed in that screen shot 84 times so the data extended from row 6 to row 1097 - so that there was bit more to do for my timer to measure.
> 
> Using your setup on Sheet1 and the single formula =SUMPRODUCT(--(E6:E1097="Yes"),--(H6:H1097="No"),J6:J1097) on Sheet2, a Sheet2 recalculation took 30% of the time for a Sheet1 recalculation. By that measure I wouldn't call SUMPRODUCT slow.


if only 1 value of interests, i.e. only 1 sumproduct, is used, it clearly doesn't make a difference.
this is the same of using VLOOKUP vs INDEX+MATCH for one time only, but the fact is, if more are used, it makes a difference.

and again, no one knows about the future , <= this is one very important I am making, so that maintenance will become easier if we think about the future.



Peter_SSs said:


> Do you? I would have thought you could set it up so that the formula(s) wouldn't need changing at all.
> If you are worried about ongoing maintenance, why not use cell references - surely that is simpler and easier to maintain than changing a formula AND copying it down a whole column!


being easy maintenance doesn't necessary mean no change, but the change should be easy and obvious, and also not troublesome
your suggestion of using cell reference is good, and it qualifies the "No magic strings" principle in anti-pattern
In any models this should actually be used, good point.
This eases one maintenance effort, but there are still others.



Peter_SSs said:


> No, I'm sure the user won't mind an extra 3 columns of data cluttering their sheet when they want to print it!!


this is another issue
that worksheet is "PROCESS" in the cycle of "INPUT" -> "PROCESS" -> "OUTPUT"
if one is going to print it out, that should be an "OUTPUT" which contains no calculation logic in the process (of course if the logic is about the layout of the output, it is ok)
and that "output" worksheet should be a new one, taking values from this "process" worksheet without calculation, and only shows what the user wants to show in the printout (If you are going to print that worksheet, you don't expect to print the whole set of data, right? )



Peter_SSs said:


> Let's talk more about ease of maintenance. Remember "future is always unpredictable" so perhaps some more data will be added over time (I currently have data down to row 1097). To cover for this possible extra data, I set up my SUMPRODUCT formula:
> 
> =SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1),J6:J2000)
> 
> ...


Talking about the ease of maintenance, think when you are working on it.
If you are adding records there, you clearly know you have to copy down those helper columns, right? (as helper columns are put in front of the raw data, it is clearly visible)
and it doesn't cost a second.
this is easy-to-notice and fast update

then it comes to the point of other maintenance:
as I mentioned before, when there are more value of interests,
clearly the SUMPRODUCT formula cannot be copied down directly (it needs amendment to each SUMPRODUCT)
e.g. there are two values of interest, as shown below (note: there can be more)
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), J6:J2000)
and
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), --(F6:F2000=F1), J6:J2000)

If now, H6:H2000=H1 is no longer the desired condition, but H6:H2000<>H1 is
two separate modifications are needed, and no copy down can be done directly
imagine if there are more values of interest and different kinds of conditions
changing 10 cells of H6:H2000=H1 to H6:H2000<>H1 is just not easy (well you can use replace, but as another matter, there's always a risk when using "REPLACE" as every cells have to be checked
and the point is, for my design, I only need to modify one column, which is:
=IF(H6=H$1, 1, 0) becomes =IF(H6<>H$1, 1, 0) and drag down, then it's done.

the repeating yourself doesn't mean the repeating for each record (you have to do once for each record, and this can't be avoided)
and if the no. of checking for each record is more than once on the same issue, it's the real "Repeat Yourself"
e.g. there are two values of interest
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1),J6:J2000)
and
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), --(F6:F2000=F1), J6:J2000)
these are double checking whether H6:H2000 = H1 in the two cells, and that's why it is slower
i.e. H6=H1 is checked twice, as there are two these kinds of cells



Peter_SSs said:


> If you want your formulas to be easy to read and understand, why clutter your simple 13 row sample file formulas with 60 $ signs, not one of which is required?


easy to understand means proper using of $ signs
if $ signs are not added, it is harder to read, and easier to have errors

when cell copying is needed, think about whether that reference should be fixed or movable
it can also help users to understand the formula better
e.g. for =IF($H6="No", 1, 0) what happens if I am copying it to other columns? do I expect cell H6 to change?
In my demonstration no, as I don't expect two data fields will have the same validation.
maybe column G is checking on date, I would use =IF($G6=cutoff_date, 1, 0)
and I don't expect other columns to share this logic

on the other hand, for 
=SUMIF(B$6:B$18, 1, $J$6:$J$18)
why I don't use $ for B, it is because when I copy this formula to column C, I expect it to refer to the 1 and 0 of column C and make the SUMIF based on criteria of column C
for 6 and 18, because the no. of record doesn't change, I don't expect it to be moving when cells are copied upward and downward (which gives less change of error)
for J, I use $ because column J is the value to sum in the data, which is not likely to be changed

by using $, users can have a better feeling on what the reference refers to
e.g. using fixed reference $A$1 means the reference is a stand-alone value, while by using $A3 it probably refers to one column of a database


----------



## shg (May 18, 2013)

This thread makes a great case for youth in Asia.


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> But Alvin Excel is NOT a database.   That's one of the beauties of it.  I can mix up up the data types.  If I want a database then I will use a database.  Excel offers us countless abilities to tap into databases so if it is important then we can always use both.


Well Excel is not a database, but for efficient working I bet you will agree to have consistent data structure, as well as neat tables, right?
you don't want to see empty rows within a set of 10000 records, you don't want to see a string in the column of dates. You don't want to see a missing data, etc.
So it's exactly what a database means: to have consistent data format
Do you agree?

Actually you can mix up the data type because of the "freedom" Excel offers, but by giving too much freedom, it yields improper use of it.


Jon von der Heyden said:


> This is an Excel forum.  We are discussing best practise Excel methods.  -TRUE equals -1, which if treated as a logical is TRUE yet again.  -FALSE equals 0, which if treated as a logical is FALSE yet again.  To invert, using arithmetic, one would use 1-Logical_Value, i.e:
> 1-TRUE =  1-1 = 0 = FALSE
> 1-FALSE = 1-0 = 1 = TRUE
> 
> ...


I think, here, the concern is again on the database data. Yes, data validation is always important.
We normally don't do that (ISNUMBER, etc.) because we are confident that the data is of good quality, isn't it?
However the good quality of data may not be always true, as mentioned before, if there are flaws in the raw data, it is very important to spot them out.
I am not sure what kind of data validations you use, at least if I am working on some database, when the cutoff date is today, then I will at least use auto-filter to check whether the dates are on or before the cutoff date. This is one kind of data validation I will carry out before I move on the work.
And there are other kinds of data validation, including checking the data types.
(there may be a system error that a date field of one record is missing, it's important to spot it out)



Jon von der Heyden said:


> I agree that there are instances where it is good to discriminate.  A fine example would be to test whether or not a value is #N/A, specifically missing from a lookup table or array.  In such an instance IF(ISNA() construct is a better candidate that IFERROR(), even though it involves an extra function call...  For this context IFERROR is too indiscriminate.


and I have to say, using ISNA() doesn't involve an extra function call compared to IFERROR,
did you violate "DON'T REPEAT YOURSELF" principle by using =IF(ISNA(VLOOKUP()), 0, VLOOKUP())?
the correct way is:
A1: =VLOOKUP(), A2: =IF(ISNA(A1), 0, A1)

clearly it is one evidence why you should agree of using VALUE(), as -- is too indiscriminate


Jon von der Heyden said:


> So far Alvin you have described one reasonable advantage of using VALUE() and that is the fact that a read would be inclined to better understand it.  My feeling is that this reason though is of very little significance because (a) the topic is easy to understand and (b) it shan't take a user long to figure it out either using the formula evaluation tool or forum.  Besides, I am paid to build solutions using the skills I have acquired over the years.  I am not paid to build a solution using methods only known to novices - otherwise why would you hire me?  That would surely p1ss-off my employers.


to my side, I believe your reasonable advantage is the more efficient. And in fact the less use of parenthesis can be one advantage to me. But in contrast, the problem of being indiscriminate and aim-matching are more important to me.



Jon von der Heyden said:


> So of all your other reasons none stand true so far.  We have many more better reasons to use -- than VALUE(), described now exhaustively in this thread:
> -- is by design
> -- is more efficient
> -- works on all data types (use IS functions to test data types, it is what the are intended for)
> ...


1. -- is by design, well I don't think it's a reason of using it? the existence of this feature does not mean we have to use it?
2. yes, it's more efficient, I agree
3. If we always use IS functions to check each data, it's kind of messy. given we can continue our job given data checking is embedded, I would go for this way.
4. I still doubt, convention is just a culture, yet the majority is used to the convention/culture but doesn't ask the reason why we should follow it. If reason can be given, then yes we can follow it. But being simply convention doesn't make sense in my thought.
5. yes, I agree

so in my summary, the only advantage is: more efficient, shorter, neater, and doesn't increase parenthesis, which only relates to the "hard" side but are not related to the "soft" side of the choice of functions, I would say


----------



## MARK858 (May 18, 2013)

> if $ signs are not added, it is harder to read, and easier to have errors



WHAT! how is it harder to read? if you have a complete novice they nearly always recognize a cell reference like A1 (and know what it is referring to) because they can easily relate it to column headers and row numbers that they see on the sheet but ask they what the $ does (there have been plenty of posts where we have had to explain it).

This makes it the other way round to "it is harder to read".

The $ is used to serve a purpose, it is not there for added readability by any stretch of the imagination.

I have no problem with it being used by anyone but please don't state readability as a justification for using it.


----------



## Jon von der Heyden (May 18, 2013)

aaalviny said:


> Well Excel is not a database, but for efficient working I bet you will agree to have consistent data structure, as well as neat tables, right?
> you don't want to see empty rows within a set of 10000 records, you don't want to see a string in the column of dates. You don't want to see a missing data, etc.
> So it's exactly what a database means: to have consistent data format
> Do you agree?


Yes and no.  Sometimes I like to mix it up; say if I'm building a dashboard.  I may have several summaries on one sheet.  It's more succinct that way.  And this was my point.  There are advantages to Excel not being a database; and I fully aim to take advantage of that fact.



aaalviny said:


> and I have to say, using ISNA() doesn't involve an extra function call compared to IFERROR,


I said IF(ISNA() does.  IF is the 1st function call, ISNA is the 2nd, compared with 1 function call when using IFERROR



aaalviny said:


> did you violate "DON'T REPEAT YOURSELF" principle by using =IF(ISNA(VLOOKUP()), 0, VLOOKUP())?


No I didn't.  Have a look!



aaalviny said:


> the correct way is:
> A1: =VLOOKUP(), A2: =IF(ISNA(A1), 0, A1)


Or even more succinct yet would be to use: =LOOKUP(10^308,CHOOSE({1;2},0,VLOOKUP(...)))



aaalviny said:


> clearly it is one evidence why you should agree of using VALUE(), as -- is too indiscriminate


No!  If you recall I said that I would rather use more explicit means to test data type if it is necessary to do so.  VALUE() does not provide sufficient data type test.



aaalviny said:


> But in contrast, the problem of being indiscriminate and aim-matching are more important to me.


Well I believe I have suitably dismissed your issue of type indiscrimination.  As for aim-matching; I suggest that -- is aim-matching since -- is by design, as is the case with VALUE().



aaalviny said:


> 1. -- is by design, well I don't think it's a reason of using it? the existence of this feature does not mean we have to use it?


See above...



aaalviny said:


> 2. yes, it's more efficient, I agree


Yippeeee!



aaalviny said:


> 3. If we always use IS functions to check each data, it's kind of messy. given we can continue our job given data checking is embedded, I would go for this way.


VALUE() does not provide sufficient data type test.  How can you claim this suggestion "messy" after that monstrous sample you posted earlier?  Sheesh!



aaalviny said:


> 4. I still doubt, convention is just a culture, yet the majority is used to the convention/culture but doesn't ask the reason why we should follow it. If reason can be given, then yes we can follow it. But being simply convention doesn't make sense in my thought.


How many more reasons do you need?  Your entire case for VALUE() has been ripped apart!  It's in tatters!  Ruined!



aaalviny said:


> 5. yes, I agree


Great!

Ok I've presented my case for -- over VALUE() for coercion.  You're not bringing up any new facts - you're just repeating yourself and completely missing/avoiding the case against VALUE().  Actually you're opening new topics; topics again that should probably belong in their own threads.


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> This is an Excel forum. We are discussing best practise Excel methods. -TRUE equals -1, which if treated as a logical is TRUE yet again. -FALSE equals 0, which if treated as a logical is FALSE yet again. To invert, using arithmetic, one would use 1-Logical_Value, i.e:
> 1-TRUE = 1-1 = 0 = FALSE
> 1-FALSE = 1-0 = 1 = TRUE
> 
> ...


Sorry I think I have missed this part in the previous reply

I think by Excel best practice, some basic aims have to be kept e.g. the aim matching

=TEXT(1,0)+0 = 1
="01-Jan-2013" + 1 = 41276 (which is 02-Jan-2013)
=1-TRUE = 0

Yes, they are false in my sense, or I would say in the general logic it is false
(Do you think they are true before learning excel? I don't believe you thought that 1-TRUE makes sense, nor a text + number = number makes sense, right?)
it's because of excel that makes you think it's true, but it is in fact a logic violation! although excel allows this false logic, should we continue to follow it?


MARK858 said:


> WHAT! how is it harder to read? if you have a complete novice they nearly always recognize a cell reference like A1 (and know what it is referring to) because they can easily relate it to column headers and row numbers that they see on the sheet but ask they what the $ does (there have been plenty of posts where we have had to explain it).
> 
> This makes it the other way round to "it is harder to read".
> 
> ...


in that case do you mean one should never use any $ in the design? I think everyone, if he/she knows $, will use that for a higher efficiency, right?
and $ is a syntax designed by Excel, to denote the fixed reference, which is important to the design of excel which relies much on the layout and address.
If one starts using Excel as a general excel user, he/she should learn $. It is similar to learning +, -, *, /, and SUM() for them.
therefore I don't think we should address on whether a non-excel user should understand it, by novice I don't mean about knowing nothing about Excel, but of general excel users who have basic knowledge.

About why I say it's of better readability, I already mentioned it above.
and one purpose is for the correct logic, and another purpose is for readability


----------



## aaalviny (May 18, 2013)

Jon von der Heyden said:


> Yes and no.  Sometimes I like to mix it up; say if I'm building a dashboard.  I may have several summaries on one sheet.  It's more succinct that way.  And this was my point.  There are advantages to Excel not being a database; and I fully aim to take advantage of that fact.


well, several summaries on one sheet is about the "OUTPUT" I mentioned, no matter how the OUTPUT layout is, it is independent of the logic. and It can be of any formatting as the users want to present
i.e. in the "PROCESS" worksheet, I suppose you will have the data in a good format like a database, right?


Jon von der Heyden said:


> I said IF(ISNA() does.  IF is the 1st function call, ISNA is the 2nd, compared with 1 function call when using IFERROR
> No I didn't. Have a look!


Oh well, I didn't count the IF  
Although there is one more parenthesis, I think it doesn't affect the readability as it's very clear?
IF(ISNA()), one is clearly a IF, and the other clearly a "is N/A", easy to understand.


Jon von der Heyden said:


> Or even more succinct yet would be to use: =LOOKUP(10^308,CHOOSE({1;2},0,VLOOKUP(...)))


oh no, it's an example of difficult-to-read formula! I absolutely don't think it's succinct!
what is 10^308? and you use CHOOSE {1, 2}, what is this?
I have to think for a few seconds before knowing what it means!
and again, it does not match the aim! that's why it's hard to read!
the aim is: check if it's N/A, if yes then 0, no then the value
Going for an indirect approach just adds more complexity!



Jon von der Heyden said:


> No!  If you recall I said that I would rather use more explicit means to test data type if it is necessary to do so.  VALUE() does not provide sufficient data type test.


At least not as generalized as --. I would say, so it is best to narrow down the possibility of overlooked errors.


Jon von der Heyden said:


> Well I believe I have suitably dismissed your issue of type indiscrimination.  As for aim-matching; I suggest that -- is aim-matching since -- is by design, as is the case with VALUE().


Well, the aim of -- is do twice negative, but not conversion from text to values
this -- being able to convert text to values is by design, i.e. actual implementation
in fact the aim of -- deviates from the actual result in the case of data format conversion.
or alternatively, the actual implementation result does not match with the aim for Excel's -- design. yet it accidentally comes up the correct solution.


Jon von der Heyden said:


> VALUE() does not provide sufficient data type test.  How can you claim this suggestion "messy" after that monstrous sample you posted earlier?  Sheesh!


as mentioned before, not sufficient but at least more sufficient than --.


Jon von der Heyden said:


> How many more reasons do you need?  Your entire case for VALUE() has been ripped apart!  It's in tatters!  Ruined!
> Great!
> 
> Ok I've presented my case for -- over VALUE() for coercion. You're not bringing up any new facts - you're just repeating yourself and completely missing/avoiding the case against VALUE(). Actually you're opening new topics; topics again that should probably belong in their own threads.


well I am still having two solid reasons there: aim-matching (and therefore easier to understand) and the indiscriminate
which I think they are soft advantages
and I think the soft advantages are more important to your hard advantages of simply efficient / less number of parenthesis

Yes I do think there are many threads to be able to be started. and thanks for the reminder I am just not going to reply those off-topic issues.


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## Jon von der Heyden (May 18, 2013)

good luck!


----------



## FormR (May 18, 2013)

This was the point where you lost credibility.



aaalviny said:


> 41 formulas don't matter


----------



## MARK858 (May 18, 2013)

> in that case do you mean one should never use any $ in the design


I don't consider it as "design". I consider the use of absolute referencing to be there to serve a purpose i.e. solely to make a reference either wholly or partly absolute. 
Sometimes the whole reference *needs* to be absolute, sometimes part of the reference *needs* to be absolute.
Sometimes it specifically mustn't be absolute. 

When it doesn't matter if it is absolute or not then it is *personal choice* whether you use it or not but the one it doesn't do is make the formula easier to read.
Why do you think that it is suggested that you put symbols in passwords?



> if he/she knows $, will use that for a higher efficiency, right?


No. The use of absolute references is there when you *need* the references to be absolute and solely that. If you need to drag the formula in future you need to remove them which makes spurious use of them inefficient.



> I don't think we should address on whether a non-excel user should understand it


No we should address readability (by the the way anyone who has opened an Excel spreadsheet is an Excel user at whatever level, a non-excel user won't see the formula in the first place) 



> If one starts using Excel as a general excel user, he/she should learn $


Of course they should but that has nothing to do with the readability of the formula.



> and another purpose is for readability


Nonsense, adding extra characters to any text makes it harder to read, the fewer characters the easier it is to read. 

Where I come from we would consider it clutter. If it doesn't serve a purpose why be there?

We all regularly use the word "Function" that word is a bit of a giveaway.

Like I said I have no objection to anyone using absolute referencing when not necessary as that is personal choice but I do object to you giving reasons that are just there to fill white space.


----------



## aaalviny (May 18, 2013)

FormR said:


> This was the point where you lost credibility.


41 simple formulas compared with 1 complex formulas
which is better?

let me demonstrate with an example,
it was taken from a thread I replied earlier, no offence to Aladin, given the layout restriction, there are no means to have simple formulas there
but the point I am going to make is:
which approach is the easiest to be understood!
and it's how we should use helper columns to shorten formulas and provide a better readability.
60 formulas vs 30 formulas

thread:
http://www.mrexcel.com/forum/excel-...rds-where-adjacent-cells-meet-1-criteria.html

the different approach file:
http://aaalviny.magix.net/public/alvin_excel_helper_col_ease_readability_and_simpler_20130519.xls
I added the best solution I suggested at the end columns, which is different from what I replied in the post, due to the layout restriction given by OP
due to laziness I don't give the column title 
let see which way you can understand better and faster, and find it easier to maintain and update


MARK858 said:


> I don't consider it as "design". I consider the use of absolute referencing to be there to serve a purpose i.e. solely to make a reference either wholly or partly absolute.
> Sometimes the whole reference *needs* to be absolute, sometimes part of the reference *needs* to be absolute.
> Sometimes it specifically mustn't be absolute.
> 
> ...


I will start a thread about this if you wish


----------



## FormR (May 18, 2013)

aaalviny said:


> 41 simple formulas compared with 1 complex formulas
> which is better?



*The 1*


----------



## aaalviny (May 18, 2013)

FormR said:


> *1*


did you check the above attachments?


----------



## FormR (May 18, 2013)

aaalviny said:


> did you check the above attachments?



No, I stopped reading your responses in any kind of detail some time ago.


----------



## aaalviny (May 18, 2013)

FormR said:


> No, I stopped reading your responses in any kind of detail some time ago.


I would like to have a fruitful discussion, please don't just feel dissatisfied with my comments and then give negative response due to no reasons.
I am sad to say it's a kind of disrespect in my sense.

In case you don't like to read the attachment, here is a short recap:

30 cells of formulas contains: (less number of formulas)
there are two kinds

1. =(COUNT(B2:C2,E2:F2,H2)>0)+0
2. (array formulas)
2007 version:
=IFERROR(INDEX($A$2:$A$16,SMALL(IF($I$2:$I$16=1, ROW($A$2:$A$16)-ROW($A$2)+1),ROWS($J$2:J2))),"")
2003 version:
=IF(ROWS($J$2:J2)<=SUM($I$2:$I$16),INDEX($A$2:$A$16,SMALL(IF($I$2:$I$16=1, ROW($A$2:$A$16)-ROW($A$2)+1),ROWS($J$2:J2))),"")

and for my more formulas: (there are 4 kinds)
=IF(COUNT($B5:$C5, $E5:$F5, $H5)>0, W4+1, W4)
=X4+1
=INDEX($A$4:$A$18, MATCH($X5, $W$4:$W$18, 0))
and
=IF(ISNA($Y5), "", $Y5)


----------



## MARK858 (May 18, 2013)

aaalviny said:


> I will start a thread about this if you wish


You can start a new thread if you wish, but I will not participate as I would much rather spend my time on the forum helping people rather than getting involved in a never ending discussion in which you obviously will not change your opinion and I will always differ.

If you choose to take away Excels built in functionality to drag a formula by using needless absolute references I am not going to try and stop you.

I will still advice people how to use them as intended.

I wish you well on your future endeavours.


----------



## aaalviny (May 18, 2013)

MARK858 said:


> You can start a new thread if you wish, but I will not participate as I would much rather spend my time helping people rather than getting involved in a never ending discussion in which you obviously will not change your opinion and I will always differ.
> 
> If you choose to take away Excels built in functionality to drag a formula by using needless absolute references I am not going to try and stop you.
> 
> ...


As similar to situation of *Jon von der Heyden*
He also said he would not participate, but at the end he does! which I think it's a good thing. I have to thank him for his comments.
and as you see, we have come to a concensus (if not all, at least most) on the good and bad of each of the option -- and VALUE() in the end
Then, how we weigh the importance of those good/bad side is our own decision

but at least, we just can have a discussion on the pros and cons

These kinds of discussion are not only simply talking, they are useful in terms of future excel design of our own practice.
to achieve the aim of better maintenance ability and readability and to avoid errors,
which in the end it will benefit all of us from the discussion

I sincerely hope you can, as well, enjoy the discussion.
or do you feel better if you treat it as a "debate competition", with for side and against side?


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## FormR (May 18, 2013)

aaalviny said:


> I would like to have a fruitful discussion



I wish I didn't get involved - but either way; all points have been addressed and there is nothing left here that is fruitful.


----------



## MARK858 (May 18, 2013)

> As similar to situation of *Jon von der Heyden*
> He also said he would not participate, but at the end he does!



But then I'm not Jon.


----------



## Peter_SSs (May 18, 2013)

aaalviny said:


> this is another issue
> that worksheet is "PROCESS" in the cycle of "INPUT" -> "PROCESS" -> "OUTPUT"
> if one is going to print it out, that should be an "OUTPUT" which contains no calculation logic in the process


 Well, that’s your view, but this forum is made up of about 226,500 members who have posted at least once. Of those, about 224,700 have less than 200 posts. I don’t think it unreasonable to assume the vast majority of those (who are the members who ask most of the questions) are not highly skilled in Excel (your good self excepted of course) and couldn’t care less about that and, if printing, are likely to print the sheet as is. They just want a reasonable answer to their question. 






aaalviny said:


> Talking about the ease of maintenance, think when you are working on it.
> If you are adding records there, you clearly know you have to copy down those helper columns, right?


Right. But *by your own question* .. 





aaalviny said:


> .. how can you sure the file won't be exposed to others?


.. and where I work, many of my (Excel novice) colleagues who would likely be adding the data would not find that obvious or trivial - they just expect to add their data and have Excel do the rest - and it does.






aaalviny said:


> e.g. there are two values of interest, as shown below (note: there can be more)
> =SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), J6:J2000)
> and
> =SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1)*, --(F6:F2000=F1)*, J6:J2000)


A. First, to get the second result as well as the first ..

SUMPRODUCT: Decide which cell to put the new formula in, Copy/Paste the text of the original formula & add the red bit. Took me about 15 seconds (I’m not fast on the keyboard)
Your layout: Decide on a new helper column to use (I used column A with your original layout simply because that was quickest (I know it makes a poorer layout now), put the *new formula* *=IF(F6=F$1,1,0)* in A6, *amend the formula in B6* to =*A6**C6*D6 by adding the red bit, *copy both formula down to the bottom* of my data (row 1097). I did that by double clicking the Fill Handle but still had to *check that the new formulas populated to the end of the data* in case another user had inserted blank rows in my data. Time > 15 seconds.

B.(i) Now change the condition in column H to “<>H1”. If more than 2 or 3 results required this change, then yes, your layout would be a little faster to modify, but not a lot.

B.(ii) Let’s consider a different change though. Suppose we have this result
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), --(F6:F2000=F1), J6:J2000)
And I want to* keep that result but also have this one*
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000*<>*H1), --(F6:F2000*<*F1), J6:J2000)

SUMPRODUCT: Decide on formula cell, Copy/Paste original formula text, modify the red.

Your layout: Find *3 new columns* to use (say L,M,N) as you still need the original ones for the first result. *3 new formulas*:
L6: *=IF(H6<>H$1,1,0)*
M6: *=IF(F6<F$1,1,0)*
N6: *=C6*L6*M6*
*All copied down 1,000+ rows*
Decide on new result cell, new formula: *=SUMIF(N$6:N$1097,1,J$6:J$1097)*

No prizes for guessing which is fastest.


You mentioned the possibility of having up to 10 SUMPRODUCT formulas to maintain. If that were these case, and the conditions varied for each, I’d have to say I would much rather maintain those 10 formulas than maintain, *who knows .. 20, 30, 40, … helper columns* as they need to be added to accommodate the various conditions & combination of conditions as I've just demonstrated.



Now let’s return to some points you tried to make in your first post


aaalviny said:


> the aim of these are to convert texts to numbers
> the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
> the aim of INT() is to extract the integer portion of a number
> on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim
> ...


Quite a few mentions of *aim, meaning, ease of understanding*. So let's have a look at those things with this exercise.

SUMPRODUCT
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000<>H1), --(F6:F2000<F1), J6:J2000)

Looking at this I (& I contend anybody even reasonably familiar with SUMPRODUCT) immediately can see that this is summing column J where col E matches E1, col H does not match H1 and col F is less than F1.
It matches the aim of the required result, the meaning is clear, it is easy to understand.

Your Layout
=SUMIF(N$6:N$1097,1,J$6:J$1097)

Looking at this I can see immediately that this is summing column J where column N contains a 1.
The formula itself is clear and easy to understand, it just tells me *nothing* about the aim.
I have to backtrack through the formulas in column N then C, L & M before I can eventually deduce the aim.



Just two further comments: 
We are only talking about SUMPRODUCT for this example question, because you are using Excel 2003. We could/should be using more efficient functions than SUMPRODUCT if using later versions. I find it interesting that you say you make your living from Excel but you are using a 10 year old version of the product when 3 newer versions exist.

Now I’m finished here too and am going back to answering questions.


----------



## aaalviny (May 19, 2013)

Peter_SSs said:


> Well, that’s your view, but this forum is made up of about 226,500 members who have posted at least once. Of those, about 224,700 have less than 200 posts. I don’t think it unreasonable to assume the vast majority of those (who are the members who ask most of the questions) are not highly skilled in Excel (your good self excepted of course) and couldn’t care less about that and, if printing, are likely to print the sheet as is. They just want a reasonable answer to their question.


About "Input" -> "Process" -> "Output", Jon and I were not talking about how we should give our answers to those asking questions. but I believe it's just our own discussion about the layout design freedom offered by Excel and how we are using it.

Furthermore, "Input" -> "Process" -> "Output" is a basic knowledge to every system (Excel is also one of it)
Google for it and I think you will see many results.
e.g. Input Process Output
And for every workbook, "Input" -> "Process" -> "Output" is just an integral part.
Of course, whether one mixes any of above in one place is another issue.

Finally, I think we are here not only to answer others' questions, but to share our thoughts on problem solving to further equip ourselves. and to create best practices (otherwise how best practices are created?)
and people here are always embedding their best practice in their solutions provided!



Peter_SSs said:


> Right. But *by your own question* .. .. and where I work, many of my (Excel novice) colleagues who would likely be adding the data would not find that obvious or trivial - they just expect to add their data and have Excel do the rest - and it does.


here I would like to clarify. There are two ways we make our excels for
1. work for a task, it may be reviewed by peers / higher-level managers. The peers may possess similar excel skills to conduct a check on the working and results, while the higher-level managers may emphasize more on the reasonableness, at the same time he/she may also possess certain excel skills
2. send out for other parties to use

for what you are mentioning about letting other novice users to simply input the data and get the result, it should be point 2 above.
usually in that case, of course the maintenance is more complicated, and due to the fact that they are novice users,
in my workplace, we will simply protect the workbook, and state some restrictions (all in all you won't consider row 65536 for all the formulas for 2003, e.g. may limit the number of data to 1000.)
and may include something like =IF($A2="", "", ABC) for the data validation process.
and I would like to add that, even SUMPRODUCT is used, does the user know he/she has to edit the formulas there?
I think we are not talking about how we release our workbooks to other novice users which simply only know to input data but with nearly no Excel knowledge.

And clearly, you may guess, I am talking about point 1, which is more important I believe.
my peers and managers are just possessing a good knowledge to the formulas (of course not about those complicated array formulas, we tend to avoid it actually)

From your reply, I wonder what your role is? design a workbook for other users to input data, but the "other users" are the same team as yours? Then can I ask why your colleagues don't go to learn yet if they have to do that job, given you are in the same team? (I know it's a bit personal, you may choose not to answer if you don't want to)


Peter_SSs said:


> A. First, to get the second result as well as the first ..
> 
> SUMPRODUCT: Decide which cell to put the new formula in, Copy/Paste the text of the original formula & add the red bit. Took me about 15 seconds (I’m not fast on the keyboard)
> Your layout: Decide on a new helper column to use (I used column A with your original layout simply because that was quickest (I know it makes a poorer layout now), put the *new formula* *=IF(F6=F$1,1,0)* in A6, *amend the formula in B6* to =*A6**C6*D6 by adding the red bit, *copy both formula down to the bottom* of my data (row 1097). I did that by double clicking the Fill Handle but still had to *check that the new formulas populated to the end of the data* in case another user had inserted blank rows in my data. Time > 15 seconds.


Here, let me clarify what I emphasize on.
Everybody has different working speeds in excel. To me these should take less than 10 seconds as I am using only keyboard shortcuts to navigate around the cells and to fill the formulas. and I don't have to check whether the end row has been filled with the required formula. (arrows with Ctrl and Shift to navigate, together with Ctrl + D or Ctrl + Enter to fill the formulas)
The point is not the working speed I suppose, but whether one knows what to do and on the speed of the calculation.
and more importantly, the clear flow of calculation I would say, and this is to be supplemented below.


Peter_SSs said:


> B.(i) Now change the condition in column H to “<>H1”. If more than 2 or 3 results required this change, then yes, your layout would be a little faster to modify, but not a lot.
> 
> B.(ii) Let’s consider a different change though. Suppose we have this result
> =SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000=H1), --(F6:F2000=F1), J6:J2000)
> ...


In summary, there are just two sides of whether which design is faster to modify. But again I have to supplement below about this point.
*



Peter_SSs said:



			You mentioned the possibility of having up to 10 SUMPRODUCT formulas to maintain. If that were these case, and the conditions varied for each, I’d have to say I would much rather maintain those 10 formulas than maintain, who knows .. 20, 30, 40, … helper columns as they need to be added to accommodate the various conditions & combination of conditions as I've just demonstrated.
		
Click to expand...

*As I mentioned long time ago, some criteria can be combined to become 1 criteria to use only 1 columns, if it's appropriate. E.g. to check whether a date is within this year and before today, this can be combined to 1 cell. Or if there are any others situations that the two criteria won't be separating from each other. All in all it's very rare to have over 5 to 6 of different conditions which are not complementary event to other conditions. As from my working environment as well as the questions asked here (but these 5 to 6 different conditions can give >10 SUMPRODUCT formulas, as a combination of Event A true, Event A false, Event B true, Event B false, etc.)
and if it really requires 20 or 30 of helper columns, I think the designer has to think about why those are needed. and what are the problems underlying those helper columns, if those helper columns are required, I don't expect the SUMPRODUCT would be simple as well.

Let me demonstrate another example, about the ease of maintenance and why "Don't Repeat Yourself" matters here.
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000<>H1), --(F6:F2000<f1), j6:j2000)
>=F1), J6:J2000)
=SUMPRODUCT(--(E6:E2000=E1),--(E6:E2000 > $E$2), --(F6:F2000>=F1), J6:J2000)
<f1), j6:j2000)
=SUMPRODUCT(--(E6:E2000=E1),--(E$6:E$2000 > E$2), --(F6:F2000>=F1), J6:J2000)
=SUMPRODUCT(--(E6:E2000><e1),--(e6:e2000>E$2), --(F6:F2000>=F1), J6:J2000)

Now in case for the three formulas containing the logic E6:E2000>E2
now it should be replaced with E6:E2000=E2
and yes, those $ exists. all in all you won't know any people accidentally adds $ there, not only you are developing the workbook (I use "develop" here, to distinguish from "user")
and you may have peers like me that who like to add $ in front of the row reference to fix the data
How will you deal with the replacement and there may be some extra spaces there to avoid direct replacement?
it has to be checked one by one for the sake of $ and empty spaces?
however those segments are clearly of the same logic, but multiple replacements / checks are required because of the violation to check for E6:E2000 = E2 once for each SUMPRODUCT formulas.
this logic is duplicated, and so it creates troubles when it requires update.
At least you will feel the replacement can used better if you can ensure there's no $ and spaces, and then you can use replace all, but who knows? given there are multiple formulas containing that logic segment and others may be updating it as well.
</e1),--(e6:e2000></f1),></f1),>*<f1), j6:j2000)
<f1), j6:j2000)
<e1),--(e6:e2000>





Peter_SSs said:



			Now let’s return to some points you tried to make in your first post
Quite a few mentions of aim, meaning, ease of understanding. So let's have a look at those things with this exercise.

SUMPRODUCT
=SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000<>H1), --(F6:F2000<f1), j6:j2000)

Looking at this I (& I contend anybody even reasonably familiar with SUMPRODUCT) immediately can see that this is summing column J where col E matches E1, col H does not match H1 and col F is less than F1.
It matches the aim of the required result, the meaning is clear, it is easy to understand.

Your Layout
=SUMIF(N$6:N$1097,1,J$6:J$1097)

Looking at this I can see immediately that this is summing column J where column N contains a 1.
The formula itself is clear and easy to understand, it just tells me nothing about the aim.
I have to backtrack through the formulas in column N then C, L & M before I can eventually deduce the aim.
		
Click to expand...

</f1),></e1),--(e6:e2000></f1),></f1),>*<f1), j6:j2000)
<f1), j6:j2000)
<e1),--(e6:e2000><f1), j6:j2000)
I would like to clarify here that
I didn't say SUMPRODUCT does not satisfy aim-matching
And I would not say my approach does not satisfy aim-matching as well.

To illustrate
the aim of this job is: (I just optionally type the column chosen, anyway just a variable)
Representation 1: check for each record if col D and col E matches the required criteria, then sum col F
this is how you present with your SUMPRODUCT
Representation 2:
a) check for each record if Col D and col E matches the required criteria
b) if it matches, then sum col F
Representation 3:
a) check for each record if Col D matches the required criteria
b) check for each record if Col E matches the required criteria
c) if it matches both criteria, then sum col F

They all represent the same aim, and my formulas are exactly doing those
but the point is how we represent by how it is broken down to smaller parts
and I have represented the aim by breaking down the parts (e.g. similar to modularization in programming, we can't just say we don't understand the whole thing by merely reading one of the modules!)
and all in all it's not a good practice to put everything in the same Sub (in VBA language)
and that's similar to excel approach I would say
Modularization leads to easier understanding, the same applied to Excel.
In terms of code, we can have shorter lines of code with a clearer identification by the module name
In terms of Excel formulas, we can have simpler formulas with column header to describe the requirement
by pressing F2 in the cell of A2*C2*D2 can already show what are the dependency and what are the criteria
at least I don't think my layout gives a worse readability. (I think mine is better, still, from the reasons above on decomposing formulas)


so due to above, the aim-matching problem is totally different for the case with -- and VALUE()
see discussion in older posts if you wonder how their aim differs
my aim-matching is only dealing with -- and VALUE()

and of course the aim matching exists in
=LOOKUP(10^308,CHOOSE({1;2},0,VLOOKUP(...)))
as well 
</f1),></e1),--(e6:e2000></f1),></f1),>*<f1), j6:j2000)
<f1), j6:j2000)
<e1),--(e6:e2000><f1), j6:j2000)


Peter_SSs said:



			Just two further comments: 
We are only talking about SUMPRODUCT for this example question, because you are using Excel 2003. We could/should be using more efficient functions than SUMPRODUCT if using later versions. I find it interesting that you say you make your living from Excel but you are using a 10 year old version of the product when 3 newer versions exist.

Now I’m finished here too and am going back to answering questions.
		
Click to expand...

</f1),></e1),--(e6:e2000></f1),></f1),>*<f1), j6:j2000)
<f1), j6:j2000)
<e1),--(e6:e2000><f1), j6:j2000)
I guess you mean SUMIFS()?
use of parenthesis doesn't give a matter given it's clear, e.g.
there are just nearly no difference in understanding SUMIFS and SUMPRODUCT, although their syntax differs
and the parenthesis in SUMPRODUCT is not nested together, that's why the clearness doesn't differ much, comparing:
1. =IFERROR(INDEX($A$2:$A$16,SMALL(IF($I$2:$I$16=1, ROW($A$2:$A$16)-ROW($A$2)+1),ROWS($J$2:J2))),"")
2. =SUMPRODUCT(--(E6:E2000=E1),--(H6:H2000<>H1), --(F6:F2000<f1), j6:j2000)

my aim, probably different from you, I am using excel to create outputs, but not from the prospective as a technical designer/consultant which requires full understanding of different versions of Excel.
But all in all, the design problem should basically be the same across different versions. To supplement, there may be some differences regarding IFERROR which is new in 2007, or maybe SUMIFS, although I find no difference for SUMIFS.
In my working environment, if anyone doesn't possess above-average Excel techniques and doesn't possess fair amount of keyboard shortcuts in Excel (keyboard is used >=90% of the time I would say when working in Excel), I would say he/she can't stay. (of course it can be exempted for people without relevant work experience, but that of course, must only be small potatoes, or extremely high-level managers, and even for small potatoes, they are learning those Excel knowledges and keyboard shortcuts from working as internship or fresh graduates)
All in all, using Excel, no matter in 2003 or later versions, only means to get the job done, isn't it?</f1),></f1),></e1),--(e6:e2000></f1),></f1),>*<f1), j6:j2000)
<f1), j6:j2000)
<e1),--(e6:e2000><f1), j6:j2000)
</f1),></e1),--(e6:e2000></f1),></f1),>*


----------



## RoryA (May 20, 2013)

There's a typo in your conclusion - it should read:

As a conclusion, *in my opinion* please use VALUE() instead of --, *1, ^1, +0 or INT() for *what I and I alone consider* Excel best practice

It seems fairly clear that your "best practice" is not a consensus in any way, any more than my suggesting all workbooks should be in 20 point font because it's easier to read would be.


----------



## aaalviny (May 20, 2013)

RoryA said:


> There's a typo in your conclusion - it should read:
> 
> As a conclusion, *in my opinion* please use VALUE() instead of --, *1, ^1, +0 or INT() for *what I and I alone consider* Excel best practice
> 
> It seems fairly clear that your "best practice" is not a consensus in any way, any more than my suggesting all workbooks should be in 20 point font because it's easier to read would be.


you can say that's my opinion, but please don't extend it to the ground only I alone agree it's the best practice, and I have to say it's clearly not from my environment. (and that actually generally speaking, as an article written by me, it already implies that things are going from my viewpoint, I don't think there's a need to add your wordings)
I admit, it's not a consensus here from all your replies I can see.
However, that's again to mention, the decision is based on how you weigh "aim-matching+easy understanding" and "indiscrimination" versus "efficient" and "the little improvement in speed" (I really have to say little, as this speed is little comparing with the functions that are comparing whether two values are the same e.g. MATCH(,,0) and VLOOKUP(,,,0))

As a final word, I don't know whether people are considering the first two advantages "aim-matching+easy understanding" and "indiscrimination" as important when thinking about the best practice
or simply weigh fast and efficient in the top priority. That's what I would like to say here.
Experiencing some changes+shocks and starting to think in a different prospective are a good sign to all, including me.
(I had a change in early 2012)


----------



## Jon von der Heyden (May 20, 2013)

Oh my goodness Alvin are you still fighting this corner?  You are persistent, I'll give you that!


aaalviny said:


> you can say that's my opinion, but please don't extend it to the ground only I alone agree it's the best practice


Alvin; at the time of writing this there have been 715 views of this thread and 45 replies.  That's quite a bit of attention isn't it?  And how many people have supported your notion to use VALUE() for coercion?  =VALUE(0)!!!  =INT(0)!!!  =N(0)!!!  This really IS your opinion and your opinion alone!  And to be frank the notion is at best ridiculous!


----------



## RoryA (May 20, 2013)

aaalviny said:


> you can say that's my opinion


And I did, precisely because that is all it is



> as an article written by me, it already implies that things are going from my viewpoint, I don't think there's a need to add your wordings



In _my_ opinion that is a specious argument designed to allow you to state anything you like as though it were fact.





> However, that's again to mention, the decision is based on how you weigh "aim-matching+easy understanding" and "indiscrimination" versus "efficient"



Precisely. It's a personal decision and will likely vary from case to case. I personally do not believe there to be _one_ best practice that can be stated regarding legibility versus efficiency. It will always be a trade-off. However, using 41 formulas instead of one *just in case* there may be changes in future requirements does not appear to be a firm footing for any argument.



> Experiencing some changes+shocks and starting to think in a different prospective are a good sign to all, including me.



Now *that* I actually do agree with wholeheartedly. It is good to question and challenge accepted wisdom as long as both sides are actually listening and not simply conducting two monologues.


----------



## aaalviny (May 20, 2013)

Jon von der Heyden said:


> Oh my goodness Alvin are you still fighting this corner? You are persistent, I'll give you that!
> 
> Alvin; at the time of writing this there have been 715 views of this thread and 45 replies. That's quite a bit of attention isn't it? And how many people have supported your notion to use VALUE() for coercion? =VALUE(0)!!! =INT(0)!!! =N(0)!!! This really IS your opinion and your opinion alone! And to be frank the notion is at best ridiculous!


Yes Jon, at least, I hope to make a greater influence, at least although I can't beat your efficient and fast issue
and you cannot beat my aim-matching and being indiscriminate on the other hand, right? 
I hope to raise a concern about these two advantages, which most people are not aware of
and although most people do not value it, I value it more than the efficient and "fast" issue
(this fast, is opposed to what I think the real issue related to speed is.
Speed, to me, depends on the data structure organization, DRY and correct choice of equality checking (including lookups) mostly)

to quote,


> one of the anti-patterns:
> Premature optimization: Coding early-on for perceived efficiency, sacrificing good design, maintainability, and sometimes even real-world efficiency
> 
> "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
> ...



Program optimization - Wikipedia, the free encyclopedia


RoryA said:


> And I did, precisely because that is all it is
> 
> In _my_ opinion that is a specious argument designed to allow you to state anything you like as though it were fact.
> 
> ...


so yes, still the point I am to make
people do not value the future maintenance much. That's the problem I very much would like to emphasize.
Seeing an originally seemingly simple workbook to grow more complex and complex and harder to read, I find this important.
and the "just in case", well, again, no one knows about the future. It's about whether the design can be re-usability and maintained. I must say it's very important, it's lucky that this is well valued in the programming world.


----------



## lrobbo314 (May 20, 2013)

Wow.  
I hope that today is your day off.


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## Jon von der Heyden (May 20, 2013)

aaalviny said:


> and you cannot beat my aim-matching and being indiscriminate on the other hand, right?


Wrong!  I have already made a clear point that VALUE() does not sufficiently support type matching (because it is not exclusive) and that one should test the type using different means.  And I have described that with -- coercion is aim-matching because the coercion process is by-design and thus aim-matching!  But it seems you fail to comprehend...

P.S. 761 views and still no-one agrees with you.  Will you EVER concede??


----------



## MARK858 (May 20, 2013)

> I don't know whether people are considering the first two advantages "aim-matching+easy understanding" and "indiscrimination" as important when thinking about the best practice
> or *simply weigh fast and efficient in the top priority.* That's what I would like to say here.
> Experiencing some changes+shocks and starting to think in a different prospective are a good sign to all, including me.



The only and last thing I will say (on any of the subjects (it seems to move once any subject is discussed)) is the statement above a contradiction of the statement (below)  in a previous thread?(which being honest the post I found a lot more irritating than anything in this thread, which is why I didn't post at the time)? 



> note the following if you want a fast excel (I believe most people here don't care about it)


 

is there a retraction in order or is this just yet another contradictory statement?


----------



## Macropod (May 20, 2013)

aaalviny: Try to foist your highly subjective and highly opinionated approach in most places I'm familiar with and it's you who would be out of a job. Indeed, an employer who allows you to discriminate against those who don't conform to your ill-conceived standards would likely end up with you and your employer in court. All you've done in this thread is to prove beyond doubt your stupid refusal to accept that you are wrong.

There is a good reason why no-one here supports your stance. You are so far out of step with 'best practice' it isn't funny. Get the message or find a career better suited to your intransigence ...


----------



## SydneyGeek (May 21, 2013)

> note the following if you want a fast excel (I believe most people here don't care about it)




Hang on. You use mega-workbooks, and claim no-one is interested in a fast Excel? Do you drink a lot of tea or coffee waiting for your models to recalc? Most of the people I work with want to get the job done. That means fast.

Denis


----------



## aaalviny (May 21, 2013)

Jon von der Heyden said:


> Wrong!  I have already made a clear point that VALUE() does not sufficiently support type matching (because it is not exclusive) and that one should test the type using different means.  And I have described that with -- coercion is aim-matching because the coercion process is by-design and thus aim-matching!  But it seems you fail to comprehend...
> 
> P.S. 761 views and still no-one agrees with you.  Will you EVER concede??



1. indiscrimination:
as I said, it is not sufficient, but at least more sufficient.
Isn't it enough in the decision to choose over two functions? (just an MC with two choices, you want to do an operation as required, and there is different parameter input restrictions between them, which one you will choose? of course the one that matches better and fit, but the looser one!)
we are just not discussing whether a third thing should be used,
and we are here only to compare the two functions on the aim of text -> number conversion


2. aim-matching
I thought you understood it but I think I have to explain it in detail now
please don't use the word "design" to take it for granted and neglect the problems within the design




process of design of functions (or operation):


1. figure out the need of a function, and define the aim
2. choose the choice of name of the function, to comply with the aim above
3. design the algorithm to achieve the aim
4. implementation
5. check whether the final result matches with aim 1
6. check if there are any side effects


for a negative sign:
1. to do the arithmetic negative to a number
2. "-" is a well known mathematical operator to represent this
3. (I don't know the detailed of it)
4. e.g. -3
5. yes it matches
6. -"3" = -3, this conversion from text to value is not the aim of "-" function! arithmetic negative sign is only working on number (but should not do the conversion as well, as the name "arithmetic sign" suggested
If it's the original aim, it should be of a better naming, but not just "-"
i.e. VALUE() <- which its name matches the aim of format conversion


here, double negation is used, but the usage of negative sign is totally thrown away, since the aim to do an arithmetic negative clearly does not apply to the situation here (you only want to convert the text to values),
By only focusing on the side effect of negative sign but do not use the actual aim of negative sign,
it is clearly a non-aim-matching point


and, your "aim-matching" is just your own aim and your result (yes it's clearly matched)
but the aim of the function / operator is clearly not considered!


----------



## aaalviny (May 21, 2013)

MARK858 said:


> The only and last thing I will say (on any of the subjects (it seems to move once any subject is discussed)) is the statement above a contradiction of the statement (below)  in a previous thread?(which being honest the post I found a lot more irritating than anything in this thread, which is why I didn't post at the time)?
> 
> 
> 
> is there a retraction in order or is this just yet another contradictory statement?


To reply this post I think I better re-illustrate the point of "premature optimization" in anti-pattern


> _*Original whole texts*
> 
> one of the anti-patterns:_
> _Premature optimization: __Coding early-on for perceived efficiency, sacrificing good design, maintainability, and sometimes even real-world efficiency
> ...


_
_To reply this post I think I better re-illustrate the point of "premature optimization" in anti-pattern

I am not sure where you found my last statement
_note the following if you want a fast excel (I believe most people here don't care about it)
_
But anyway, there are two regards about the speed, one is those mentioned in the quoted article before, listed again here and more details:
Program optimization - Wikipedia, the free encyclopedia
this is what I correspond to the point that -- is faster than VALUE
As what I mentioned before about aim-matching


> Optimization can reduce readability and add code that is used only to improve the performance. *(-- is the case over VALUE, only to improve performance)*
> "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" *(How little speed improvement even for 3 million conversions)*
> This can result in a design that is not as clean as it could have been or code that is incorrect *(aim-matching is sacrificed)
> *A better approach is therefore to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized. *(Code from the design first, to change text to values, the aim-matching function is "VALUE()", then, check the benchmark to see which part should be optimized and how much it can be optimized. Again, the main calculation speed barrior should not be taken by this straight forward text-to-value conversion, but more to table structure design, lookups and checking equality and whether these are duplicated and redundant)*


so as above, to your question about what I said I cared for the speed,
I care about the latter part in the quote above:
"table structure design, lookups, checking equality and whether these are duplicated and redundant (DRY)"
and, these above, is just the key element in the speed of calculation

In short, it's totally two different prospective of speed. and please don't mix it up!


----------



## aaalviny (May 21, 2013)

Macropod said:


> aaalviny: Try to foist your highly subjective and highly opinionated approach in most places I'm familiar with and it's you who would be out of a job. Indeed, an employer who allows you to discriminate against those who don't conform to your ill-conceived standards would likely end up with you and your employer in court. All you've done in this thread is to prove beyond doubt your stupid refusal to accept that you are wrong.
> 
> There is a good reason why no-one here supports your stance. You are so far out of step with 'best practice' it isn't funny. Get the message or find a career better suited to your intransigence ...


Out of my expectation, I still find people are not thoroughly understanding the most important advantages I stated.
So I explained in further detail again, just the posts above.
If you think it's subjective, feel free to fight back. No one successfully fought back on those points yet.

And, please don't criticize my skills. My skills are well-recognized in my department, even though other colleagues are all advanced excel users who are very familiar with the functions and keyboard shortcuts. And I have already been conducting skills sharing sessions to my colleagues in MS Word and MS Excel and VBA.
for Excel, contents are about calculation flow design, table structure design, volatile functions avoidance, "DRY", benefits of auto calculation mode (illustrated best with no volatile functions), avoid isolated blocks, layout-logic dependency problem, etc.

for the final point, it's because very few people in the professional programming world are dealing with Excel exhaustively. So quite a number of best practice are not identified in Excel and very few people do mention them.
(I am not the first person to state the aim-matching problem of -- with VALUE())


----------



## aaalviny (May 21, 2013)

SydneyGeek said:


> Hang on. You use mega-workbooks, and claim no-one is interested in a fast Excel? Do you drink a lot of tea or coffee waiting for your models to recalc? Most of the people I work with want to get the job done. That means fast.
> 
> Denis
> [/COLOR]


First of all,
Mega workbook has no necessary relationship to speed.
If an excel contains many required data and complex calculation, the file size has to be big
Even if a workbook has small file size, there can be many volatile functions / duplicated calculations / array formulas which lead to slow worksheets.

then, when proper design are used (as I always mentioned, proper database structure, proper use of binary lookups, "DRY", no volatile functions, etc.)
the auto calc mode actually is working very well, and everytime I update the cells, only those depending on them are updated. So when I update some far away cells, no calculation is needed.)
at least, for my 200MB well-designed and the historic 80MB not well-designed worksheets, the 200MB one is faster than the 80MB.

I don't need to re-calc in my excel design, all in auto-calculation mode and there's no need to wait while I am working.
I don't need to press F9 when working, for your interest.


----------



## Jon von der Heyden (May 21, 2013)

As of my writing this, this thread has been viewed 896 times and has received 55 replies.  And despite all this exposure your notion has not received a single like and not single peep of support.  If you're right then we must all be very {fill_in_the_blank}.



aaalviny said:


> 1. indiscrimination:
> as I said, it is not sufficient


Well that;s it then.  If it's not sufficient then there is no such thing as more sufficient, since an explicit data type test will be required anyway.  One would be explicit in programming so why not with respect to formula creation?  Use of an insufficient function such as VALUE() only increases the risk.  So if it's not sufficient, and if data type is important and thus we must discriminate by using more explicit means then of course double negation is a fine choice!



aaalviny said:


> 2. aim-matching
> 1. figure out the need of a function, and define the aim


Ok so now you're suggesting that a function will always be the best match for the need.  Rubbish!  It is you who said we must look to humane logic and look beyond Excel.  Well ok then, lets do that!

So what does VALUE() mean outside of the Excel world?


			
				dictionary said:
			
		

> val·ue  [val-yoo]  Show IPA noun, verb, val·ued, val·u·ing.
> noun
> 1. relative worth, merit, or importance: the value of a college education; the value of a queen in chess.
> 2. monetary or material worth, as in commerce or trade: This piece of land has greatly increased in value.
> ...



Oh boy, nothing in there about coercion!

In fact if you have and programming experience you will know that it could mean ANYTHING!I mean what the algorithm is, what the function does, is to the discretion of the author!  I can think of dozen of other possible things that function, as a word, could support.  Can't you??

Ok so what about double negation?  So as you so rightly pointed out double negation negates a number (e.g. negate 1 = -1); and then negates it again (negate -1 =1).  So any human knows that the only type of information that double negation can yield is a number!  It's a lot more logical than VALUE(), isn't it?  You've argued the point about boolean logic, and that True cannot _really _be 1, and thatFalse cannot _really_ be 0.  Well I teach Excel and I teach the uses of coercion in formulae.  I always ask my audience to pick a number to represent True and to pick another to represent False.  I can safely say that EVERYONE has always picked True=1; False=0.  And I have asked A LOT of people!

But this IS Excel so lets puts our Excel hats back on.  We know that any arithmetic expression must yield a numeric result.  We know therefore Excel must attempt to interpret each operand as a numeric result.  We know that if it cannot resolve the operand it will yield a #VALUE! error, a very specific error that is aim-matching!  We know that double negation is quicker.  We know that it is indiscriminate; and that as VALUE() is only partially discriminate we must resort to a more explicit type test.  We know that everyone around the world, using *EXCEL *is using this method.

Alvin I'm going to put it very bluntly!  You do not possess the requisite experience nor knowledge of this topic to even begin suggesting changes to conventions.  These suggestions could adversely affect people, after all we many many millions of Excel users around the world with access to the internet.  On this matter you are wrong wrong wrong! Our conventions have long been around.  They have been challenged, debated and ratified!  Yes there are some that could do with improvement; but unless you open yourself up to the notion of LEARNING and spend some quality time on the forums (months if not years) then you will NEVER possess the requisite skills and knowledge to accurately challenge these.  LEARN the stuff buddy and then come back with a proper notion that might actually improve the way we use Excel.


----------



## aaalviny (May 21, 2013)

Jon von der Heyden said:


> As of my writing this, this thread has been viewed 896 times and has received 55 replies.  And despite all this exposure your notion has not received a single like and not single peep of support.  If you're right then we must all be very {fill_in_the_blank}.
> 
> 
> Well that;s it then.  If it's not sufficient then there is no such thing as more sufficient, since an explicit data type test will be required anyway.  One would be explicit in programming so why not with respect to formula creation?  Use of an insufficient function such as VALUE() only increases the risk.  So if it's not sufficient, and if data type is important and thus we must discriminate by using more explicit means then of course double negation is a fine choice!


both are not sufficient, but there is one which matches better and narrows down the scope better. That's it.
again, just a comparison of the two functions and their scope, please don't use the third way to intervene and say that the third way is the best and these two are just the same worse. It is out of logic when comparing which one of the two is better.


Jon von der Heyden said:


> Ok so now you're suggesting that a function will always be the best match for the need.  Rubbish!  It is you who said we must look to humane logic and look beyond Excel.  Well ok then, lets do that!
> 
> So what does VALUE() mean outside of the Excel world?
> 
> Oh boy, nothing in there about coercion!


I have to say, function names are not exactly English, yet they have their names giving clues to what the function is about:
and I believe you are checking the dictionary here:
Value | Define Value at Dictionary.com
please scroll down to "Collins" and locate below:


> 6._maths_a. a particular magnitude, number, or amount: _the value of the variable was 7_b. the particular quantity that is the result of applying a function or operation for some given argument: _the value of the function for x=3 was 9
> _
> 
> <tbody>
> ...


it means to change the parameter inside the brackets to a value!

Or you can say it's not describing the exact clue? but it is not English sentence and "VALUE", by the definition above, should be already enough to identify the aim.
If you say by this definition VALUE cannot represent its aim, then I can nearly say: no function names can be used to represent its aim, "INDEX", "OFFSET", "VLOOKUP", "N", etc. all don't mean the true thing




Jon von der Heyden said:


> In fact if you have and programming experience you will know that it could mean ANYTHING!I mean what the algorithm is, what the function does, is to the discretion of the author!  I can think of dozen of other possible things that function, as a word, could support.  Can't you??


at least I won the preliminary selection contest in programming and was in the about-30-people regular training for grabbing the chance to join International Olympiad in Informatics.
Algorithm can mean anything, but many matters are concerned when designing.
Speed: whether the algorithm is O(1), O, O(log N), etc.
perfectness: whether the algorithm can deal with all special cases
Memory consumption
Readability and Maintenance ease
and also whether it gives side effect or unwanted outputs,
also about "DRY" violation, (yes this principle is taken from programming, not excel)
etc.
It can be anything but there are comparisons!
just like here, there's difference between -- and VALUE
and of course, you have freedom if you are working solely on your own without considering others.
BUT, here definitely not!
[/QUOTE]


Jon von der Heyden said:


> Ok so what about double negation?  So as you so rightly pointed out double negation negates a number (e.g. negate 1 = -1); and then negates it again (negate -1 =1).  So any human knows that the only type of information that double negation can yield is a number!  It's a lot more logical than VALUE(), isn't it?  You've argued the point about boolean logic, and that True cannot _really _be 1, and thatFalse cannot _really_ be 0.  Well I teach Excel and I teach the uses of coercion in formulae.  I always ask my audience to pick a number to represent True and to pick another to represent False.  I can safely say that EVERYONE has always picked True=1; False=0.  And I have asked A LOT of people!


you are only concerning the output AGAIN!
of course double negation can only produce numbers as understood by all
BUT, no one knows even texts can be processed with arithmetic!

There is a wrong assumption in the question at the beginning:
you forced others to select a number to represent True / False
which you assumed that a number can represent True / False in the underlying assumption! (and to the other side, True/False can represent a number?)
but the fact is, True / False are not numbers!
I understand you do it so as to teach Excel's logic, but please don't take it simply due to your pre-set assumption!
(as topics of interest, if you are designing a questionnaire, you have to avoid this prescribed situation!)



Jon von der Heyden said:


> But this IS Excel so lets puts our Excel hats back on.  We know that any arithmetic expression must yield a numeric result.  We know therefore Excel must attempt to interpret each operand as a numeric result.  We know that if it cannot resolve the operand it will yield a #VALUE! error, a very specific error that is aim-matching!  We know that double negation is quicker.  We know that it is indiscriminate; and that as VALUE() is only partially discriminate we must resort to a more explicit type test.  We know that everyone around the world, using *EXCEL *is using this method.


Yes, this is Excel, but no matter what Excel allows you to do, think if it's reasonable.
I don't mention again that you emphasize the output of double negation
here at least you mentioned about the input of double negation
but please don't talk about aim-matching regarding to error only! (you are mixing indiscriminate with aim-matching here! error = due to indiscriminate, but then you talk about aim-matching)
my aim-matching does not mean this,
please revisit my statement about the main aim of negative & the side effects
and don't skip away from it if you want to beat me in this point 




Jon von der Heyden said:


> Alvin I'm going to put it very bluntly!  You do not possess the requisite experience nor knowledge of this topic to even begin suggesting changes to conventions.  These suggestions could adversely affect people, after all we many many millions of Excel users around the world with access to the internet.  On this matter you are wrong wrong wrong! Our conventions have long been around.  They have been challenged, debated and ratified!  Yes there are some that could do with improvement; but unless you open yourself up to the notion of LEARNING and spend some quality time on the forums (months if not years) then you will NEVER possess the requisite skills and knowledge to accurately challenge these.  LEARN the stuff buddy and then come back with a proper notion that might actually improve the way we use Excel.


see what my statements above mean
As to say, I am waiting for you to beat my points above if you could.
I think it's you to learn at this stage.

From the beginning I have been emphasizing on these two points!


BTW, I don't think answering questions here would give much use to Excel design knowledge, mostly the topics here are requiring one-cell solution which wouldn't help in designing. and topics here only require the result.
or what else I can see is the complicated long formulas and array formulas.
the main aim here is to help people achieve their end result. So in contrast, it's of little benefit if one hopes to learn from here about best practice and design

i.e. technologically it is ok here (the use of functions and array formulas, the complexity is wonderful), but not on the design side.


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## aaalviny (May 21, 2013)

To further supplement: "Algorithem efficiency"



> [h=3]Readability, trade offs and trends [edit][/h]One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand. There are exceptions to this 'rule' (such as embedded systems, where space is tight, and processing power minimal) but these are rarer than one might expect.
> However, increasingly, for many 'time critical' applications such as air line reservation systems, point-of-sale applications, ATMs (cash-point machines), Airline Guidance systems,Collision avoidance systems and numerous modern web based applications—operating in a real-time environment where speed of response is fundamental—there is little alternative.



Algorithmic efficiency - Wikipedia, the free encyclopedia

the good word it uses:
not over-emphasizing efficiency, Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand.


----------



## SydneyGeek (May 21, 2013)

aaalviny said:


> First of all,
> Mega workbook has no necessary relationship to speed.
> If an excel contains many required data and complex calculation, the file size has to be big
> Even if a workbook has small file size, there can be many volatile functions / duplicated calculations / array formulas which lead to slow worksheets.
> ...



OK, your workbook works for you. I currently use workbooks built using a 'Best Practice Modelling' add-in and I hate them, because they have taken their idea of best practice to the extreme. This comes down, effectively, to only doing one kind of operation in a worksheet, which creates huge numbers of sheets for no good reason that I can fathom.
My point is that you appear to be stuck on a particular track, and want the whole MrExcel community to follow your path because it's best practice. Some very experienced people have explained why the conventions exist; you have acknowledged that there are advantages to them. So, will you learn from this interchange or will you continue to argue your point on ever-smaller pieces of ground?

Denis


----------



## shg (May 21, 2013)

> or will you continue to argue your point on ever-smaller pieces of ground?


Nice turn of phrase, Denis, and on mark.


----------



## xenou (May 21, 2013)

Hi,

Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result".  In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations.  To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.

The whole thing is somewhat minor and therefore unimportant - anyone should use -- or Value() as they please without concern.  From my experience the conventions used are _double negation_ (the majority), _add zero_ (second place), and all other techniques, including VALUE (all rare).  There's really no point to making a big deal out of something that is neither a "best practice" (defined as that which the high performing spreadsheet designers do), nor very important.  Many of your points are fine, but overkill - nobody is arguing that we produce spreadsheets that are unmaintainable, confusing, or error-prone.  But it is really stretching the argument beyond the breakning point to suggest that one formula using --() creates all of these problems.

To the latter, I'm afraid that in my experience no matter how simple and "maintainable" I make a spreadsheet, 90% of users can't grasp how it works.  This could be only because it contains an IF() formula, a VLOOKUP(), or even because it uses input cells (another "best practice").  So when I make spreadsheets that others must use and maintain, I use all of my expertise to make something that works, even when dummies take it over - with the assumption that they don't understand it (and may not be able to).

ξ


----------



## xenou (May 21, 2013)

^^ to follow up, I have a book on Excel best practices which advocates strongly (and convincingly, in the event) that using R1C1 formula style is an Excel best practice.  Of course, almost no one I know actually follows this best practice.  So I guess that's a case of a "best practice" that is ignored, and could be argued until next Tuesday without effect, if that's any consolation to you.

Note: the book in question is _Excel Best Practices for Business_ by Abdulezer.  However, I do occasionally put Excel in R1C1 mode temporarily to review a worksheet for possible errors (!).  I'm almost willing to go with this permanently but I know it would just confuse the heck out of everyone else, even if it is best.


----------



## aaalviny (May 22, 2013)

SydneyGeek said:


> OK, your workbook works for you. I currently use workbooks built using a 'Best Practice Modelling' add-in and I hate them, because they have taken their idea of best practice to the extreme. This comes down, effectively, to only doing one kind of operation in a worksheet, which creates huge numbers of sheets for no good reason that I can fathom.
> My point is that you appear to be stuck on a particular track, and want the whole MrExcel community to follow your path because it's best practice. Some very experienced people have explained why the conventions exist; you have acknowledged that there are advantages to them. So, will you learn from this interchange or will you continue to argue your point on ever-smaller pieces of ground?
> 
> Denis


You are quoting the things that are not related to the topic though I know it's me who responded to others' critics on me.
I am not sure what your best practice modelling add-in is, so I guess I won't comment on it unless you give further details.

Again, I can't force anyone, I am just presenting my advice.

whether it is "smaller" pieces of ground can not be measured,
and I have to say people here really don't value the "aim-matching" principle as important.
and I would say claiming efficient and fast is only an "over-emphasis on efficiency" but sacrifice design

So that's why people here are thinking -- as correct. and think what I am arguing is a small piece.
And I believe I finally have presented my points clear enough at this moment.

That's all I want to say


----------



## aaalviny (May 22, 2013)

xenou said:


> Hi,
> 
> Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result".  In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations.  To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.


Hi,

I doubt you are only viewing the aim of your task and the end result to claim that it is aim-matching...
please consider the aim-matching of the function.




xenou said:


> The whole thing is somewhat minor and therefore unimportant - anyone should use -- or Value() as they please without concern.  From my experience the conventions used are _double negation_ (the majority), _add zero_ (second place), and all other techniques, including VALUE (all rare).  There's really no point to making a big deal out of something that is neither a "best practice" (defined as that which the high performing spreadsheet designers do), nor very important.  Many of your points are fine, but overkill - nobody is arguing that we produce spreadsheets that are unmaintainable, confusing, or error-prone.  But it is really stretching the argument beyond the breakning point to suggest that one formula using --() creates all of these problems.


Again, that's because you value it as unimportant so other people, like you, are disproving my points.
It's important to raise the efficiency by 80%, but just too minor if to raise the efficiency by 1% and has to sacrifice the design
this is kind of over-emphasis on efficiency.

Again because many people are under-valuing this principle, the conventions become to use --

and I have to make clear that, "high performing spreadsheet designers" means both design and efficiency are good, but not only on the performance of speed.
As Best Practice does not only mean the efficiency.

I am not sure whether your final statements are talking about SUMPRODUCT or --.
I only state the problem of -- being indiscrimate and not aim-matching.
[/QUOTE]


xenou said:


> To the latter, I'm afraid that in my experience no matter how simple and "maintainable" I make a spreadsheet, 90% of users can't grasp how it works.  This could be only because it contains an IF() formula, a VLOOKUP(), or even because it uses input cells (another "best practice").  So when I make spreadsheets that others must use and maintain, I use all of my expertise to make something that works, even when dummies take it over - with the assumption that they don't understand it (and may not be able to).


so your users seem to be very elementary about excel? if yes then I would rather say it's not applicable for us to talk about the design and readability of excel to them, as they don't have the skills to do this yet. (given they may have no knowledge to IF / VLOOKUP..they are really very elementary...)
Here, things are talking from the developer's prospective.


xenou said:


> ^^ to follow up, I have a book on Excel best practices which advocates strongly (and convincingly, in the event) that using R1C1 formula style is an Excel best practice. Of course, almost no one I know actually follows this best practice. So I guess that's a case of a "best practice" that is ignored, and could be argued until next Tuesday without effect, if that's any consolation to you.
> 
> Note: the book in question is _Excel Best Practices for Business by Abdulezer. However, I do occasionally put Excel in R1C1 mode temporarily to review a worksheet for possible errors (!). I'm almost willing to go with this permanently but I know it would just confuse the heck out of everyone else, even if it is best._


I am holding against side to R1C1, as one quick concern is the readability of the reference, for instance
a cell B5 contains the formula =RC[50]
how long it takes to figure out what it is?
(similar to the case to use VLOOKUP(,,100, 0)
And can you explain the benefit of using R1C1 mode by starting a new thread? I hope to have a quick look on it


----------



## xenou (May 22, 2013)

Hi aalviny,


> and I have to say people here really don't value the "aim-matching" principle as important.


I don't think this is quite true.  It's just the the aim-matching principle is to vague.  No one really wants to produce work that doesn't match their aim.  I think the disagreement is more that the majority of users of double negation find it to be sufficient so far as both getting results and being understandable/readable.  You are in the minority on that point.


> and I would say claiming efficient and fast is only an "over-emphasis on efficiency" but sacrifice design



I'm not sure what you mean here, but there are also different opinions on efficiency/speed/optimization/file size.  Some are very concerned about these things, others a little concerned, and others hardly worry about it at all (unless it actually becomes a problem - files too big or taking too long to calculate).  I suppose it's good to be proactive by designing efficiency in from the start, but be that as it may it's the case the some folks will only get to that when they need to.  Again, I doubt anyone wants to create inefficient spreadsheets.  It's just a matter of how much you know, and how much you want to put into it.


----------



## xenou (May 22, 2013)

I would refer you to the book quoted to get a complete case of the benefits of R1C1 formulas.  I'm not a big user of this, as I said, so I'm not the one to start a thread on it.  One of the benefits is that when you write a formula and drag it down or across, it looks exactly the same in every cell.  So you can quickly see anomalies (this is why I use it for "debugging").  Of course at first it's hard to understand the references, but in time I imagine you'd become used to it.

I mentioned users because you have emphasized this matter (unless I am mistaken) -- that other people might need to take over the spreadsheet and that's why you take pains to make them easy to understand.  Similarly with Sumproduct, I mentioned this because it is a prime example of where double negation is employed, and double negation is the thing you seem to object too on the grounds of your "aim-matching" principle.

ξ


----------



## aaalviny (May 22, 2013)

xenou said:


> Hi aalviny,
> 
> I don't think this is quite true.  It's just the the aim-matching principle is to vague.  No one really wants to produce work that doesn't match their aim.  I think the disagreement is more that the majority of users of double negation find it to be sufficient so far as both getting results and being understandable/readable.  You are in the minority on that point.


Hi,
It seems you are still concerning about the aim-matching of the task and result, but not the aim-matching of the functions used. (Or you haven't seen my replies above?)
understandable, readable and aim-matching are important during the choice of functions


xenou said:


> I'm not sure what you mean here, but there are also different opinions on efficiency/speed/optimization/file size.  Some are very concerned about these things, others a little concerned, and others hardly worry about it at all (unless it actually becomes a problem - files too big or taking too long to calculate).  I suppose it's good to be proactive by designing efficiency in from the start, but be that as it may it's the case the some folks will only get to that when they need to.  Again, I doubt anyone wants to create inefficient spreadsheets.  It's just a matter of how much you know, and how much you want to put into it.


I am meaning the fact that -- is a little faster than VALUE, if you read through the previous replies.
comparing to the importance of DRY, using binary lookups, good table structure, no volatile functions, etc. that can give significant improvement to the speed.
they are just like difference of raising the efficiency from 50% to 90% and 90% to 91%
to do this small efficiency, it is ok as long as the design is good, e.g. keeping aim-matching, understandable and readable.
otherwise it is over-emphasis on efficiency. (actually even it's ok, the time spent to improve this efficiency may just cancel out the efficiency improvement)

Alvin


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## aaalviny (May 22, 2013)

xenou said:


> I would refer you to the book quoted to get a complete case of the benefits of R1C1 formulas.  I'm not a big user of this, as I said, so I'm not the one to start a thread on it.  One of the benefits is that when you write a formula and drag it down or across, it looks exactly the same in every cell.  So you can quickly see anomalies (this is why I use it for "debugging").  Of course at first it's hard to understand the references, but in time I imagine you'd become used to it.


Hi ξ,
I would say using different formulas, esp. in the middle of a column is one of the bad practice and should be avoided, manual adjustment should be done on new columns. (seems better not go off-topic)


xenou said:


> I mentioned users because you have emphasized this matter (unless I am mistaken) -- that other people might need to take over the spreadsheet and that's why you take pains to make them easy to understand.


I wonder it is "users" or "developer" (i.e. people who maintain) you are referring to?


----------



## xenou (May 22, 2013)

So, are you saying that you prefer VALUE() because it is more efficient or because it is better at aim-matching?
ξ

Note, for the record I think double negation matches the task and the result, and the "function used", so to speak.  I.e., it's all good.


----------



## aaalviny (May 22, 2013)

xenou said:


> So, are you saying that you prefer VALUE() because it is more efficient or because it is better at aim-matching?
> ξ
> 
> Note, for the record I think double negation matches the task and the result, and the "function used", so to speak. I.e., it's all good.


better at aim-matching (most importantly)
also better discrimination on the input of the function

VALUE() is little worse in terms of efficiency.

Overall I weigh aim-matching most (of course also readable and understandable, etc.), and think that efficiency only does little improvement

and from the statement from wiki
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
""Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing."
AND
"One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand."

And...can you explain why aim-matching is valid for the function "arithmetic negative" to do format conversion?
I presented my detailed argument in #55


----------



## shg (May 22, 2013)

aaalviny said:


> That's all I want to say


I was so optimistic after reading that ...


----------



## aaalviny (May 22, 2013)

shg said:


> I was so optimistic after reading that ...


I hoped no need to say anymore,
But it shows me that people who reply here just don't understand my viewpoints...and give comments that I already responded, but without giving a response to what I responded.
how sad it is.


----------



## xenou (May 22, 2013)

Hi, I was trying to point out that people understand your viewpoint but just don't agree on all that you are saying, which isn't the same thing.  Hopefully that will make you a little less sad 




> And...can you explain why aim-matching is valid for the function "arithmetic negative" to do format conversion?
> I presented my detailed argument in #55


I presented my argument (less detailed) in #64, first paragraph.


----------



## aaalviny (May 22, 2013)

xenou said:


> Hi, I was trying to point out that people understand your viewpoint but just don't agree on all that you are saying, which isn't the same thing.  Hopefully that will make you a little less sad
> 
> 
> 
> I presented my argument (less detailed) in #64, first paragraph.


I have already pointed out that you mentioned only the aim of the task and the end-result but not the aim-matching of the functions

to re-quote yours:


xenou said:


> Hi,
> 
> Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result". In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations. To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.


"I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result"."

let me illustrate an example

I want to compute 4!
I entered (2*3*4)
it gives the correct result, but not the correct definition, which is 1*2*3*4
mathematical operation that produces a correct result <= you only considered the end result
is by definition a match between the "aim" and the "result" <=aim of the task and the end result

so I can't find the aim-matching of the functions used in your above statement

"""2*3*4 is a mathematical operation that produces a correct result of 4!
and is by definition a match between "the aim to get the 4! result" and the "result""""

I do think it's a problematic statement above to respond to my "aim-matching of the function used"


----------



## xenou (May 22, 2013)

Hi,
So you are arguing that we should never write _4!_ but always _2*3*4_ instead (or _1*2*3*4_, as the case may be)?

I think that _4!_ matches the aim of finding the factorial of 4.
ξ


----------



## aaalviny (May 22, 2013)

xenou said:


> Hi,
> So you are arguing that we should never write _4!_ but always _2*3*4_ instead (or _1*2*3*4_, as the case may be)?
> 
> I think that _4!_ matches the aim of finding the factorial of 4.
> ξ



I mean 4! and 1*2*3*4 are the same things
but 2*3*4 is not a correct representation of 4! although they give the same correct answer


----------



## xenou (May 22, 2013)

Hi,
That may be so, but I will still use 2*3*4 in cases where I am evaluating 4!, since it simplifies the work of solving equations - and yet I find myself able to sleep at night 
ξ


----------



## aaalviny (May 17, 2013)

Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






*Text to Number Conversion - Which functions should be used?*


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:

```
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
```


```
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
```


```
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)
```


and just a random search on the threads:

```
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
```
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




*Conclusion*
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18


----------



## aaalviny (May 22, 2013)

xenou said:


> Hi,
> That may be so, but I will still use 2*3*4 in cases where I am evaluating 4!, since it simplifies the work of solving equations - and yet I find myself able to sleep at night
> ξ


Hi ξ,

oh sadly, still on solving equations' efficiency instead of the meaning of 4!

4! = 1*2*3*4 = 2*3*4 then it's ok to simplify the work
(just like when we are working mathematics question in school and teacher just won't allow one to write 4! = 2*3*4)
or when substituting formulas with values, "1" has to be written in formula e.g. P * (1 + i) = 1 * (1 + 5%), when the question set P = 1 and i = 5%

Alvin


----------



## xenou (May 22, 2013)

I guess I don't have as much difficulty as you do in understanding the meaning of 4!   Nevertheless, using * symbols in your "representation" (1*2*3*4) is not sufficient since it doesn't cover the cases of 0!, which must be represented as _1_, or 1!, which must be represented as _1_.  It is also going to be very awkward for very large numbers (where your representation would become excessively long to write out).

P.S., 1 as a factor is not the same as 1 as an addend.


----------



## Marcelo Branco (May 22, 2013)

Alvin,

Conciliation time!

I think that after so many messages we come to a point that you should stop and reflect.

I would like you to consider:

1. You stated _ad nauseam_ your idea about best practices and received several responses contrary to your point of view.
For sure they thus expressed because this forum is considered one of the best about Excel, not because they like or dislike you. Leave unanswered a proposal that does not seem correct for vast majority of the most active and recognized participants, would betray the purpose of this forum and confidence of all who come here for help.

2. No one alone can define what is, or is not, a best practice. 
It is the community of users, be it Excel, any tool or programming language, generally adopting a solution according to the majority, that can and shall suggest such practices - attention: suggest, not define.

3. There are many people who use this forum, facing real problems, eager to be helped.
How, in your argument, you demonstrated a very good knowledge of Excel, i think you could be very helpful trying to help those people, instead of insisting on a point that not found support among the best professionals who participate in this forum and, as a matter of fact, is not so important. Do not waste your and our time.

Please, think about it

All the best

M.


----------

