Text to Number Conversion - Which functions should be used?

aaalviny

Board Regular
Joined
Apr 25, 2013
Messages
128
Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






Text to Number Conversion - Which functions should be used?


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:
Code:
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
Code:
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
Code:
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)


and just a random search on the threads:
Code:
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




Conclusion
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)

Without using any of the new functions in XL2007 or higher or VBA, please show a sample solution to this problem:

I want to sum the values in Column F, where Column A equals Yes and Column D equals NO.

You are also making an assumption that coercion is only needed when values are "wrongly formatted"
 
Last edited:
Upvote 0
Without using any of the new functions in XL2007 or higher or VBA, please show a sample solution to this problem:

I want to sum the values in Column F, where Column A equals Yes and Column D equals NO
Do you mean to sum values in Column F if Column A is "Yes" and Column D is "No"?

I mean if there are large amount of coercion needed, it is probably due to the wrongly formatted raw data
 
Last edited:
Upvote 0
alvin_excel_no_sumproduct_20130518.PNG
 
Upvote 0
Why would you use 2 SUMPRODUCT? You only need 1. You have used 41 formulas to avoid 1.
 
Last edited:
Upvote 0
You have also used 13 regular math functions which you claim are volatile, which contradicts your statement that these formulas should be avoided.
 
Upvote 0
Why would you use 2 SUMPRODUCT? You only need 1. You have used 41 formulas to avoid 1.
I am saying If there are another request to sum those Not "Yes" or Not "No"
then there will be totally two sumproduct
that 41 formulas don't matter, as long as they are easy to read and maintain
(of course I can combine Column C and D to column B, to reduce the number of formulas required, but

imagine, when there are now five values of interest:
1. sum of col F if "Yes" and "No"
2. sum of col F if Not "Yes" or Not "No"
3. sum of col F if "Yes"
4. sum of col F if "No"
5. sum of col F if Not "Yes" and Not "No"

If now, the data changes, "Yes" is no longer "Yes" and becomes "Y"
if you are using sumproduct, you have to change five times in the formula (one in each sumproduct)
but if my suggestion is used, only one-time change is needed then formulas are copied down

this is because of the DRY (Don't repeat yourself) principle is violated in the sumproduct cells
the "Yes" column or "No" column are needed to be thoroughly checked for each sumproduct cells, which are repeated
and that's why it's slow because excel needs to check the same array for multiple times

this is about the future maintenance of the worksheet (future is always unpredictable)
You have also used 13 regular math functions which you claim are volatile, which contradicts your statement that these formulas should be avoided.
I am sorry but what are the volatile functions you are referring to?
 
Upvote 0
that 41 formulas don't matter, as long as they are easy to read and maintain
What a load of crap!!!

You clutter the worksheet with loads of completely unnecessary formulae that increase:
(a) maintenance overheads;
(b) the risk of maintenance error;
(c) file size; and
(d) time required for a recalc by adding an extra output that wasn't asked for,
then have the gall to suggest it doesn't matter??? What planet are you from???
 
Upvote 0
Well I said I wouldn't debate this; but for the sake of the readers:

aaalviny said:
1. -- or INT() or VALUE() matches the aim of the formula?
Indeed. Well actually, VALUE() does. INT() is intended to truncate a decimal value to a whole number. But I suggest that that a meaningless mathematical operation using an arithmetic operator matches the aim of the formula too. We know that, in Excel, whenever an expression includes an arithmetic operator (--;+
;-;*;/;%;^) Excel attempts to resolve each operand as a numeric value (i.e. coercion is invoked). Therefore it is by design and it "matches the aim of the formula". Understanding this holds no further complexity of understanding the purpose of the VALUE() function.

aaalviny said:
2. -- is simpler than VALUE()
I would say on par. Any user unaware of coercion has to learn it. Learning what VALUE() does is no simpler nor more complex than understanding that use of arithmetic operator also invokes a coercion attempt. Besides it's not rocket science and it's really easy to read up on and to understand. Yes we occasionally see people ask what is the purpose of -- but I believe what is being asked is the context (after all everyone knows --1=1). People ask why it is necessary to invoke coercion. If we start substituting -- for VALUE, I believe we will have equally as many questions asking why it is necessary.


aaalviny said:
3. Convention is to use --?
Convention is important. It ensures consistency in how we work. It means that when you evaluate a complex formula and see -- you know immediately a deliberate attempt to coerce a text value to a numeric value. Indeed there are some conventions that aren't as good as they should be. This isn't one of them!

aaalviny said:
4. VALUE() requires one more parenthesis than --
Yes it's a trivial point but most of our conventions are trivial. It is exactly all these trivial points though that when adhered to in a complex model makes all the difference. I for one don't want the extra parenthesis even if only for formula evaluation. I'm a big fan of analysing formulae with F9 over certain portions. Extra parenthesis only makes that process that little more tricky. But yes, this is a trivial point.

aaalviny said:
5. -- is faster than VALUE?
Oh yes it is! The double negation operation invokes VERY little overhead, while the VALUE() function call by comparison to -- involves considerably more work. Is it noticeable? Not unless you scale it up. But unnecessary function calls is a significant culprit behind slow calcing models. Understanding coercion and boolean logic and avoiding unnecessary function calls can make a noticeable difference in the calc speed of your workbooks. What we are talking about here is micro-optimization; but micro-optimization is indeed very important. Use of all the micro-optimization best practises is what makes the difference between a 10MB file and a 1MB file. I once inherited a spreadsheet that was over 20MB. By the time I removed some duplicate calculations, substituted linear lookups for binary lookups, applied boolean logic using digital values I reduced the file size down below 1MB. All the improvements were micro-optimization techniques. So use of -- (or any arithmetic operator) over a function is important.

aaalviny said:
6. -- is shorter in terms of length than VALUE()?
Well I suppose. But that's not our motivation to use it. Not sure where you picked that argument up from Alvin.

aaalviny said:
Conclusion
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice

Please don't! You will be breaking away from convention and you will add more overhead to your formulae. This is a very bad suggestion. I agree that there are some stupid conventions. This is not one of them!

Interestingly one may ask why -- is the preferred (or most common) method? I believe it is because double negation comes 1st in {arithmetic} order precedence relations. I recall reading a suggestion how this is a teeny tiny bit quicker than using an operator lower down the list of order precedence; but I cannot remember the logic to that claim. Still, everyone seems to use --, it strikes me as being a very well-considered convention so I'm going to stick with it. Be good to yourself and do the same. ;-)
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top