The Array Type

Colin Legg · Aug 29, 2008

Hi Everyone,

I'm going to start this thread with a pretty broad question...

Does anyone know where I can find some documentation or information (exluding the Excel helpfile) about Excel's Array Type? There seems to be an inexplicable lack of information about it!

To be clear, when I say "array type", I mean the data type that the TYPE() worksheet function returns a value of 64 on.

Thanks
Colin

tusharm · Aug 30, 2008

I don't know why this matters. It is so completely irrelevant. Among the several "high reliability, high performance" projects I've worked on, one was a large (multi-mainframe) project where nothing -- and I mean nothing short of a simultaneous multi-mainframe failure) could compromise my security kernel code while adding no more than a few milliseconds of overhead to overall transaction durations. And, given my *practical* experience in high performance projects, I can assure you I neither know nor do I care about stuff like where Excel stores a constant. Nor do I care about what excites so many others -- though, not the topic of this discussion {grin} -- as to which data type is faster than another.

These kinds of what I call micro-optimizations will *at the most* gain you a few percentage points. It won't bring a magnitude improvement in performance. Nor will they help you develop robust, reliable, understandable, and easy to maintain systems.

Colin_L said:
Hi,

Thanks for your replies. Yeah, xld, that's a fair enought point - I'll start being a little bit more specific, but rather than jumping in the deep end I'll try to build up.

In the introduction paragraph in this primer on array constants, J-Walk states that array constants are stored within the memory rather than within the workbook. This surprises me. Do we know if this is fact (and how do we know this!?), do we know the type of structure that is used in the memory to store them, and do we know if named array constants are stored differently to [non-named] array constants?

Cheers,
Colin

NateO · Aug 30, 2008

I'll give my two cents here on what J-Walk's getting at, and this is my unofficial interpretation.

Excel stores variables like this in memory, and it travels with the Workbook, i.e., open the Workbook and it's in memory, close the Workbook, it's out of memory. But it's committed at the Application level.

In a certain sense, this used to be less trivial than it is today, with memory-caps, and whatnot. If you open a 2nd instance of Excel, your memory-slate is clear, regardless of what's happening with the initial instance.

The same would be true where you use a VLookUp() call against a Range. You might think you have a Range burned into memory, but Excel's actually housing your Range reference as an Array, in memory.

NateO · Aug 30, 2008

You also might want to search XLOper over at MSDN, it pulls up references like this one:

http://msdn.microsoft.com/en-us/library/bb687869.aspx

Colin Legg · Aug 31, 2008

Hi everyone,
Thanks for your replies.
Tushar - I assure you I'm not heading down the efficiency road with this!

Nate - Fully recoverd from your birthday hangover, at last?!

Array constants and named array constants seem to be interpretted differently when they are passed to and from the worksheet. I think this is to do with how they are structured within the memory. Excel is a black box and I am guessing, so I could most certainly be wrong.

If we set up a named array constant, which is really just a named formula, in the workbook:

Named array constant: My_Array ={3,5}

Firstly, it seems that a named array constant is NOT an array type.
We can see this with a couple of formulas, entered normally:

=TYPE({3,5}) returns 64 --> array type
=TYPE(My_Array) returns 1 --> not an array type. Returns 1 because the first element of My_Array is a double.

Simple as this is, as far as I am aware - and I have looked around - this is not actually documented anywhere? Maybe I'm missing something here?

Does this matter? Yes, it does - to me, at least. It means that calculations on array constants and named array constants work differently - they need to be treated as two separate animals.

=SUM({3,5}+1) returns 10
=SUM(My_Array+1) returns 4
=SUM({3,5})+1 returns 9
=SUM(My_Array) +1 returns 9

Evaluating the formulas 1 and 2 with F9 shows exactly the same thing, but the results are different? It seems that this is because only the first element of the named array is passed back for SUM to aggregate?

This is further supported by what happens if we deliberately convert My_Array into an array type. It seems there are three ways we can do this.
1. Use CTRL+SHIFT+ENTER:

=SUM(My_Array+1) returns 10 when CTRL+SHIFT+ENTERED

2. Use Sumproduct, which convertes its arguments into array types:

=SUMPRODUCT(My_Array+1) returns 10

3. Use the N() worksheet function:

=SUM(N(My_Array)+1) returns 10

Now the third one, using the N() worksheet function, is bizarre - to me, at least. Again, I can't find it documented anywhere that the N() worksheet function can do this? Perhaps it has a use after all!?

So, getting back to array constants and the memory. Looking at these results, it would make sense to me that a named array constant has to pass through a particular layer of interpretation that an array constant doesn't have to; or, at least to say that they are dimensioned somewhat differently in the memory?

The three ways we can induce Excel to calculate on array constants and named array constants in the same way suggests to me that when we CTRL+SHIFT+ENTER a formula, we are just converting the relevant parts, such as range reference values, of that formula into array types. So, in turn, an array formula should simply be defined as a formula that contains an array type. This means that array formulas are not just CSE formulas, and any formula that uses Sumproduct, an array constant, or another array type construct, is in fact an array formula. A formula that contains a named array constant is not an array formula unless there is an array type conversion in there. Again, looking around the internet, there are plenty of array formula tutorials that describe array formulas and what they do, but none of them seem to specifically define what an array formula is.

All thoughts, possibilities and suggestions are welcome?

Cheers,
Colin

xld · Aug 31, 2008

What you seem to be missing to me is that names are evaluated whenever they are used. This means that array constants that are use in a non-array context, such as TYPE and name+1 picks the first item.

But you can force it to be properly evaluated if you array enter it, so {TYPE(My_Array)} returns 64 just as TYPE({3,5}) returns 64, that is you are telling TYPE To process an array as an array. Similalrly, SUM(My_Type+1) resolves to SUM(3+1), but SUM({4,6}) if array-entered.

So, as I keep re-iterating, array constants are array constants, they are very simple, and you are over-thinking it IMO.

Colin Legg · Aug 31, 2008

xld said:
What you seem to be missing to me is that names are evaluated whenever they are used. This means that array constants that are use in a non-array context, such as TYPE and name+1 picks the first item.

But you can force it to be properly evaluated if you array enter it, so {TYPE(My_Array)} returns 64 just as TYPE({3,5}) returns 64, that is you are telling TYPE To process an array as an array. Similalrly, SUM(My_Type+1) resolves to SUM(3+1), but SUM({4,6}) if array-entered.

Hi xld,

I'm not missing that - honest. And I actually think we're kind of agreeing here - with an array constant we can put it in a cell and it is processed as an array type. But with a named array constant, we have to tell Excel to process it like an array type. So clearly there is a difference there and I cannot find any documentation which actually says this. And I think it is interesting that using N() does this successfully for a named array constant - again I can find no information on this.

So, as I keep re-iterating, array constants are array constants, they are very simple, and you are over-thinking it IMO.

I could well be over-thinking it. I've been banging my head against a brick wall about this ever since I wrote an array formula tutorial a little while back. Well, I never answered it myself and I was pretty sure that if there is an answer then the members on here would be able to come up with it - or at least some seriously good ideas! I do think it is a valid observation and I can't see any harm in asking the questions, especially since these observations do not appear to have been discussed on the forums before... at the very least I am going to learn something, right?

And thank you for taking the time to share your opinion and thoughts with me - I appreciate it and I have taken it on board.

Cheers,
Colin

xld · Aug 31, 2008

I think it is not that we have with a named array constant, we have to tell Excel to process it as an array. When it is an in-line array constant, such as SUM({3,5}) Excel can work that out itself. When it is a named array constant, because the name is evaluated on use, Excel doesn't know that SUM(My_Array) is an array, so it has to be informed by array entering the formula.

Aladin Akyurek · Aug 31, 2008

Colin,

Up to now I hadn't looked in this thread...

The points you rise and observations you make are interesting and valid.

Colin_L said:
...

Array constants and named array constants seem to be interpretted differently when they are passed to and from the worksheet. I think this is to do with how they are structured within the memory. Excel is a black box and I am guessing, so I could most certainly be wrong.

Everything is in memory, even the objects that are anchored to "cells" which define spreadsheets.

If we set up a named array constant, which is really just a named formula, in the workbook:

Named array constant: My_Array ={3,5}

Firstly, it seems that a named array constant is NOT an array type.
We can see this with a couple of formulas, entered normally:

=TYPE({3,5}) returns 64 --> array type
=TYPE(My_Array) returns 1 --> not an array type. Returns 1 because the first element of My_Array is a double.

Simple as this is, as far as I am aware - and I have looked around - this is not actually documented anywhere? Maybe I'm missing something here?

While =TYPE({3,5}) is anchored to the cell where this bit is entered, {3,5} itself is not. TYPE recognizes thus the {3,5} bit as array object. Hence 64.

But when you enter that bit into a cell as ={3,5} or as =My_Array, TYPE sees just the first item in both cases. Hence, both evaluate to 1.

The following are equivalent:

=TYPE({3,5})

{=TYPE(My_Array)}

Does this matter? Yes, it does - to me, at least. It means that calculations on array constants and named array constants work differently - they need to be treated as two separate animals.

=SUM({3,5}+1) returns 10
=SUM(My_Array+1) returns 4
=SUM({3,5})+1 returns 9
=SUM(My_Array) +1 returns 9

Evaluating the formulas 1 and 2 with F9 shows exactly the same thing, but the results are different? It seems that this is because only the first element of the named array is passed back for SUM to aggregate?

The 2nd formula result shows that SUM() is somewhat special (see: http://tinyurl.<WBR>com/5wc4ho, for some take ups by Harlan Grove and others). If an object must be treated as an array object by SUM, this needs to be signalled to Excel explicitly.

This is further supported by what happens if we deliberately convert My_Array into an array type.

Right.

It seems there are three ways we can do this.
1. Use CTRL+SHIFT+ENTER:

=SUM(My_Array+1) returns 10 when CTRL+SHIFT+ENTERED

2. Use Sumproduct, which convertes its arguments into array types:

=SUMPRODUCT(My_Array+1) returns 10

3. Use the N() worksheet function:

=SUM(N(My_Array)+1) returns 10

Now the third one, using the N() worksheet function, is bizarre - to me, at least. Again, I can't find it documented anywhere that the N() worksheet function can do this? Perhaps it has a use after all!?

Actually, there is a single method involving functions that admit array objects: That is Control+Shift+Enter.

A few functions like SumProduct, Frequency, and Lookup admit range objects as well as array objects.

N together with + in the 3rd formula yields an array object (first round of evaluation)...

=N(My_Array)+1

==>

{4,6}

which SUM can grind (second round evaluation).

So, getting back to array constants and the memory. Looking at these results, it would make sense to me that a named array constant has to pass through a particular layer of interpretation that an array constant doesn't have to; or, at least to say that they are dimensioned somewhat differently in the memory?

I tend to agree with the former line of thought rather than the "memory" line...

The three ways we can induce Excel to calculate on array constants and named array constants in the same way suggests to me that when we CTRL+SHIFT+ENTER a formula, we are just converting the relevant parts, such as range reference values, of that formula into array types. So, in turn, an array formula should simply be defined as a formula that contains an array type. This means that array formulas are not just CSE formulas, and any formula that uses Sumproduct, an array constant, or another array type construct, is in fact an array formula. A formula that contains a named array constant is not an array formula unless there is an array type conversion in there. Again, looking around the internet, there are plenty of array formula tutorials that describe array formulas and what they do, but none of them seem to specifically define what an array formula is...

I also characterize the formulas requiring CSE and the formulas with SumProduct (always), Frequency, and Lookup as range and array grinding formulas.

Colin Legg · Aug 31, 2008

Hi xld,

xld said:
I think it is not that we have with a named array constant, we have to tell Excel to process it as an array. When it is an in-line array constant, such as SUM({3,5}) Excel can work that out itself. When it is a named array constant, because the name is evaluated on use, Excel doesn't know that SUM(My_Array) is an array, so it has to be informed by array entering the formula.

Okay, thanks for that input. I'll have a careful think about it!

Hi Aladin

Aladin Akyurek said:
Colin,

Up to now I hadn't looked in this thread...
The points you rise and observations you make are interesting and valid....
....I also characterize the formulas requiring CSE and the formulas with SumProduct (always), Frequency, and Lookup as range and array grinding formulas.

Well, I'm glad you stopped by and that you found it interesting!

And I'm pleased that my definition of an array formula is not altogether different from yours.

I'll have a careful read through your post and the link: I like to research the answers I'm given. Once I've let the information sink in and I've played around with it I'll post back with a well considered reply (this may take a little while depending on how much playing around I have to do).

Cheers,
Colin

Colin Legg · Sep 11, 2008

Okay, I've done a bit of work on this.

Aladin - I agree that N() is able to de-reference name objects; hence, in our case, it effectively returns a literal array onto the stack.

In terms of different results between named array constants and literal arrays, I was not able to determine at the deepest level exactly how this is done. Whilst some technical documentation exists for how array constants in formulas are parsed, there's only vague information on how names are stored in the names table and there is even less information about the various contexts that will cause it to be de-referenced when it is pushed onto the stack. Still, I'm happy that I have a general understanding on how the tokens are working here, so this exercise has been productive.

As a passing point of interest, N() can also have a similar effect (although I think the inner working is slightly different) for VBA UDFs that return arrays. For example, in this thread here it was agreed that the Sumproduct version in post #4 had to be CSE entered to get the correct result (or to use a CSE formula with the SUM worksheet function), because of TRANSPOSE:

CSE:
=SUMPRODUCT(TRANSPOSE(IsNotFormula(G1:I1))*(G2:I2))
or
CSE:
=SUM(TRANSPOSE(IsNotFormula(G1:I1))*(G2:I2))

Well, we could work around it using N(); this does not need to be CSE entered:

=SUMPRODUCT(TRANSPOSE(N(IsNotFormula(G1:I1)))*(G2:I2))

So I think that it has some practical applications.

Pity it can't do it for Range references!

Thanks again for your help.
Colin

The Array Type

Colin Legg

MrExcel MVP

tusharm

MrExcel MVP

NateO

Legend

NateO

Legend

Colin Legg

MrExcel MVP

xld

Banned

Colin Legg

MrExcel MVP

xld

Banned

Aladin Akyurek

MrExcel MVP

Colin Legg

MrExcel MVP

Colin Legg

MrExcel MVP

Similar threads

Share this page

The Array Type

MrExcel MVP

MrExcel MVP

Legend

Legend

MrExcel MVP

Banned

MrExcel MVP

Banned

MrExcel MVP

MrExcel MVP

MrExcel MVP

Similar threads

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock