Array Formulas in Large Worksheet

AndyPaff

New Member
Joined
Jun 2, 2009
Messages
7
How might I get around the problem of formulas in a large worksheet taking ages to calculate or causing Excel to hang?


I have a large worksheet (300,000 lines x 50 columns) of horse racing data. The data are arranged one line per horse per race with column A holding the race identifiers. I need to normalise the data within each race on a scale from 0 to 1. For small amounts of data I use the formula x-min(r)/max(r)-min(r) in an array formula (where x= the item of data and r =the same data for the whole race).


My problem is, for such a large amount of data, the worksheet just hangs. For a smaller subset (5,000 rows) I switched off auto calc, copied the formulas as normal formulas and then converted them to array formulas. Even this took several hours. Once calculated the data can be saved as csv, the formulas don't need to be recalculated. Should I be looking to do this programatically rather than copying down the rows?
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
It’s an interesting case. I'm not sure I can help but can you post some sample data & also show the formula?
 
Upvote 0
Hi AndyPaff

I'm with Akuini in wanting to see some sample data and formulae.

However, without that, although 300,000 rows x 50 columns is not a large dataset relatively speaking (less than 0.1% of the total cells on each worksheet - each worksheet has 2 ^ 14 columns (16,384) and 2 ^ 20 rows (1,048,576) or 2 ^ 34 cells (17,179,869,184)), that you are occupying all 15m of these cells with formulae, regardless of whether they are volatile or non-volatile (PS array worksheet formulae are great, but they do impose a large processing penalty), the speed will probably not get much faster by simply re-jigging the formulae.

You could try upgrading to Excel 64bit, but this may not be a total solution either.

You are on the right track with thinking about doing it with VBA. The fastest methods involve arrays, collections and the dictionary object. As long as you are not looping through your 15m cells continuously, my experience suggests the running time for any VBA based calculation would be into the seconds, rather than minutes, and certainly not hours.

The key to all of this, though, is to minimise the number of times you refer to the worksheet. The general philosophy is to read once from the worksheet, perform the calculations in memory, and then write the results once back to the worksheet.

Cheers

pvr928
 
Last edited:
Upvote 0
You have shown the basis of your formula, but not an example of your actual formula?

Also, things to consider would be using helper columns to try and eliminate the array formulas, and for formulas based on older data that wont change now, convert those formulas to values (copy/paste values)
 
Upvote 0
Thanks for your replies everyone. Here's an example of the formula. For simplicity it's just referring to 100 rows.


{=IFERROR((D2-MIN(IF($A$2:$A$100=$A2,D$2:D$100)))/(MAX(IF($A$2:$A$100=$A2,D$2:D$100))-MIN(IF($A$2:$A$100=$A2,D$2:D$100))),"")}


The race identifiers are in column A
The data this cell refers to is in column D
Some races have no data in some columns so I need to trap the errors and leave the cell blank.


In light of what you have said, I can already see where I'm going wrong. I'm doing everything in one hit and calculating the same max and min figures over and over again. I would need 100 helper columns (max and min x 50) but if I could populate these first, all the information would be within the row and I could use normal formulas. However, I bet there's an easier way that I just can't see right now.


Typical output looks like this,

[TABLE="width: 448"]
<colgroup><col width="64" style="width:48pt" span="7"> </colgroup><tbody>[TR]
[TD="width: 64, bgcolor: transparent"]RaceID[/TD]
[TD="width: 64, bgcolor: transparent"]aDS[/TD]
[TD="width: 64, bgcolor: transparent"]bDS[/TD]
[TD="width: 64, bgcolor: transparent"]mDS[/TD]
[TD="width: 64, bgcolor: transparent"]aDS-Norm[/TD]
[TD="width: 64, bgcolor: transparent"]bDS-Norm[/TD]
[TD="width: 64, bgcolor: transparent"]mDS-Norm[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]61[/TD]
[TD="bgcolor: transparent, align: right"]96[/TD]
[TD="bgcolor: transparent, align: right"]58[/TD]
[TD="bgcolor: transparent, align: right"]0.91[/TD]
[TD="bgcolor: transparent, align: right"]0.93[/TD]
[TD="bgcolor: transparent, align: right"]0.88[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]67[/TD]
[TD="bgcolor: transparent, align: right"]103[/TD]
[TD="bgcolor: transparent, align: right"]66[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]40[/TD]
[TD="bgcolor: transparent, align: right"]53[/TD]
[TD="bgcolor: transparent, align: right"]40[/TD]
[TD="bgcolor: transparent, align: right"]0.60[/TD]
[TD="bgcolor: transparent, align: right"]0.51[/TD]
[TD="bgcolor: transparent, align: right"]0.61[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]46[/TD]
[TD="bgcolor: transparent, align: right"]46[/TD]
[TD="bgcolor: transparent, align: right"]46[/TD]
[TD="bgcolor: transparent, align: right"]0.69[/TD]
[TD="bgcolor: transparent, align: right"]0.45[/TD]
[TD="bgcolor: transparent, align: right"]0.70[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]54[/TD]
[TD="bgcolor: transparent, align: right"]81[/TD]
[TD="bgcolor: transparent, align: right"]59[/TD]
[TD="bgcolor: transparent, align: right"]0.81[/TD]
[TD="bgcolor: transparent, align: right"]0.79[/TD]
[TD="bgcolor: transparent, align: right"]0.89[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]51[/TD]
[TD="bgcolor: transparent, align: right"]51[/TD]
[TD="bgcolor: transparent, align: right"]51[/TD]
[TD="bgcolor: transparent, align: right"]0.76[/TD]
[TD="bgcolor: transparent, align: right"]0.50[/TD]
[TD="bgcolor: transparent, align: right"]0.77[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201015[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[TD="bgcolor: transparent, align: right"]0.00[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]71[/TD]
[TD="bgcolor: transparent, align: right"]79[/TD]
[TD="bgcolor: transparent, align: right"]75[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]0.98[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]60[/TD]
[TD="bgcolor: transparent, align: right"]70[/TD]
[TD="bgcolor: transparent, align: right"]60[/TD]
[TD="bgcolor: transparent, align: right"]0.85[/TD]
[TD="bgcolor: transparent, align: right"]0.86[/TD]
[TD="bgcolor: transparent, align: right"]0.80[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]71[/TD]
[TD="bgcolor: transparent, align: right"]80[/TD]
[TD="bgcolor: transparent, align: right"]68[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]0.99[/TD]
[TD="bgcolor: transparent, align: right"]0.91[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]71[/TD]
[TD="bgcolor: transparent, align: right"]81[/TD]
[TD="bgcolor: transparent, align: right"]71[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]1.00[/TD]
[TD="bgcolor: transparent, align: right"]0.95[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]54[/TD]
[TD="bgcolor: transparent, align: right"]54[/TD]
[TD="bgcolor: transparent, align: right"]54[/TD]
[TD="bgcolor: transparent, align: right"]0.76[/TD]
[TD="bgcolor: transparent, align: right"]0.67[/TD]
[TD="bgcolor: transparent, align: right"]0.72[/TD]
[/TR]
[TR]
[TD="bgcolor: transparent, align: right"]201019[/TD]
[TD="bgcolor: transparent, align: right"]57[/TD]
[TD="bgcolor: transparent, align: right"]67[/TD]
[TD="bgcolor: transparent, align: right"]55[/TD]
[TD="bgcolor: transparent, align: right"]0.80[/TD]
[TD="bgcolor: transparent, align: right"]0.83[/TD]
[TD="bgcolor: transparent, align: right"]0.73[/TD]
[/TR]
</tbody>[/TABLE]
<strike></strike>
 
Upvote 0
Not quite sure why you feel you would need that many helper columns? I see 1 for MIN and 1 for MAX? Something like...
IF(A2=ID,D2,"")

I can see if you have a lot of different ID's then yes, you would need extra columns, but those formulas could then be values once you have your answer.
Another thing to consider is using a would be to use a Pivot Table for this?
 
Upvote 0
Thanks Ford.

I wasn't very clear but to normalise each of the 50 columns of data within each race I would need to calculate the max and min for each of those 50 columns of data within each race. There are tens of thousands of races.

I had considered a pivot table but again, this would be quite large. I think the best solution would be nested loops in VBA (or indeed any language that will process a csv). These would loop through each column within each race calculating the max and min and the normalised figures. My problem is the last time I did anything like that was in Quick Basic! Time to brush up my skills I think.
 
Upvote 0
Sorry to re-surface this one but I haven't been able to crack it and it's driving me mad. Can anyone help?
 
Upvote 0
@AndyPaff

Do you have MINIFS and MAXIFS functions on your system?

If you do, try:

=IFERROR((D2-MINIFS(D$2:D$100,$A$2:$A$100,$A2))/(MAXIFS(D$2:D$100,$A$2:$A$100,$A2)-MINIFS(D$2:D$100,$A$2:$A$100,$A2)),"")

which needs just enter.

Otherwise, control+shift+enter and copy down:

=IFERROR((D2-V(MIN(IF($A$2:$A$100=$A2,D$2:D$100))))/(MAX(IF($A$2:$A$100=$A2,D$2:D$100))-V()),"")

where V() is a udf. For the foregoing formula to work, we need to install the V() function using Alt+F11 of which the code is as follows:

Public Function V(Optional vrnt As Variant) As Variant
'
' Stephen Dunn
' 2002-09-12
'
Static vrntV As Variant
If Not IsMissing(vrnt) Then vrntV = vrnt
V = vrntV
End Function
 
Upvote 0
Thanks Aladin. I don't have MINIFS and MAXIFS but your array formula and udf are certainly tidier.


Unfortunately, the calculation still hangs due I think, to the huge ranges in the array formula (sheet is 300,000 x 56) . I feel this could be solved by using the evaluate function in VBA. If I could make all the cell references dynamic with the ranges limited to say 100 rows, I could loop through the cells evaluating them one at a time. The array could then be written to the worksheet when the looping has finished. So, the first iteration would be,


var = Evaluate("=IFERROR((D2-V(MIN(IF($A$2:$A$100=$A2,D$2:D$100))))/(MAX(IF($A$2:$A$100=$A2,D$2:D$100))-V()),"")")
Array (row,col)=var


Followed by
var = Evaluate("=IFERROR((D3-V(MIN(IF($A$3:$A$101=$A3,D$3:D$101))))/(MAX(IF($A$3:$A$101=$A3,D$3:D$101))-V()),"")")
Array (row,col)=var


And so on.
This should speed up the calculation as only one cell in a range of 100 is being calculated


My problem is, I don't know how to make the cell/range references in the formula dynamic. I've tried using R1C1 notation in " & Cells() & " and " & Range() & " but it doesn't seem to work.
Does the idea make any sense, I'm a complete VBA Novice?
 
Upvote 0

Forum statistics

Threads
1,225,754
Messages
6,186,826
Members
453,377
Latest member
JoyousOne

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top