Fast array formula for running total of a column for consecutive identical values in another column

zadig_48

New Member
Joined
Jul 12, 2022
Messages
5
Office Version
  1. 365
Platform
  1. Windows
I have a column (column E) with integer values, and another column (F) with arbitrary values. I want to compute, in column G, the running total of F, but limited to the segment of consecutive rows where the current value of F is constant. The description may be easier to understand with formulas:

Excel Formula:
B2 =100                          # Length of sequence
B3 =3                            # Possible values in column E
D2 =SEQUENCE(B2)                 # Auxiliary variable
E2 =RANDARRAY(B2,,1,B3,TRUE)     # Sequence of integers between 1 and B3
F2 =RANDARRAY(B2)                # Sequence of values to be summed

G2 =IF(OR(D2=1,E2<>E1),0,G1)+F2  # Desired output, non-array formula version.
                                 # If the current value in E is the same
                                 # as in the previous row, sum to the running total of F,
                                 # else start again with the current value of F.
                                 # Formula needs to be copied along the other columns.

I don't know in advance how long the sequence will be and I would like to use an array formula to automatically adjust the length of the output, instead of repeating the formula in G2 in the G column.

The following formula works:

Excel Formula:
H2 =LET(s,E2#,v,F2#,SCAN(0,D2#,LAMBDA(a,x,IF(OR(x=1,INDEX(s,x)<>INDEX(s,x-1)),0,a)+INDEX(v,x))))

but is much slower than the G2 formula above: it becomes slow when the sequence has a length in the thousands, while the G2 formula can easily handle hundreds of thousands of values.

How can I make the array formula as fast as the non-array formula?
(and BTW, why is the array formula so slow? All operations inside it should be very fast.)

PS I posted this previously on SU (Fast array formula for running total of a column for consecutive dentical values in another column) but I didn't receive any useful answer there.
 

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
Welcome to the MrExcel Message Board!

Cross-posting (posting the same question in more than one forum) is not against our rules, but the method of doing so is covered by #13 of the Forum Rules.

Be sure to follow & read the link at the end of the rule too!

Cross posted at: Fast array formula for running sum of a column for consecutive identical values in another
If you have posted the question at more places (other than SU), please provide links to those as well.

If you do cross-post in the future and also provide links, then there shouldn’t be a problem.
 
Upvote 0
Hi Fluff

Sorry, I was not aware of the cross-posting rule, it won't happen again. I posted some time ago on SU, and today here and on ExcelForum (and nowhere else).

Thanks,
Zadig
 
Upvote 0
As Scott Craner said on SU it is not always possible to get spilled array formulae to be as fast as old fashioned drag down formulae.
 
Upvote 0
As Scott Craner said on SU it is not always possible to get spilled array formulae to be as fast as old fashioned drag down formulae.
Yes, but I was hoping that this is one case where it's possible ;)

More interesting would be understanding why it is not possible, i.e. the second part of my question
(and BTW, why is the array formula so slow? All operations inside it should be very fast.)
My H2 formula only contains:
  1. a SCAN instruction
  2. a few INDEX instruction
  3. an IF instruction
  4. very simple arithmetic
Of these, only 1., 2. and 3. are suspect of being responsible of slowing down the calculation so dramatically, but:
  1. SCAN with a simple cumulative sum is extremely fast, therefore SCAN itself is not responsible
  2. INDEX is usually extremely fast, in fact almost as fast as a direct reference. And if I replace G2 and cells below with
    Excel Formula:
    G2 =IF(OR($D2=1,INDEX($E$2#,$D2)<>INDEX($E$2#,$D2-1)),0,H1)+INDEX($F$2#,$D2)
    then the sheet remains as fast as with the simpler non-array formula.
  3. If I replace H2 with the simpler
    Excel Formula:
    H2 =SCAN(0,D2#,LAMBDA(a,x,a+INDEX(F2#,x)))
    (an unconditional cumulative sum), the sheet is almost as slow as with my proposed H2 above.
Therefore the problem is somehow the interaction between SCAN and INDEX, but why are they interacting in such a way?
 
Upvote 0
Therefore the problem is somehow the interaction between SCAN and INDEX, but why are they interacting in such a way?

As I do not know the inner working of the functions (ie how they do their thing) I cannot answer that question.
 
Upvote 0
After many efforts I finally found an array formula that is almost as fast (i.e. about 3 times slower) as the non-array version:

Excel Formula:
H2 =LET(
        sect, E2#,
        values, F2#,
        n, ROWS(sect),
        s, SEQUENCE(n),
        sentinel_val, -1,
        same_as_next, FILTER(sect, s > 1) = FILTER(sect, s < n),
        adj_values, 
            CHOOSE(
                1 + (s > 1), 
                sentinel_val, 
                INDEX(
                    CHOOSE(same_as_next + 1, sentinel_val, values), 
                    s - 1)
                ),
        prev_sum, 
            SCAN(
                0, 
                adj_values, 
                LAMBDA(a,x, IF(x < 0, 0, a + x))
            ),
        prev_sum + values
   )
 
Upvote 0
Solution

Forum statistics

Threads
1,223,908
Messages
6,175,306
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top