# Speedier Alternatives to SUMIFS



## Tarver (Jan 3, 2023)

I have a fairly large dataset (350,000 rows) for which I need to sum data for 12,000 customers.  Using SUMIFS is a long process and can take up to 10 minutes, because the same calculations are being repeated over and over again.

Is there a faster way?

Attached is a look at a sample of my data and a small table of the output I'm attempting to create.   Your ideas on a faster way to calculate this result are welcomed!  Thanks!

Predictive AR Model v1.xlsxABCDEFG1Account noPosting dateDocument TypeAmount210000967/1/2021SI$955.00Start Date7/5/2021310000847/2/2021SI$960.00End Date7/16/2021410000847/2/2021SI$1,395.00510000847/2/2021SI$793.95CustomerAmount610000847/2/2021SI$1,700.001000052$0.00710000847/2/2021SI$375.001000084$20,428.00810000847/2/2021SI$360.001000096$0.00910000847/2/2021SI$1,235.001010000847/2/2021SI$960.001110000847/9/2021SI$740.351210000847/13/2021SI$618.001310000847/13/2021SI$1,728.001410000847/13/2021SI$713.001510000847/16/2021SI$424.001610000847/16/2021SI$470.001710000847/16/2021SI$2,610.001810000847/16/2021SI$572.001910000847/16/2021SI$1,055.002010000847/16/2021SI$25.002110000847/16/2021SI$3,745.002210000847/9/2021SI$465.652310000847/16/2021SI$6,560.002410000847/16/2021SI$702.002510000527/19/2021SI$822.002610000847/23/2021SI$3,190.002710000847/23/2021SI$835.002810000847/23/2021SI$1,070.002910000847/23/2021SI$4,480.003010000527/23/2021SI$358.2531Sheet1Cell FormulasRangeFormulaF6:F8F6=SORT(UNIQUE(Table4[Account no]))G6:G8G6=SUMIFS(Table4[Amount],Table4[Account no],F6#,Table4[Posting date],">="&$G$2,Table4[Posting date],"<="&$G$3)Dynamic array formulas.


----------



## Joe4 (Jan 3, 2023)

Trying to aggregate that amount of data, you are really pushing the limits of Excel.  What you really have is a relational database, and a relational database program like SQL, Microsoft Access, or Oracle handles this type of things much more efficiently.

If you have to keep it in Excel, you may more luck using Power Query within Excel to do this.
There is a whole forum here devoted to "Power Tools" products, and some to the stickies give some good information on Power BI.


----------



## kevin9999 (Jan 3, 2023)

Try this on a *copy *of your data.  It assumes that your sheet is laid out exactly as your sample shows.  Change sheet name/cell locations if they don't match your actual data exactly.  On a test of your data replicated to 360k rows, it completed in under 2 seconds for me.


```
Option Explicit
Sub Faster_Than_Sumifs()
    Dim t As Double: t = Timer
    Dim ws As Worksheet
    Set ws = Worksheets("Sheet1")   '<< Change to actual sheet name
    
    Dim Start As Double, Finish As Double
    Start = CDbl(ws.Range("G2").Value2)
    Finish = CDbl(ws.Range("G3").Value2)
    
    Dim ar, i As Long, n As Long
    ar = Sheet1.Range("A2", Cells(Rows.Count, "D").End(xlUp))
    
    With CreateObject("Scripting.Dictionary")
        For i = 1 To UBound(ar, 1)
            If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)
            Else
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + 0
            End If
        Next i
        ar = Array(.keys, .Items)
        n = .Count
    End With
    
    With ws.Range("F5").CurrentRegion
        .Offset(1).ClearContents
        .Offset(1).Resize(n, 2).Value2 = Application.Transpose(ar)
        .Sort Key1:=ws.Range("F5"), order1:=xlAscending, Header:=xlYes
    End With
    
    MsgBox "Completed in " & Timer - t & " seconds"
End Sub
```


----------



## kevin9999 (Jan 3, 2023)

Edit - change this line:

```
.Sort Key1:=ws.Range("F*5*"), order1:=xlAscending, Header:=xlYes
```


to this

```
.Sort Key1:=ws.Range("F*6*"), order1:=xlAscending, Header:=xlYes
```


----------



## Tarver (Jan 4, 2023)

Kevin, this is great, and I really appreciate you taking the time to do this.  

I've made adjustments to your code to update for my actual sheet names and cells. 

In addtion, I've updated line


> ar = Sheet1.Range("A2", Cells(Rows.Count, "D").End(xlUp))


to change Sheet1.Range to be ws.Range.  That seemed to work.

However, I'm getting a runtime error 13, type mismatch in this line:


> If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then


I've spent the better part of the morning trying to understand what the code is doing and what the error is, but I've thus far been unable to work it out.

If you (or anyone else) can help, I'd appreciate the nudge in the right direction.

Thanks again!

BT


----------



## kevin9999 (Jan 4, 2023)

Hi BT, thank you for the feedback & comprehensive explanation of the issue. Moving forward, a few things would help to get to the bottom of the issue. Firstly, I’ll explain what this line of code does. _(It’s easier to explain if you think of things in terms of a range)._

If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then

The data in your table is loaded into an array (*ar*) which is then looped through one _row _at a time – where *i* is the row number, and the number following the comma after the* i* is the _column _number. The value is confirmed as a Double data type via the CDbl(). Therefore, that line of code asks – is the value in the second column greater than or equal to the value stored in the variable Start *and* is it less than or equal to the value stored in the variable Finish, and if so *Then* this line:

.Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)

stores that value as a Dictionary key – and the value in column 4 of the same row ar(i, *4*) as the Dictionary _item _for that key – _unless_ that key already exists, in which case it increments the value to the value already stored in the item for that key (the equivalent of summing the value – which is what ultimately returns the Sumifs() total once the entire data has been processed.)

What would help would be if you could tell me what values you are getting for the variables when the code errors. So when the code stops, put the mouse cursor over the following variables on the highlighted line and tell me what you get for *Start* & *Finish* (in both cases it should be a number, something like 44382 or similar); over *ar* (should appear as a date, something like 1/7/2021) and *i* (should be a number – if it’s 1 then the code doesn’t like anything about that line in relation to the data, if it’s greater than 1 then whatever number it is suggests there’s a problem on that particular row in your data).

It would also help if you could provide an XL2BB sample of your actual data so I can see the true layout – you can disguise any sensitive data & it doesn’t need to be a large sample, just something similar to what you provided in post #1. Also, could you show the full actual code you used once you had made the adjustments for sheet name/cell locations etc.

I have made a small edit to the code I posted in post #3 because I wasn’t happy with the final Sort of the returned table of values.

```
Option Explicit
Sub Faster_Than_Sumifs_V2()
    Dim t As Double: t = Timer
    Dim ws As Worksheet
    Set ws = Worksheets("Sheet1")   '<< Change to actual sheet name
    
    Dim Start As Double, Finish As Double
    Start = CDbl(ws.Range("G2").Value2)
    Finish = CDbl(ws.Range("G3").Value2)
    
    Dim ar, i As Long, n As Long
    ar = ws.Range("A2", Cells(Rows.Count, "D").End(xlUp))
    
    With CreateObject("Scripting.Dictionary")
        For i = 1 To UBound(ar, 1)
            If CDbl(ar(i, 2)) >= Start And CDbl(ar(i, 2)) <= Finish Then
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + ar(i, 4)
            Else
                .Item(ar(i, 1)) = .Item(ar(i, 1)) + 0
            End If
        Next i
        ar = Array(.keys, .Items)
        n = .Count
    End With
    
    With ws.Range("F5").CurrentRegion
        .Offset(1).ClearContents
        .Offset(1).Resize(n, 2).Value2 = Application.Transpose(ar)
    End With
    
    With ws.Range("F5").CurrentRegion
        .Sort Key1:=ws.Range("F6"), order1:=xlAscending, Header:=xlYes
    End With
    
    MsgBox "Completed in " & Timer - t & " seconds"
End Sub
```

When I ran the code, after extrapolating the data you provided down to 360K rows (bearing in mind only 1000084 had any data that falls within the date range) it took just over 1 second to turn this:
sumifs.xlsmABCDEFG1Account noPosting dateDocument TypeAmount210000967/01/2021SI$955.00Start Date7/05/2021310000847/02/2021SI$960.00End Date7/12/2021410000847/02/2021SI$1,395.00510000847/02/2021SI$793.95CustomerAmount610000847/02/2021SI$1,700.00710000847/02/2021SI$375.00810000847/02/2021SI$360.00910000847/02/2021SI$1,235.001010000847/02/2021SI$960.001110000847/09/2021SI$740.351210000847/13/2021SI$618.001310000847/13/2021SI$1,728.001410000847/13/2021SI$713.001510000847/16/2021SI$424.001610000847/16/2021SI$470.001710000847/16/2021SI$2,610.001810000847/16/2021SI$572.001910000847/16/2021SI$1,055.002010000847/16/2021SI$25.00Sheet1

Into this:
sumifs.xlsmABCDEFG1Account noPosting dateDocument TypeAmount210000967/01/2021SI$955.00Start Date7/05/2021310000847/02/2021SI$960.00End Date7/12/2021410000847/02/2021SI$1,395.00510000847/02/2021SI$793.95CustomerAmount610000847/02/2021SI$1,700.001000052$0.00710000847/02/2021SI$375.001000084$14,472,000.00810000847/02/2021SI$360.001000096$0.00910000847/02/2021SI$1,235.001010000847/02/2021SI$960.001110000847/09/2021SI$740.351210000847/13/2021SI$618.001310000847/13/2021SI$1,728.001410000847/13/2021SI$713.001510000847/16/2021SI$424.001610000847/16/2021SI$470.001710000847/16/2021SI$2,610.001810000847/16/2021SI$572.001910000847/16/2021SI$1,055.002010000847/16/2021SI$25.00Sheet1


----------



## Tarver (Jan 5, 2023)

OUTSTANDING!

I cannot thank you enough for your help.  The simple explanation you provided made sense to me, and I was able to edit the code to match my actual data set.  I had it up and running in about 20 minutes, and I've verified that the answers are correct vs. a slow-moving SUMIFS.  

The key for me was understanding the offsets of the columns.  Your tip on mousing over the variable in the code helped me determine what the values were, and that helped me get to the right column.

Excellent work on your part, and I'm very grateful!

BT


----------



## kevin9999 (Jan 5, 2023)

Happy to help BT and thanks for the feedback 👍 
Out of curiosity, I'd be keen to know how long the code took to run with your actual data?


----------



## Tarver (Jan 5, 2023)

kevin9999 said:


> Out of curiosity, I'd be keen to know how long the code took to run with your actual data?



Depending upon the selected date range, it has taken 0.4 to 0.6 seconds.  Very quick!


----------



## Tarver (Jan 5, 2023)

kevin9999 said:


> Happy to help BT and thanks for the feedback 👍


One more point of follow-up for you: I've now updated my sheet to run your macro code when the desired date range is updated and I'm using it twice to get an average historical sales trend and then again to get a recent sales trend to compare the two.  Ultimately the project is identify customers who may have financial issues, who have either slowed the rate at which they pay invoices or have suddenly spiked in new orders.  Your contribution was a big help!


----------



## Tarver (Jan 3, 2023)

I have a fairly large dataset (350,000 rows) for which I need to sum data for 12,000 customers.  Using SUMIFS is a long process and can take up to 10 minutes, because the same calculations are being repeated over and over again.

Is there a faster way?

Attached is a look at a sample of my data and a small table of the output I'm attempting to create.   Your ideas on a faster way to calculate this result are welcomed!  Thanks!

Predictive AR Model v1.xlsxABCDEFG1Account noPosting dateDocument TypeAmount210000967/1/2021SI$955.00Start Date7/5/2021310000847/2/2021SI$960.00End Date7/16/2021410000847/2/2021SI$1,395.00510000847/2/2021SI$793.95CustomerAmount610000847/2/2021SI$1,700.001000052$0.00710000847/2/2021SI$375.001000084$20,428.00810000847/2/2021SI$360.001000096$0.00910000847/2/2021SI$1,235.001010000847/2/2021SI$960.001110000847/9/2021SI$740.351210000847/13/2021SI$618.001310000847/13/2021SI$1,728.001410000847/13/2021SI$713.001510000847/16/2021SI$424.001610000847/16/2021SI$470.001710000847/16/2021SI$2,610.001810000847/16/2021SI$572.001910000847/16/2021SI$1,055.002010000847/16/2021SI$25.002110000847/16/2021SI$3,745.002210000847/9/2021SI$465.652310000847/16/2021SI$6,560.002410000847/16/2021SI$702.002510000527/19/2021SI$822.002610000847/23/2021SI$3,190.002710000847/23/2021SI$835.002810000847/23/2021SI$1,070.002910000847/23/2021SI$4,480.003010000527/23/2021SI$358.2531Sheet1Cell FormulasRangeFormulaF6:F8F6=SORT(UNIQUE(Table4[Account no]))G6:G8G6=SUMIFS(Table4[Amount],Table4[Account no],F6#,Table4[Posting date],">="&$G$2,Table4[Posting date],"<="&$G$3)Dynamic array formulas.


----------



## kevin9999 (Jan 5, 2023)

Tarver said:


> One more point of follow-up for you: I've now updated my sheet to run your macro code when the desired date range is updated and I'm using it twice to get an average historical sales trend and then again to get a recent sales trend to compare the two.  Ultimately the project is identify customers who may have financial issues, who have either slowed the rate at which they pay invoices or have suddenly spiked in new orders.  Your contribution was a big help!


Thanks for letting me know 🙂


----------



## Tarver (Jan 6, 2023)

What's weird here is that when I call this code from another piece of code that looks for changes, it always hangs up on this line with a "Method 'Range" of object '_Worksheet' failed:


----------

