Huge Dataset - Looking for a SUMIFS alternative

DungeonMaster

New Member
Joined
Oct 21, 2015
Messages
4
Hello everyone, this is my first post here.

I am using Excel 2013 on Windows 8.1.

I have posted my issues on Reddit before, without results unfortunately.

I am currently manipulating a very large dataset (500k rows approximatly) that I need to reorder before analyzing the data.

This picture shows what I want to do with my data :

Columns A, B and C contain indicators (US county number, Year and age group). What I want to do is to sum, for each county, year and age group, values present in columns D to I To new columns (N and onwards in the picture I attached). In other words, I want to sum Values in Column D if their indicators in A, B and C are identical, and then transfer the result to another column.

I already had to do the same task last year (with a way smaller datafile), and found that the SUMIFS function worked perfectly for this kind of task (by creating some "helping columns", in K and L on the image I provided, and then using SUMIFS).

The issue is that using SUMIFS to fill the entire table is a daunting task for my computer (even with a 8-core processor). It takes ages to simply fill one column when it doesn't simply crash.

I was then wondering if anyone could guide me with a workaround that would be less calculation-heavy. I absolutely don't know how to code in VBA (except for copying/pasting macros), but I figure that the macro wouldn't be that hard to write. What I need would be :

- Sum D2 to D4
- Paste result in N2
- Sum D21 to D23
- Paste result in N3
etc

Thank you very much for your help!
 

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Hi. Welcome to the MrExcel forum!

It can be done with a few helper columns.
I highlighted the principal formula in bold.
This solution works because - and only if - your data is sorted on columns A, B and C.
If you would delete any rows, you need to copy formulas from row 2 down again as deletion will result in REF#! errors.

Formulas below are for row 2 and to be copied down.

For all data rows:
J: combined keyfields: =A2&B2&D2
K: row number on which the key combination starts: =IF($J2<>$J1,ROW(),K1)
L: row number on which the key combination ends: =IF($J2<>$J3,ROW(),L3)
Note: the values in column L first make sense once copied down to the last data row.

For summary data rows:
N: start row for key combination: =MAX(2,SOM(O1,1))
Note: If you put a 1 in O1, you can simplify the formula as =SOM(O1,1)
O: end row for key combination: =INDEX(L:L,N2)
P,Q,R: key A,B,C (your columns K,L,M): =INDEX(A:A,$N2) copy to the right
S: sum of D-values (your column O): =SUM(INDEX(D:D,$N2):INDEX(D:D,$O2)) you can copy to the right for sums of E-values, F-values etcetera

When you copy the formulas for the summary data down, you reach the end once the value in column O is 0.
If your formula in N2 is =MAX(2,SOM(O1,1)), the summary restarts from the beginning again if you copy formulas further down.
 
Last edited:
Upvote 0
I couldn't make out much from your picture so had a guess from your text.
Try this VBA code on some test data to see if it's something like what you want.
No need to sort the data first.
Code:
Sub testing()

Dim Dic As Object, a, c, u(), x
Dim i As Long, j As Long, rws As Long
Set Dic = CreateObject("scripting.dictionary")
Set a = Intersect(ActiveSheet.UsedRange, Range("A:I"))
rws = a.Rows.Count
ReDim u(1 To rws, 1 To 9)

For Each c In a.Resize(rws, 3).Columns
    x = x & "&char(2)&" & c.Address
Next c
a.Value = a.Value

For Each c In Evaluate(Mid(x, 2))
    i = i + 1
    If Not Dic.exists(c) Then
        Dic(c) = Dic.Count + 1
        u(Dic(c), 1) = c
        For j = 1 To 3
            u(Dic(c), j) = Split(c, Chr(2))(j)
        Next j
    End If
    
    For j = 4 To 9
        u(Dic(c), j) = u(Dic(c), j) + a(i, j)
    Next j
Next c

[n1].Resize(Dic.Count, 9) = u

End Sub
 
Upvote 0
First of all, thank you very much to both of you for your answers!

Surprisingly, both your procedures yield the exact same result at the end, but seeing how the methods differ is very interesting.

Both of these worked (and divided the size of my doc by 3, which is always nice). I will try to run the SUMIFS function again (since I now have to distribute the values according to the parameters 1 to 7).

The macro really impressed me, I did not alter a single thing and it worked like a charm. I also used MarcelBeug's solution to see how this one worked. I find it very elegant.

Some minor errors were in the formulas before I could get it to work (if someone wants to use this code in the future):

- J: combined keyfields: =A2&B2&D2 is actually A2&B2&C2
- the SOM function is actually SUM (Excel en français peut-être ?)

Thank you very much to both of you.
 
Upvote 0
A quick update :

I am running my SUMIFS function right now. It has gone to maybe 15 minutes by column to 1 minute.

Thank you again!
 
Upvote 0
Unless I'm missing something, a pivot table would be a very simple answer to this.
 
Upvote 0
The 500k rows mentioned in opening post was quite an interesting challenge.
So did some test data (500k rows) with the TestData code below to see if the VBA code in post#3 could complete the task in a reasonable time.

Took me about 17 seconds runtime which did seem somewhat long.

But I made a bit of a boo-boo about halfway down the post#3 code
a.value=a.value should have just been a=a which cut the run time to about 5.5 seconds.

I'd be quite curious to know how thiscompares with a formula or pivot table approach for a dataset of this size.
Code:
Sub TestData() 'this takes about 12 seconds to run

ActiveSheet.UsedRange.ClearContents
With Range("A1").Resize(500000, 9)
    .Resize(, 3) = "=rept(char(randbetween(65,74)),3)"
    .Resize(, 6).Offset(, 3) = "=randbetween(0,20)"
    .Resize(1) = "=""HDR_"" & column()"
    .Value = .Value
    .Columns.AutoFit
End With

End Sub
 
Upvote 0
If you have the test data already, why not try it? ;)
 
Upvote 0
If you have the test data already, why not try it? ;)
@Rory,
I rarely if ever use either pivot tables or formulas on a dataset that size. My experience with them has not been good.


Quote RoryA
"Unless I'm missing something, a pivot table would be a very simple answer to this.
"

the test dataset is equally available to everyone including yourself, so if it's very simple could not OP be shown how effective it is for this problem?
;)
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,876
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top