SUMPRODUCT with multiple conditions, wildcards, & dynamic sum range is painfully slow

cr731

Well-known Member
Joined
Sep 17, 2010
Messages
611
I'm seeking some advice on how I can design my spreadsheet to be faster. I'm using a sumproduct formula similar to:

Code:
=SUMPRODUCT((ISNUMBER(SEARCH(Criteria,CriteriaRange)) * (INDEX(SumRange,,X):INDEX(SumRange,,Y))

I am using the ISNUMBER(SEARCH) approach because my Criteria could have wildcards. Sometimes I need to sum based on ABCDE and sometimes just ABC**.

Making matters worse, my sum range must be dynamic in that some rows need to sum across 1-2 columns and others 5-6.

This is because my table is like this,

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Customer+Product[/TD]
[TD]Jan[/TD]
[TD]Feb[/TD]
[TD]Mar[/TD]
[TD]Apr[/TD]
[/TR]
[TR]
[TD]ABCDE-1234[/TD]
[TD]10[/TD]
[TD]20[/TD]
[TD]30[/TD]
[TD]40[/TD]
[/TR]
[TR]
[TD]ACD-2385[/TD]
[TD]60[/TD]
[TD]80[/TD]
[TD]90[/TD]
[TD]10[/TD]
[/TR]
</tbody>[/TABLE]

So each row using this sumproduct must sum up a different number of months. I.e. some lines I want Jan-Mar, others Feb-Apr, and so forth (and the calendar goes for the full year).

This is absolutely painstakingly slow - this sumproduct is used in about 500 rows and then there are 3 variations on it for each row (Volume, Revenue, Margin).

My source data, where it is looking up to, is up to 50,000 rows.

I have spent HOURS on this trying to think of some way to make it more efficient but I just can't given all of the variables.

Is this just a case of trying to use Excel for something it wasn't built to do or is there something I can do to make this faster?
 
This seems to be very interesting problem.
Where do X and Y in your formula come from?

J.Ty.
 
Upvote 0
This seems to be very interesting problem.
Where do X and Y in your formula come from?

J.Ty.

More background:

  • Every line where this SUMPRODUCT will be used represents a first point of sale; i.e. date the sales are starting
  • Each sale (i.e. customer/product combination) must be tracked for 12 months following this "first sale" date
  • X is the date the sale starts
  • Y is the lesser of the current month or the date that the 12 month tracking would end


For example: Start selling in February 2014: I must track Feb 2014 - Feb 2015. So, when preparing my Oct report, I want to sum columns Feb-Oct. In this case, X is Feb 2014 (start date) and Oct 2014 is Y (as in, the lesser of this period or the date the tracking ends, Feb 2015).

I use a helper column/formula to give me the column number that X matches to and similar with Y. Something like:

=MATCH(StartDate,HeaderRowofDates,0) to get the column number

Because of this, every line must track different columns.

Complicating things, customer names have a geography portion to them, for instance

CustomerANorthAmerica
CustomerAEurope
CustomerASouthAmerica

Some lines will require summing a particular region; i.e. CustomerANorthAmerica ... whereas other lines will require summing all regions, i.e. CustomerA* ... which is why the wildcard element is needed in my formula.

:eeek:
 
Last edited:
Upvote 0
I believe the INDEX():INDEX() syntax is volatile and can be a bottleneck in performance.

Given that your columns vary based on Dates (Jan Feb Mar, ect)
Something might be worked out using >=Jan and <=Mar

Are those real dates formatted as MMM, or just text strings?
If they are text strings, are you flexible to change them so they are dates (1st of each month) formatted as MMM ?

And how does your current formula determine which columns to use?
 
Upvote 0
Sorry, your last post answered my questions...

Try

=SUMPRODUCT((ISNUMBER(SEARCH(Criteria,CriteriaRange))*(HeaderRow>=StartDate)*(HeaderRow<=EndDate)*SumRange)

when you say you sometimes do ABC*
Is it always only the wildcard on the END, or will it ever be *ABC ?
If the wildcard is always on the end, try

=SUMPRODUCT((LEFT(CriteriaRange,LEN(Criteria))=Criteria)*(HeaderRow>=StartDate)*(HeaderRow<=EndDate)*SumRange)
 
Upvote 0
Sorry, your last post answered my questions...

Try

=SUMPRODUCT((ISNUMBER(SEARCH(Criteria,CriteriaRange))*(HeaderRow>=StartDate)*(HeaderRow<=EndDate)*SumRange)

when you say you sometimes do ABC*
Is it always only the wildcard on the END, or will it ever be *ABC ?
If the wildcard is always on the end, try

=SUMPRODUCT((LEFT(CriteriaRange,LEN(Criteria))=Criteria)*(HeaderRow>=StartDate)*(HeaderRow<=EndDate)*SumRange)

My dates are currently text, such as Jan 2014 - Volume, Feb 2014 - Volume and so on to Dec 2014 - Margin.

The problem with formatting as dates is these are in an Excel table, so column header must be unique, and I will have 3 dates for each (Volume, Revenue and Margin).

I could put the dates in a helper row directly above the header?

Lastly... the wildcard is always at the end, so your left(criteria,len(criteria) should work... will that save much over ISNUMBER(SEARCH)?
 
Upvote 0
You said...
I use a helper column/formula to give me the column number that X matches to and similar with Y. Something like:

=MATCH(StartDate,HeaderRowofDates,0) to get the column number
So is that StartDate and HeaderRowofDates not real dates??
If that match function you posted gets you the correct column #s, then my logic should work.


The LEFT idea may be slightly faster, but probably not much.
 
Upvote 0
Try to normalize the data like below...

[TABLE="width: 539"]
<TBODY>[TR]
[TD="class: xl65, width: 157, bgcolor: white"]Customer+Product
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Month
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Value
[/TD]
[TD="class: xl65, width: 91, bgcolor: white"]MonthNum
[/TD]
[TD="class: xl66, width: 32, bgcolor: transparent"][/TD]
[TD="class: xl65, width: 134, bgcolor: white"][/TD]
[TD="class: xl66, width: 64, bgcolor: transparent"][/TD]
[TD="class: xl66, width: 64, bgcolor: transparent"][/TD]
[TD="class: xl66, width: 49, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ABCDE-1234
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Jan
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]10
[/TD]
[TD="class: xl67, bgcolor: transparent"]1
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl65, width: 134, bgcolor: white"]ABCDE-1234
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Jan
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Mar
[/TD]
[TD="class: xl65, width: 49, bgcolor: white"]60
[/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ABCDE-1234
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Feb
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]20
[/TD]
[TD="class: xl67, bgcolor: transparent"]2
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl65, width: 134, bgcolor: white"]ACD-2385
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Feb
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Apr
[/TD]
[TD="class: xl65, width: 49, bgcolor: white"]180
[/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ABCDE-1234
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Mar
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]30
[/TD]
[TD="class: xl67, bgcolor: transparent"]3
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ABCDE-1234
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Apr
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]40
[/TD]
[TD="class: xl67, bgcolor: transparent"]4
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ACD-2385
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Jan
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]60
[/TD]
[TD="class: xl67, bgcolor: transparent"]1
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ACD-2385
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Feb
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]80
[/TD]
[TD="class: xl67, bgcolor: transparent"]2
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ACD-2385
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Mar
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]90
[/TD]
[TD="class: xl67, bgcolor: transparent"]3
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
[TR]
[TD="class: xl65, width: 157, bgcolor: white"]ACD-2385
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]Apr
[/TD]
[TD="class: xl65, width: 64, bgcolor: white"]10
[/TD]
[TD="class: xl67, bgcolor: transparent"]4
[/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[TD="class: xl66, bgcolor: transparent"][/TD]
[/TR]
</TBODY>[/TABLE]

A:C is a re-ordering of your original data.

D2, just enter and copy down:

=MONTH("1-"&B2)

The processing becomes now easier and much faster...

I2, just enter and copy down:

=SUMIFS(C:C,A:A,F2,D:D,">="&MONTH("1-"&G2),D:D,"<="&MONTH("1-"&H2))
 
Upvote 0
At the moment I see one fundamental reason of your problem: your data is not in the First Normal Form and this causes a lot of troubles.

The First Normal Form in spreadsheets is: "Always store at most one piece of information in one cell".

Your data design violates this principle, by encoding many pieces of information in the "Customer+Product" column.
The greatest impact in terms of speed can be achieved by reformatting your data to satisfy FNF.
So instead of the top form, use tbe bottom form:

Excel 2010
ABC
14CustomerANorthAmerica+Product1
15
16Customer ANorth AmericaProduct1
Sheet3


This will convert your wildcard searches inside of SUMPRODUCT into plain SUMIFS, because, e.g., North America will become a single value in its own cell, which can serve as a criterion in SUMIFS.

If you cannot afford changing your data layout, I have a small idea which might help to a certain extent. This is again about changing the format of your data.
If you replace Monthly values by a cumulative sum, like below:

Excel 2010
ABCDEFGH
1Customer+ProductJanFebMarAprFebAprMay
2ABCDE-12341020304020
3ACD-23859010705010
4
5
6
7Customer+ProductJanFebMarAprFebAprMay
8ABCDE-1234103060100120120120
9ACD-23850090100170220230
Sheet3
Cell Formulas
RangeFormula
B8=B2
B9=B3
C8=B8+C2
C9=B9+C3
D8=C8+D2
D9=C9+D3
E8=D8+E2
E9=D9+E3
F8=E8+F2
F9=E9+F3
G8=F8+G2
G9=F9+G3
H8=G8+H2
H9=G9+H3


I assume that empty cells above are either before the first sale, or after the one year period has ended. This way, when you are preparing your report for April, you just sum the April column, which already contains the row sums you need.

The picture shows how to use simple formulas to convert your present data format into the new one. I assume that you would keep the old format, and create the new one on a separate worksheet, used solely for computing reports.


What do you think?

J.Ty.
 
Upvote 0
Thank you all for the responses, I appreciate the thoughts.

Jonmo1


My column headers are not real dates. So what my formula looks like is:

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Start Track Date (Real Date)
[/TD]
[TD]Column Index of Volume
[/TD]
[TD]Column Index of Revenue
[/TD]
[TD]Column Index of Margin
[/TD]
[/TR]
[TR]
[TD]03/15/2014
[/TD]
[TD]=MATCH(TEXT([@[Start Track Date]],"Mmm YY") & " - " Volume,Table[#Headers],0)
[/TD]
[TD]=MATCH(TEXT([@[Start Track Date]],"Mmm YY") & " - " Revenue,Table[#Headers],0)
[/TD]
[TD]=MATCH(TEXT([@[Start Track Date]],"Mmm YY") & " - " Margin,Table[#Headers],0)
[/TD]
[/TR]
[TR]
[TD]Result
[/TD]
[TD]34
[/TD]
[TD]58
[/TD]
[TD]82
[/TD]
[/TR]
</tbody>[/TABLE]

Since each column in the table must have a unique header label, I cannot put the real date in each column as I would have repeated dates.

Aladin

I considered putting the dates in the rows, but here are my main question/concern is that my table already has 50,000 rows. If I put dates into the rows, I may end up with many hundreds of thousands of rows. Will the SUMIF formulas be any better under those circumstances?

J.Ty.

My data actually is arranged as you suggest; in that each part of the lookup string (Customer+Geography+Product) is in its own column. I've concatenated them together because I thought it would improve the speed of lookups to use only 1 field instead of 3.

Regarding the YTD suggestion... I'll have to think about that. I'm trying to think how that could be done without too much manual intervention (this is a report prepared by users that may not be so Excel savvy). Also my file is about 35 MB currently and if I duplicate my entire data set showing YTD values, I'm concerned it will grow too large to effectively distribute.

I think the approach of putting dates into rows is the most ideal, but I'll have to give it a test.

As a separate point, but one I need to figure out, if I do put the dates into rows, how can I, for a given Customer+Product combination, return the earliest month with a sale? I thought of,

=MIN(IF(Table[Customer+Product] = [@[Customer+Product]],Table[Date in Rows])

as an array... but (1) will that be horribly slow on a several hundred thousand row table and (2) that still doesn't address the wildcard issue?
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top