Counting substring matches based on criteria. Is there a better/faster way?

jahsquare

Board Regular
Joined
Jan 22, 2014
Messages
51
Hi, I have the following formula running in Table1 to search for a names from Table2. The formula is running in 20 columns across thousands of records, and it takes about 30 minutes to complete..

=COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),($D4&" *"),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),("* "&$D4),INDIRECT("Table2["&$L$1&"]"),J$3)+
COUNTIFS(INDIRECT("Table2["&$J$1&"]"),$D4,INDIRECT("Table2["&$L$1&"]"),J$3)

Table2 Col J is a list of strings comprising concatenated names.
Table2 Col L is a "Year" criteria (this formula runs across 20 columns / 20 years)

Each value in Table1 Col D has a single name that may be at the beginning, middle, or end of any of the strings in Table2 Col. J.

I have tried two other solutions but I can't get them to work, and i'm not sure they'd be faster anyway:

1) =COUNTIFS(INDIRECT("Table2["&$J$1&"]"),{("* "&$D4&" *"),($D4&" *"),("* "&$D4),$D4},INDIRECT("Table2["&$L$1&"]"),J$3)

with this I get an error and can't run the formula

2) =SUMPRODUCT(
(INDIRECT("Table2["&$J$1&"]")=("* "&$D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=($D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=("* "&$D4)) +
INDIRECT("Table2["&$J$1&"]")=($D4))) * INDIRECT("Table2["&$L$1&"]")=J$3)

This runs but the values are wrong (much lower than expected)


Any help is much appreciated. Thanks
 

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
What happens if you try like;

=SUMPRODUCT(--INDIRECT("Table2["&$L$1&"]")=J$3,
INDIRECT("Table2["&$J$1&"]")=("* "&$D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=($D4&" *")) +
INDIRECT("Table2["&$J$1&"]")=("* "&$D4)) +
INDIRECT("Table2["&$J$1&"]")=($D4)))
 
Last edited by a moderator:
Upvote 0
Suggest you post some sample data that can be copied to Excel. Inputs & outputs/answer.
The existing description may be adequate, but if you think it will help get a better answer, please provide more description.
regards
 
Upvote 0
[TABLE="class: grid, width: 500, align: left"]
<tbody>[TR]
[TD][/TD]
[TD]A[/TD]
[TD]B[/TD]
[TD]C[/TD]
[TD]D[/TD]
[TD]E[/TD]
[TD]F[/TD]
[TD]G[/TD]
[TD]H[/TD]
[TD]I[/TD]
[TD]J[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]Name[/TD]
[TD]2012[/TD]
[TD]2013[/TD]
[TD]2014[/TD]
[TD]2015[/TD]
[TD]2016[/TD]
[TD]2017[/TD]
[TD][/TD]
[TD]Names[/TD]
[TD]Year[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]Foo[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]0[/TD]
[TD][/TD]
[TD]Foo Bar[/TD]
[TD]2012[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]Bar[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD][/TD]
[TD]Bif Baz[/TD]
[TD]2013[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]Bif[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]0[/TD]
[TD][/TD]
[TD]Foo Baz[/TD]
[TD]2014[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]Baz[/TD]
[TD]0[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]1[/TD]
[TD][/TD]
[TD]Foo[/TD]
[TD]2015[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Bif Foo Baz[/TD]
[TD]2016[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Bar[/TD]
[TD]2017[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Bif[/TD]
[TD]2012[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Bar Foo[/TD]
[TD]2013[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Bif Bar[/TD]
[TD]2014[/TD]
[/TR]
[TR]
[TD]11[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Baz Bif Foo Bar[/TD]
[TD]2015[/TD]
[/TR]
[TR]
[TD]12[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Foo Bar Bif Baz[/TD]
[TD]2016[/TD]
[/TR]
[TR]
[TD]13[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]Baz[/TD]
[TD]2017[/TD]
[/TR]
[TR]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]



Here is some sample data. I1:J13 -> Table7

And an updated version of the formula for B2:G5

=COUNTIFS(INDIRECT("Table7["&$I$1&"]"),("* "&$A2&" *"),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),($A2&" *"),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),("* "&$A2),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),$A2,INDIRECT("Table7["&$J$1&"]"),B$1)
 
Last edited:
Upvote 0
Try - using your data sample in post 4.

Formula in K2 copied down
=SUMPRODUCT(--ISNUMBER(SEARCH(A$2:A$5,I2)),INDEX(B$2:G$5,0,MATCH(J2,$B$1:$G$1,0)))

Hope this helps

M.
 
Upvote 0
Or

If the range A1:G5 is formatted as a Table named Table2, try
Formula in K2 copied down
=SUMPRODUCT(--ISNUMBER(SEARCH(Table2[Name],I2)),INDEX(Table2,0,MATCH(J2&"",Table2[#Headers],0)))

M.
 
Upvote 0
Marcelo, apologies for being unclear - B2:G5 is my expected output, what I'm trying to do is replace the current formula with something more efficient, as it takes a very long time to update.

Flashbond, I am getting a #REF error.
[here is your formula updated for the sample data I posted above]
=SUMPRODUCT(--INDIRECT("Table2["&$J$1&"]")=B$1,
INDIRECT("Table7["&$I$1&"]"="* "&$D4&" *") +
INDIRECT("Table7["&$I$1&"]"=$D4&" *") +
INDIRECT("Table7["&$I$1&"]"="* "&$D4) +
INDIRECT("Table7["&$I$1&"]"=$D4))
 
Last edited:
Upvote 0
Marcelo, apologies for being unclear - B2:G5 is my expected output, what I'm trying to do is replace the current formula with something more efficient, as it takes a very long time to update.

Trying to understand what you need
Are the Names in A2:A5 given or do you need a formula (or macro) to extract them of the values in I2:I13?
In other words is the below data setup given?

[Table="class: grid"][tr][td="bgcolor: #DCE6F1"][/td][td="bgcolor: #DCE6F1"]
A
[/td][td="bgcolor: #DCE6F1"]
B
[/td][td="bgcolor: #DCE6F1"]
C
[/td][td="bgcolor: #DCE6F1"]
D
[/td][td="bgcolor: #DCE6F1"]
E
[/td][td="bgcolor: #DCE6F1"]
F
[/td][td="bgcolor: #DCE6F1"]
G
[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]
1
[/td][td]
Name​
[/td][td]
2012​
[/td][td]
2013​
[/td][td]
2014​
[/td][td]
2015​
[/td][td]
2016​
[/td][td]
2017​
[/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
2
[/td][td]
Foo​
[/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
3
[/td][td]
Bar​
[/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
4
[/td][td]
Biff​
[/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
5
[/td][td]
Baz​
[/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][td="bgcolor:#D9D9D9"][/td][/tr]
[/table]


If so, do you need a formula to fill B2:G5 (gray area)?

M.
 
Upvote 0
That's correct. Currently in B2:G5, I'm using the formula I provided above:

=COUNTIFS(INDIRECT("Table7["&$I$1&"]"),("* "&$A2&" *"),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),($A2&" *"),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),("* "&$A2),INDIRECT("Table7["&$J$1&"]"),B$1)+
COUNTIFS(INDIRECT("Table7["&$I$1&"]"),$A2,INDIRECT("Table7["&$J$1&"]"),B$1)

But because my dataset is very large, this takes a very long time to complete (and occasionally crashes excel). I was hoping someone could suggest a more efficient way. Like it seems that there should be a better way for me to do exact substring searches in stead of using 4 different statements OR'd together as I'm doing now.
 
Last edited:
Upvote 0
* Should also note that the simplified example doesn't provide that some of the names in I2:I13 might have suffices that I do NOT want to match. I.e. searching 'baz' in A2:A5 should return false for 'bazzy' in I2:I13.
 
Upvote 0

Forum statistics

Threads
1,223,898
Messages
6,175,274
Members
452,628
Latest member
dd2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top