Excel formula for counting unique values with many multiple conditions

Donny123

New Member
Joined
Nov 2, 2018
Messages
5
Hi,

Is there an elegant formula way to count unique values or text based on multiple conditions. Countifs formula works but it does not count unique values, it counts all records based on # of records in the data set. Based on research I've seen some examples using arrays and sumproduct - it doesn't appear to be working...maybe I'm messing up on the syntax. Any help would be much appreciated! Thank you in advance!

Here is a sample data set:
[TABLE="width: 1000"]
<colgroup><col><col><col><col><col><col><col><col><col><col><col><col></colgroup><tbody>[TR]
[TD]City[/TD]
[TD]Segment Name[/TD]
[TD]Sub-Segment Name[/TD]
[TD]Transaction ID[/TD]
[TD]Country_Name[/TD]
[TD]Region[/TD]
[TD]C_Status[/TD]
[TD]Tier[/TD]
[TD]Bus_Segment[/TD]
[TD]Period[/TD]
[TD]Revenue[/TD]
[TD]Customer ID[/TD]
[/TR]
[TR]
[TD]Mexico City[/TD]
[TD]Technology[/TD]
[TD]Software[/TD]
[TD="align: right"]1234567[/TD]
[TD]Mexico[/TD]
[TD]LATAM[/TD]
[TD]No[/TD]
[TD]MME[/TD]
[TD]NA[/TD]
[TD]2018[/TD]
[TD="align: right"]1[/TD]
[TD="align: right"]1234[/TD]
[/TR]
[TR]
[TD]Mexico City[/TD]
[TD]Technology[/TD]
[TD]Software[/TD]
[TD="align: right"]12345[/TD]
[TD]Mexico[/TD]
[TD]LATAM[/TD]
[TD]Yes[/TD]
[TD]MME[/TD]
[TD]NA[/TD]
[TD]2018[/TD]
[TD="align: right"]2[/TD]
[TD="align: right"]1234[/TD]
[/TR]
[TR]
[TD]Mexico City[/TD]
[TD]Technology[/TD]
[TD]Software[/TD]
[TD="align: right"]24234[/TD]
[TD]Mexico[/TD]
[TD]LATAM[/TD]
[TD]No[/TD]
[TD]SME[/TD]
[TD]NA[/TD]
[TD]2018[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]345[/TD]
[/TR]
[TR]
[TD]Mexico City[/TD]
[TD]Technology[/TD]
[TD]Hardware

[/TD]
[TD="align: right"]3423423423[/TD]
[TD]Mexico[/TD]
[TD]LATAM[/TD]
[TD]No[/TD]
[TD]SME[/TD]
[TD]NA[/TD]
[TD]2018[/TD]
[TD="align: right"]4[/TD]
[TD="align: right"]345[/TD]
[/TR]
</tbody>[/TABLE]

I have named each of the above columns with name ranges for ease of reference. Below is the formula using countifs - it works but it gives me a count of number of records (in this case the lowest denominator is transaction ID), I want a formula which will give me a unique count of Customer_ID (e.g.Customer) based on a set of variables, with hopefully the ability to specify * for all in a particular parameter.

=COUNTIFS(Country,$P30,BusSegment,$C$7,Tier,$C$6,Period,AF$8,Region,$N30,SubRegion,$O30,Segment,$L30,SubSegment,$M30, C_Status,"*")
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Welcome to the forum.

Without the "*" requirement, you could use a formula like this:

Code:
=SUM(SIGN(FREQUENCY(IF((Country=P30)*(BusSegment=C7)*(Tier=C6)*(Period=AF8)*(Region=N30)*(Segment=L30)*(SubSegment=M30)*(C_Status=Q30),Customer_ID),Customer_ID)))

confirmed with Control+Shift+Enter.

Note that this doesn't have SubRegion, since that wasn't on your sample data, and C_Status is looking at Q30 for the value. Also note that this formula requires the Customer_ID to be numeric. But given those caveats, you should be able to see how to adapt it for your use. Use the $ as needed on your cell values.

Adding the "*" requirement isn't especially hard, but it really lengthens the formula and makes it harder to read:

Code:
=SUM(SIGN(FREQUENCY(IF(((Country=P30)+(P30="*"))*((BusSegment=C7)+(C7="*"))*((Tier=C6)+(C6="*"))*((Period=AF8)+(AF8="*"))*((Region=N30)+(N30="*"))*((Segment=L30)+(L30="*"))*((SubSegment=M30)+(M30="*"))*((C_Status=Q30)+(Q30="*")),Customer_ID),Customer_ID)))

also with Control+Shift+Enter. Again, you should see how to adapt it. If you don't want the "*" option for some parameters, you can just leave them like the first formula. If your customer IDs are not all numeric, let me know and I'll show you how to change the formula.
 
Last edited:
Upvote 0
Thank you so much Eric! Really appreciated! I forgot to mentioned I'm doing this for a very large data set - my data range is over 400,000 records. So when I press ctl+shift+enter - excel just goes unresponsive and spins - CPU goes up but not 100% like when calculating other formulas on that data set - so not sure if it's a formula issue or just this formula doesn't work with large data sets? Thoughts?
 
Upvote 0
Yeah, that sounds like a data issue, just too many rows for that formula to handle. You might be better off with a macro. You could set it up so that when you change any of the parameter cells, it kicks off, calculates the result and places it in a cell of your choice.

If that's of interest, we'd have to figure out which are the parameter cells, and where to put the result based on the input cells. I noticed in your original formula that the parameters were all over, and some with $, some not.
 
Upvote 0
Yeah, with this many rolls the formulas calcs after 2-3 min, but it returns zero...maybe it is the wildcard which is causing an issue. I actually already have a loop macro to generate certain items and can easily pass the parameters and the placement of the results - how would the VBA array formula look like, I've never done this before :)
 
Upvote 0
Well, I wouldn't use a formula in VBA. If it's too slow in Excel, it'll probably be too slow in VBA. Instead I'd read the data, run a loop that checks each row, if the row is valid add the customer number to a dictionary (which is a great tool for counting unique values), and return the value at the end. I don't have your sheet details, but the code would look something like:

Code:
    lr = Cells(Rows.Count, "A").End(xlUp).Row
    
    MyData = Range("A2:L" & lr).Value
    Set MyDict = CreateObject("Scripting.Dictionary")
    
    For i = 1 To UBound(MyData)
    
        If MyData(i, 1) = "Mexico City" And _
           MyData(i, 2) = "Technology" And _
           MyData(i, 3) = "Software" Then
           MyDict(MyData(i, 12)) = 1
        End If
    Next i
    
    Debug.Print MyDict.Count
You need to update the code. The If statement needs to handle all of your fields, you need to pass the values as parameters, etc. Note that I read your entire data range in one line. Reading through an internal array is MUCH faster than reading from the sheet for each cell.

I wouldn't recommend writing it as a UDF (user defined function) since even in VBA it will be somewhat slow, and you don't want it auto-executing all the time. Set up a worksheet change event and execute it when your parameters change.
 
Upvote 0
Wow - this things is fast! But I'm getting zero as output, not sure why. I've changed certain things around to fit the data. I think it might have to do with the () around wildcard (*) statements? I was getting a return value before putting those in but getting the wrong amount - I couldn't figure out why... Also, myDict.count - this will return number of records, how do I make it so the count return is unique Here is the function:
Function L3_Count(Credit_Indicator, Year)

Sheets("Industry2").Select
Region = Sheets("Industry2").Cells.Range("C1").Value
Sub_Region = Sheets("Industry2").Cells.Range("C2").Value
Country = Sheets("Industry2").Cells.Range("C3").Value
Tier = Sheets("Industry2").Cells.Range("C6").Value
Business_Segment = Sheets("Industry2").Cells.Range("C7").Value
industry = Sheets("Industry2").Cells.Range("C4").Value
sub_segment = Sheets("Industry2").Cells.Range("M6").Value


lr = Sheets("Data").Cells(Rows.Count, "A").End(xlUp).Row

MyData = Sheets("Data").Range("A2:L" & lr).Value
Set MyDict = CreateObject("Scripting.Dictionary")

For i = 1 To UBound(MyData)

If ((MyData(i, 1) = Sub_Region) Or (Sub_Region = "*")) And _
((MyData(i, 2) = industry) Or (industry = "*")) And _
((MyData(i, 3) = sub_segment) Or (sub_segment = "*")) And _
((MyData(i, 5) = Country) Or (Country = "*")) And _
((MyData(i, 6) = Region) Or (Region = "*")) And _
((MyData(i, 7) = Credit_Indicator) Or (Credit_Indicator = "*")) And _
((MyData(i, 8) = Tier) Or (Tier = "*")) And _
((MyData(i, 9) = Business_Segment) Or (Business_Segment = "*")) And _
(MyData(i, 10) = Year) Then
MyDict(MyData(i, 12)) = 1
End If
Next i

'Debug.Print MyDict.Count
L3_Count = MyDict.Count

End Function
 
Upvote 0
It looks like you coded the macro properly. Without seeing your data, I can't tell why it's not working. Try debugging it. Put a breakpoint on the first executable line (put the cursor on it and press F9). Kick off the macro by re-entering the function on the spreadsheet. Use the F8 key to single-step through the macro. See if lr (last row) gives you the proper last row of your data. Pick a row that you think should be counted. When you get to that row, hover the mouse over each of the fields in the If statement to see which ones match/don't match.

As far as counting unique accounts instead of all accounts, the dictionary object does just that. Here's a nice link explaining dictionaries:

https://excelmacromastery.com/vba-dictionary/

But in short, a dictionary is a set of keys with matching items. Each key must be unique. If I tried to use

MyDict.Add MyData(i, 12), 1

then I would get an error when I tried to add a key for the second time. I could get around that by putting "On Error Resume Next" before the add line, or even check to see if it exists with

If Not MyDict.Exists(MyData(i, 12)) Then MyDict.Add MyData(i, 12), 1

But a third way is just to set the item value for a given key, which is what I did. If the key doesn't exist, then the key is created. If it does exist, then the item value is overwritten. So the MyDict.Count is a count of unique values.
 
Upvote 0
Eric - you are a genius! A true work of a master - works like a charm - fast and accurate! Thank you so so much - I've been trying to make an accurate unique records count in Excel and Access for the longest time! The previous code had a minor issue as one of the variables was being passed as integer as opposed to string - once fixed the function worked! For other people's knowledge, to calculate a UDF as part of data change/F9 etc...use Application.Volatile True inside the function. Cheers! :)
 
Upvote 0
Nice catch. It can be tricky finding a string vs. integer comparison issue. You can often prevent that kind of thing by always defining your variables at the start of the macro.

Dim Region as String, Sub_Region as String

etc. You might get a little more speed out of it too, since undefined variables are implicitly defined as Variant, and VBA constantly has to do type conversions. Also a good tip about the Application.Volatile. Glad I could help! :cool:
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,180
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top