Distinct Result from Multiple Countif

Ravin

Board Regular
Joined
Aug 24, 2012
Messages
67
all, I need help

I have 3 columns of data

store id, store name and item no

if I create a Pivot by store name and count of item no

I get 250 store names and counts

this works

the problem is I am trying to perform a distinct count of SKUs across 5 stores

I can do this in a pivot using distinct count and selecting the 5 stores in the Pivot

the issue I have is that the list I am working from has 5000 rows with 5 stores each

Can I either do this via a Pivot

I have tried arrays but it is incredibly slow

here is an example of the data

Store Id Store Name item no
238 St Albans 100034

397 Watford 100034
238 St Albans 100103
397 Watford 100103
238 St Albans 100269
238 St Albans 100324
397 Watford 100324
238 St Albans 100803
397 Watford 100803

I'm basically doing a pivot

where I am saying

store 238 has 5 items
store 397 has 4 items

what I want is a formula that

tells me the distinct occurrences across both stores

so I would define "st albans" and "watford" and the distinct count would be 5

equally I have to lookup from the text "st albans"

Can you help please

R
 
Something like this


[TABLE="class: grid"]
<tbody>[TR]
[TD="bgcolor: #DCE6F1"][/TD]
[TD="bgcolor: #DCE6F1"]
A
[/TD]
[TD="bgcolor: #DCE6F1"]
B
[/TD]
[TD="bgcolor: #DCE6F1"]
C
[/TD]
[TD="bgcolor: #DCE6F1"]
D
[/TD]
[TD="bgcolor: #DCE6F1"]
E
[/TD]
[TD="bgcolor: #DCE6F1"]
F
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
1
[/TD]
[TD]
Store id​
[/TD]
[TD]
Store Name​
[/TD]
[TD]
no​
[/TD]
[TD][/TD]
[TD]
Criteria​
[/TD]
[TD]
Result​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
2
[/TD]
[TD]
238​
[/TD]
[TD]
St Albans​
[/TD]
[TD]
100034​
[/TD]
[TD][/TD]
[TD]
St Albans​
[/TD]
[TD]
5​
[/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
3
[/TD]
[TD]
397​
[/TD]
[TD]
Watford​
[/TD]
[TD]
100034​
[/TD]
[TD][/TD]
[TD]
Watford​
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
4
[/TD]
[TD]
238​
[/TD]
[TD]
St Albans​
[/TD]
[TD]
100103​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
5
[/TD]
[TD]
397​
[/TD]
[TD]
Watford​
[/TD]
[TD]
100103​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
6
[/TD]
[TD]
238​
[/TD]
[TD]
St Albans​
[/TD]
[TD]
100269​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
7
[/TD]
[TD]
238​
[/TD]
[TD]
St Albans​
[/TD]
[TD]
100324​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
8
[/TD]
[TD]
397​
[/TD]
[TD]
Watford​
[/TD]
[TD]
100324​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
9
[/TD]
[TD]
238​
[/TD]
[TD]
St Albans​
[/TD]
[TD]
100803​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
10
[/TD]
[TD]
397​
[/TD]
[TD]
Watford​
[/TD]
[TD]
100803​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
11
[/TD]
[TD]
444​
[/TD]
[TD]
xxx​
[/TD]
[TD]
100034​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
12
[/TD]
[TD]
555​
[/TD]
[TD]
yyy​
[/TD]
[TD]
100034​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
13
[/TD]
[TD]
444​
[/TD]
[TD]
xxx​
[/TD]
[TD]
100103​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
14
[/TD]
[TD]
555​
[/TD]
[TD]
yyy​
[/TD]
[TD]
100103​
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="bgcolor: #DCE6F1"]
15
[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]


Criteria in E2:E3

Array Formula in F2
=SUM(IF(FREQUENCY(IF(B2:B30000<>"",IF(ISNUMBER(MATCH(B2:B30000,E2:E3,0)),C2:C30000)),C2:C30000),1))
Ctrl+Shift+Enter

Observe i used ranges with almost 30K rows

M.

Hi Marcelo

I have stuck with your solution as it works the best.

=SUM(IF(FREQUENCY(IF($B$2:$B$537802<>"",IF(ISNUMBER(MATCH($B$2:$B$537802,$G3:$K3,0)),$C$2:$C$537802)),$C$2:$C$537802),1))

I have a additional question If I wanted to add a second criteria to the lookup part,

1st was to look at G3:K3 in column B
2nd is to Look at L3 in column D

L3 is also Text
and values in Column D are also text

before performing the distinct element, how would I do this.

R
 
Upvote 0

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
1st was to look at G3:K3 in column B
2nd is to Look at L3 in column D

L3 is also Text
and values in Column D are also text

I can't see any data in column D (???)
Please, try to provide a data sample with data in column D, as you did in post 1, criteria and expected result.

M.
 
Upvote 0
Here is an example of the data

Store Id Store Name item no Category
238 St Albans 100034 Doors

397 Watford 100034 Doors
238 St Albans 100103 Timber
397 Watford 100103 Timber
238 St Albans 100269 Paint
238 St Albans 100324 Woodcare
397 Watford 100324 Woocare
238 St Albans 100803 Doors
397 Watford 100803 Doors

I'm basically doing a pivot

where I am saying

store 238 has 5 items
store 397 has 4 items

what I want is a formula that

tells me the distinct occurrences across both stores

so I would define "st albans" and "watford" and the distinct count would be 5

equally I have to lookup from the text "st albans"

the secondary criteria will be "doors"

so the distinct count will now be 2

R









 
Upvote 0
Maybe... (using rows 2:537802 as in your post 11)


[Table="class: grid"][tr][td="bgcolor: #DCE6F1"][/td][td="bgcolor: #DCE6F1"]
A
[/td][td="bgcolor: #DCE6F1"]
B
[/td][td="bgcolor: #DCE6F1"]
C
[/td][td="bgcolor: #DCE6F1"]
D
[/td][td="bgcolor: #DCE6F1"]
E
[/td][td="bgcolor: #DCE6F1"]
F
[/td][td="bgcolor: #DCE6F1"]
G
[/td][td="bgcolor: #DCE6F1"]
H
[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]
1
[/td][td]
StoreId​
[/td][td]
StoreName​
[/td][td]
itemno​
[/td][td]
Category​
[/td][td][/td][td]
StoreName​
[/td][td]
Category​
[/td][td]
Result​
[/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
2
[/td][td]
238​
[/td][td]
St Albans​
[/td][td]
100034​
[/td][td]
Doors​
[/td][td][/td][td]
St Albans​
[/td][td]
Doors​
[/td][td]
2​
[/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
3
[/td][td]
397​
[/td][td]
Watford​
[/td][td]
100034​
[/td][td]
Doors​
[/td][td][/td][td]
Watford​
[/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
4
[/td][td]
238​
[/td][td]
St Albans​
[/td][td]
100103​
[/td][td]
Timber​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
5
[/td][td]
397​
[/td][td]
Watford​
[/td][td]
100103​
[/td][td]
Timber​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
6
[/td][td]
238​
[/td][td]
St Albans​
[/td][td]
100269​
[/td][td]
Paint​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
7
[/td][td]
238​
[/td][td]
St Albans​
[/td][td]
100324​
[/td][td]
Woodcare​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
8
[/td][td]
397​
[/td][td]
Watford​
[/td][td]
100324​
[/td][td]
Woocare​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
9
[/td][td]
238​
[/td][td]
St Albans​
[/td][td]
100803​
[/td][td]
Doors​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
10
[/td][td]
397​
[/td][td]
Watford​
[/td][td]
100803​
[/td][td]
Doors​
[/td][td][/td][td][/td][td][/td][td][/td][/tr]

[tr][td="bgcolor: #DCE6F1"]
11
[/td][td][/td][td][/td][td][/td][td][/td][td][/td][td][/td][td][/td][td][/td][/tr]
[/table]


Criteria in columns F:G

Array Formula in H2
=SUM(IF(FREQUENCY(IF(C2:C537802<>"",IF(D2:D537802=G2,IF(ISNUMBER(MATCH(B2:B537802,F2:F3,0)),C2:C537802))),C2:C537802),1))
Ctrl+Shift+Enter

M.
ps:not sure about the performance
 
Upvote 0
With 600,000 records, I'm more sure about the performance, Marcelo.

If speed is needed, consider a database type approach. Such as pivot table or query.
 
Upvote 0
With 600,000 records, I'm more sure about the performance, Marcelo.

If speed is needed, consider a database type approach. Such as pivot table or query.

Fazza,
I strongly agree, but as OP said...

Hi Marcelo

I have stuck with your solution as it works the best.

So, I'm just trying to meet what he asked, ie include a new condition - with caveats: see post 6 and ps in post 14

M.
 
Last edited:
Upvote 0
Thanks, Marcelo.

If it is as simple as it looks (even with another field or two or three, etc), it is trivial without formulas and takes a few seconds I'd expect.

I'm expecting with the proposed formulas it could take a very long time. Maybe hours. Hope I'm wrong about that duration. Hard for me to understand how that can be the best solution.

regards, Fazza
 
Upvote 0
Fazza,

A macro dealing with 550K rows took 6.0221 seconds to do the job.

Code:
Sub aTest()
    Dim arrId As Variant, sCat As String
    Dim dic As Object, vData As Variant, i As Long
    Dim StartTime As Double, ElapsedTime As Double
    
    StartTime = Timer
    arrId = Array(238, 397)
    sCat = "Doors"
    
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = vbTextCompare
    
    'vdata has 550000 rows
    vData = Range("A2:D" & Cells(Rows.Count, "A").End(xlUp).Row)
    
    For i = 1 To UBound(vData, 1)
        If Not IsError(Application.Match(vData(i, 1), arrId, 0)) Then
            If vData(i, 4) = sCat Then dic(vData(i, 1)) = Empty
        End If
    Next i
        
    Range("H2") = dic.Count
    ElapsedTime = Timer - StartTime
    With Range("I2")
        .Value = ElapsedTime
        .NumberFormat = "0.0000"
    End With    
End Sub

I'm curious - how would you use SQL in this case to do the counting?

M.
 
Upvote 0
New version - much faster, took only 0.5688 seconds to do the job!

Code:
Sub xTest()
    Dim sCat As String
    Dim dic As Object, vData As Variant, i As Long
    Dim StartTime As Double, ElapsedTime As Double
    
    StartTime = Timer
    sCat = "Doors"
    
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = vbTextCompare
    
    'vdata has 550000 rows
    vData = Range("A2:D" & Cells(Rows.Count, "A").End(xlUp).Row)
    
    For i = 1 To UBound(vData, 1)
        Select Case vData(i, 1)
            Case 238, 397 'IDs of interest
                If vData(i, 4) = sCat Then dic(vData(i, 1)) = Empty
        End Select
    Next i
        
    Range("H2") = dic.Count
    ElapsedTime = Timer - StartTime
    With Range("I2")
        .Value = ElapsedTime
        .NumberFormat = "0.0000"
    End With    
End Sub

M.
 
Upvote 0
hi, Marcelo

I'd probably make a query. Though the pivot table would be fine too.

With under a second for the code execution, I guess the code is faster than a pivot table. Though the code seems to be filtering for 238 & 397 so if that was done in the query, it might be similar time.

regards
 
Upvote 0

Forum statistics

Threads
1,225,628
Messages
6,186,103
Members
453,337
Latest member
fiaz ahmad

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top