Unique Count Based on Criteria

mskusace

New Member
Joined
Jan 2, 2019
Messages
18
I have a report that is published monthly and I use the data from this report to generate a sub-report. The data set below is a compilation of 4 months of reports and has overall filter, which I apply when I pull the data together. This is also a truncated and slightly masked data set, but the logic will be consistent.

[TABLE="width: 500"]
<tbody>[TR]
[TD]Period[/TD]
[TD]Month_Rank[/TD]
[TD]Con#[/TD]
[TD]Unique[/TD]
[/TR]
[TR]
[TD]201809[/TD]
[TD]1[/TD]
[TD]CON1[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201809[/TD]
[TD]1[/TD]
[TD]CON2[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201809[/TD]
[TD]1[/TD]
[TD]CON3[/TD]
[TD]Unique[/TD]
[/TR]
[TR]
[TD]201810[/TD]
[TD]2[/TD]
[TD]CON1[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201810[/TD]
[TD]2[/TD]
[TD]CON2[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201810[/TD]
[TD]2[/TD]
[TD]CON4[/TD]
[TD]Unique[/TD]
[/TR]
[TR]
[TD]201811[/TD]
[TD]3[/TD]
[TD]CON2[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201811[/TD]
[TD]3[/TD]
[TD]CON5[/TD]
[TD]Unique[/TD]
[/TR]
[TR]
[TD]201812[/TD]
[TD]4[/TD]
[TD]CON2[/TD]
[TD]Duplicate[/TD]
[/TR]
[TR]
[TD]201812[/TD]
[TD]4[/TD]
[TD]CON6[/TD]
[TD]Unique[/TD]
[/TR]
</tbody>[/TABLE]


Period: Manually entered based on the month the report was pulled.
Month_Rank: Formula calculation =TRUNC(SUMPRODUCT(--([@[Period]] > [Period])/COUNTIF([Period],[Period]))+1,1)
Con#: Data from the monthly report
Unique: Formula calculation =IF(COUNTIF([Con#],[@[Con#]])=1,"Unique","Duplicate")

Assume 201812 is the most current period.

Objective 1: Count the "Con#"s that exists in the previous period AND does NOT exist in the current period.
Example: The count for the table above would be 1. CON5 exists in "Month_Rank" = 3, but not "Month_Rank" = 4. It does not matter if it exists in even earlier periods.

I am having trouble with the logic since it is counting the "Con#" column based on two sets of criteria from "Month_Rank". Any help or guidance would be greatly appreciated.

Please let me know if I need to clarify further.
 

Excel Facts

Can you sort left to right?
To sort left-to-right, use the Sort dialog box. Click Options. Choose "Sort left to right"
Welcome to the board.

I'm having a bit of trouble figuring out your logic. For example, CON1 exists in 2, but not 3. CON4 exists in 2 but not 3. Why aren't these counted? Or do you mean that the count for month 2 is 1 (CON3 in 1 but not 2), the count for month 3 is 2 (CON1 and CON4), and the count for month 4 is 1 (CON5)?
 
Upvote 0
Thank you Eric! I frequent the board often for answers and it looks like I was finally able to get my registration e-mail to work. I hope to become a contributor now as well.

Let's assume CON# are contracts with a finite length of time they span. I want to see how many new contracts we have in the latest month and then also the number of contracts that were closed in the previous month. I ONLY care about the most recent month and the month prior to it. So this list will continue for 201901 to be "Month_Rank" = 5. Meaning for the next report, I will care about the new (unique) contracts in max(Month_Rank) (Month_Rank = 5) and the contracts that exist in Month_Rank = 4 but NOT Month_Rank = 5, meaning it was closed.

I already have the logic for determining the new contracts for the latest month, but I do not have it to calculate the contracts that were closed. I hope that clarifies.
 
Upvote 0
Try:

=SUM(IF(Table1[Month_Rank]=MAX(Table1[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table1[Month_Rank])&"|"&Table1[Con'#],Table1[Month_Rank]&"|"&Table1[Con'#],0)),1)))

confirmed with Control+Shift+Enter.
 
Upvote 0
That is excellent! It appears to be working in all cases! I am trying to decipher the formula now and was hoping to get some insight. I've gone through and F9'd the segments to break it down, but I still have loose ends.

I used another formula for the "Month_Rank", which is a combination of something I found on a forum and some personal tweaks, that I am having a bit of trouble fully understanding as well.

If you have time, would you mind breaking apart either both or either one of the formulas below and explaining it more detail?

={SUM(IF(Table2[Month_Rank]=MAX(Table2[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table2[Month_Rank])&"|"&Table2[Con'#],Table2[Month_Rank]&"|"&Table2[Con'#],0)),1)))}

=TRUNC(SUMPRODUCT(--([@Period] > [Period])/COUNTIF([Period],[Period]))+1,1)
 
Upvote 0
=SUM(IF(Table1[Month_Rank]=MAX(Table1[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table1[Month_Rank])&"|"&Table1[Con'#],Table1[Month_Rank]&"|"&Table1[Con'#],0)),1)))

The part in red finds the max month in the Month_Rank column, and subtracts 1 to get the 2nd to last month.

=SUM(IF(Table1[Month_Rank]=MAX(Table1[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table1[Month_Rank])&"|"&Table1[Con'#],Table1[Month_Rank]&"|"&Table1[Con'#],0)),1)))

This looks for the rows that match the 2nd to last month.

=SUM(IF(Table1[Month_Rank]=MAX(Table1[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table1[Month_Rank])&"|"&Table1[Con'#],Table1[Month_Rank]&"|"&Table1[Con'#],0)),1)))

The part in red creates a key composed of the last month, a pipe symbol, and the contract number from the matching rows we found in the last step. The blue part creates a similarly formatted key from the entire table.

=SUM(IF(Table1[Month_Rank]=MAX(Table1[Month_Rank])-1,IF(ISERROR(MATCH(MAX(Table1[Month_Rank])&"|"&Table1[Con'#],Table1[Month_Rank]&"|"&Table1[Con'#],0)),1)))

Now we look for the matching keys. If they are not found (ISERROR), then the contract number exists in month-1, but not in the last month, so we add
1. Finally, the SUM adds up all the results. We'll either get a 1 from the previously explained process, or a FALSE from either of the 2 IFs. FALSEs are non-numeric, so we're just left with the total.


=TRUNC(SUMPRODUCT(--([@Period] > [Period])/COUNTIF([Period],[Period]))+1,1)

This part counts how many times each row appears in the column, so for the example we get {3,3,3,3,3,3,2,2,2,2}. (Row2, 201809 occurs 3 times, row3, 201809 occurs 3 times, row4 201809 occurs 3 time, row5 208110 occurs 3 times, etc.)

=TRUNC(SUMPRODUCT(--([@Period] > [Period])/COUNTIF([Period],[Period]))+1,1)

Sees which rows are less than the current row. Say we're on row 11. Then the result array is {TRUE,TRUE,TRUE,
TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE}. Now divide this array by the first array (TRUE = 1, FALSE = 0), resulting in {1/3,1/3,1/3,1/3,1/3,1/3,1/2,1/2,0,0}. Now sum this array with the SUM part of SUMPRODUCT. Notice how the 1st 3 rows add up to 1, the next 3 rows add up to 1, the next 2 rows add up to 1. This is the clever part that ends up adding 1 for each set of dates, so the sum is 3. Then we have to add 1 more to get the amount for the current row. The TRUNC is just there to handle rounding issues.

So you can see how it works, and it's pretty clever, but also pretty obscure. If someone else sees it, they might not understand it. Given that the Period column is a date, you could try something like this:

=DATEDIF("08/01/2018",RIGHT([@Period],2)&"/01/"&LEFT([@Period],4),"m")

which just figures out how many months from the starting date. If the values in Period are actually dates, the formula is even easier.

Hope this helps!
 
Upvote 0
That is brilliant! Is that something you have done similarly in the past or that type of logic at least?

So the nested IF statement captures all the contracts that exist in month 1, 2, and 3, but NOT in 4 (sum = 8). Then the outside IF statement filters that down to ONLY be for month 3, the previous month.

I'm still trying to understand how you came about that logic, if it's something that is common practice or just a unique solution.

Definitely for the last one I was having difficulty understanding, but your explanation was great! The first "-" changes the TRUE/FALSE to -1/0 and then the second "-" changes -1/0 to 1/0? Do I need to do that as an array formula with CTRL+SHIFT+ENTER as well or just a normal formula?

Thank you so much!
 
Upvote 0
I'm not sure I could say that this is common practice. The formula as a whole is probably more complicated than 95% (or more) of formulas used. I've certainly never seen or used this exact formula. But it's comprised of pieces that I've used many times. I have a knack for figuring out how to put different pieces together to get the final desired result.

You have the right idea about the -- (or "double unary"), it converts a Boolean (TRUE/FALSE) value to 1/0. When you use TRUE/FALSE in an equation, Excel coerces it into a value. You can use it in a regular formula, or an array formula, it depends on what you're doing. Incidentally, you don't need it at all in the particular formula you're looking at. Any mathematical operation will convert it, and in this case when we divide the numerator array by the denominator array, that handles it.

There's an even easier formula for that column than the DATEDIF I came up with:

=N(B1)+(A2<>A1)

It also converts a Boolean to a number, in a non-array formula. I haven't figured out a good way to use table references with it though.

Glad to help!
 
Upvote 0
I used to think I was an advanced Excel user, but now I definitely feel like I have a lot to learn. I have never used the N function before, but that is very good to know. I will have to explore more and hopefully I can get into more of these complex functions as well. Sometimes I know the logic I would like to apply, I just don't know how in Excel.

This is a off-topic, but I hope that is okay. Do you do any of your data manipulation in SQL? If so, are you able to execute these same types of manipulation easily and would you recommend it? I am trying to find the most efficient way to handle small and large data sets with data manipulation and automation. Sometimes the data sets are large and take a bit longer to run in Excel because I don't know best practices for VBA.
 
Upvote 0
I've used Excel for decades, and I'm still learning new things, not to mention new ways to use things I already know.

As far as your other question, Yes I do know SQL, and yes you can use it for this kind of data manipulation. There are ways to use it directly from Excel. There is also a relatively recent addition to Excel called Power Query that can do some pretty powerful data manipulation. Outside of Excel there are many database programs that you can use. Deciding what tool to use can be a tricky problem.

The latest version of Excel has a million lines per sheet. But if you really have a million lines of data in Excel, you're using the wrong tool. You could use Access and use Excel as a front-end. Or there are many other options. You need to look at how much data you have, how is it updated, what you do with it, how much will it grow, etc. That's really a bigger question than we can answer here.

Good luck!
 
Upvote 0

Forum statistics

Threads
1,223,904
Messages
6,175,295
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top