Extract Unique Records Based on Date Instances

legalhustler

Well-known Member
Joined
Jun 5, 2014
Messages
1,214
Office Version
  1. 365
Platform
  1. Windows
Hi,

I'm looking for a dynamic spilled array formula (and/or even a Power Query solution) to extract unique ID records based on the Date column.

The first two records for ID 123 would be considered one unique record because they ordered milk on 2/5/24 and received it on 2/7/24 (the subsequent date is always considered the received date), a complete order. The last two records for ID 123 would be considered another unique record because ID 123 ordered milk again on 2/10/24 and received it on 2/11/24. Eggs and Apple for ID 123 and 777, respectively, were ordered but not received so they would also be considered unique records (just not a complete order).

DateIDAmountFood
2/5/24123$100Milk
2/7/24123$100Milk
2/5/24123$100Eggs
2/5/24888$50Cookies
2/6/24888$50Cookies
2/12/24777$100Apple
2/10/24123$100Milk
2/11/24123$100Milk

Here is what the result should look like:

IDAmountFood
123$100Milk
123$100Eggs
888$50Cookies
123$100Milk
777$75Apple
 
All valid points. If you do try it, let me know how it performs. I've heard varying versions of just how bad OFFSET is, and it would be interesting to see how much difference there is. It's easy enough to use XMATCH to find the columns, but the OFFSET is integral to the functioning of the formula. I could actually rewrite the OFFSETs with INDEX:INDEX, but that is also supposed to be volatile.
Formula performance was not too bad on my data set, although my data didn't wasn't as big as I expected (less than 200K rows). I did amend your formula to make it more dynamic (the bold parts below) by adding headers and finding the positions of the columns. Can you add the date ordered and date received to the mix? See my previous post to the other member.

=VSTACK({"ID","Amount","Food"},LET(t,A2:E9,cols,XMATCH({"ID","Amount","Food"},A1:E1),ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,s,SEQUENCE(ROWS(t)),c,COUNTIFS(OFFSET(t,0,ca,s,1),OFFSET(t,0,ca,,1),OFFSET(t,0,cb,s,1),OFFSET(t,0,cb,,1),OFFSET(t,0,cc,s,1),OFFSET(t,0,cc,,1)),f,FILTER(t,MOD(c,2)),INDEX(f,SEQUENCE(ROWS(f)),cols)))
 
Upvote 0

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
As for the latter part, it could be resolved by inserting CHOOSECOLS function:

=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,CHOOSECOLS(B1:D9,1,3,5),join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")

If we suppose that there are 5 columns, i.e. ID - column_x - Amount - column_y - Food, then 1,3,5 will represent ID, Amount, and Food.

As for the former part, it probably could but by somebody else :) What it does is: a) creates an array of interest, b) joins text within individual rows, c) compares two adjacent rows, d) replaces duplicate row with empty row and preserves the rest, e) filters out empty rows, f) splits the result into columns. Like I said above, it is not elegant and probably not the best solution...
The Amount column for the formula result does not show the correct number formatting of my data, even if I try to format the cells with number formatting. For example, the amount result shows like 5500750.25 without the comma and "$" sign. It should show proper number formatting like $5,500,750.25. Anyway we can fix this?
 
Upvote 0
Maybe the result is not a number? Try to add +0 to the end of the formula and see whether it helps...
 
Upvote 0
Formula performance was not too bad on my data set, although my data didn't wasn't as big as I expected (less than 200K rows). I did amend your formula to make it more dynamic (the bold parts below) by adding headers and finding the positions of the columns. Can you add the date ordered and date received to the mix? See my previous post to the other member.

=VSTACK({"ID","Amount","Food"},LET(t,A2:E9,cols,XMATCH({"ID","Amount","Food"},A1:E1),ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,s,SEQUENCE(ROWS(t)),c,COUNTIFS(OFFSET(t,0,ca,s,1),OFFSET(t,0,ca,,1),OFFSET(t,0,cb,s,1),OFFSET(t,0,cb,,1),OFFSET(t,0,cc,s,1),OFFSET(t,0,cc,,1)),f,FILTER(t,MOD(c,2)),INDEX(f,SEQUENCE(ROWS(f)),cols)))
Adding the order date is easy, just add "Order Date" to your first array, and "Date" to the second array. The delivery date is proving to be a tough nut to crack. I'll keep playing around with it and see what I come up with. I have Excel 2021, which does not have BYROW or VSTACK or a few other functions, which might be necessary to get it to work.

Also, do you want the results to be formatted? The results are formatted according to the existing formatting of the column/range that the data is in. If you want something like a date to be formatted within the formula, it can be done with TEXT, but the results will be text and can't be used in other formulas easily.
 
Upvote 0
Maybe the result is not a number? Try to add +0 to the end of the formula and see whether it helps...
I tried adding a +0 using the updated formula (see below) which formatted the Amount column but the rest of the result returned a value error. Also when I try to highlight the amount column it gives me a count instead of a sum in the bottom right corner (before adding the +0). It appears the amount column is a text when it results but the raw data is a number....

=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,CHOOSECOLS(B1:F9,XMATCH("ID",B1:F1),XMATCH("Amount",B1:F1),XMATCH("Food",B1:F1)),join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")+0
 
Upvote 0
I managed it with a helper column next to the main table. A bit awkward, but it seems to work:

Book1
ABCDEFGHIJKL
1DateIDAmountSomethingFoodHelpIDAmountFoodOrder DateDelivery Date
22/5/2024123100aMilk1123100Milk2/5/20242/7/2024
32/7/2024123100bMilk5123100Eggs2/5/2024
42/5/2024123100cEggs#N/A88850Cookies2/5/20242/6/2024
52/5/202488850dCookies1777100Apple2/12/2024
62/6/202488850eCookies#N/A123100Milk2/10/20242/11/2024
72/12/2024777100fApple#N/A
82/10/2024123100gMilk1
92/11/2024123100hMilk#N/A
10
Sheet4
Cell Formulas
RangeFormula
H2:L6H2=LET(t,A2:F9,cols,MATCH({"ID","Amount","Food","Date","Help"},A1:F1,0),ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,cd,INDEX(cols,4)-1,ce,INDEX(cols,5)-1,rt,ROWS(t),s,SEQUENCE(rt),oa,OFFSET(t,0,ca,,1),ob,OFFSET(t,0,cb,,1),oc,OFFSET(t,0,cc,,1),oe,OFFSET(t,0,ce,,1),dd,IFERROR(INDEX(t,s+oe,ca),""),c,COUNTIFS(OFFSET(t,0,ca,s,1),oa,OFFSET(t,0,cb,s,1),ob,OFFSET(t,0,cc,s,1),oc),f,FILTER(t,MOD(c,2)),fd,FILTER(dd,MOD(c,2)),sr,SEQUENCE(ROWS(f)),CHOOSE({1,2,3,4,5},INDEX(f,sr,ca+1),INDEX(f,sr,cb+1),INDEX(f,sr,cc+1),INDEX(f,sr,cd+1),INDEX(fd,sr)))
F2:F9F2=MATCH(B2&"|"&C2&"|"&E2,B3:B100&"|"&C3:C100&"|"&E3:E100,0)
Dynamic array formulas.


You can add the headings like you did before.
 
Upvote 0
Adding the order date is easy, just add "Order Date" to your first array, and "Date" to the second array. The delivery date is proving to be a tough nut to crack. I'll keep playing around with it and see what I come up with. I have Excel 2021, which does not have BYROW or VSTACK or a few other functions, which might be necessary to get it to work.

Also, do you want the results to be formatted? The results are formatted according to the existing formatting of the column/range that the data is in. If you want something like a date to be formatted within the formula, it can be done with TEXT, but the results will be text and can't be used in other formulas easily.
I see. Yeah those other functions may be necessary.

Yeah I would like it formatted to the existing formatting of the raw data, if possible. Otherwise at least the Amount column should be formatted
 
Upvote 0
To format the amount:

=LET(t,A2:F9,cols,MATCH({"ID","Amount","Food","Date","Help"},A1:F1,0),ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,cd,INDEX(cols,4)-1,ce,INDEX(cols,5)-1,rt,ROWS(t),s,SEQUENCE(rt),oa,OFFSET(t,0,ca,,1),ob,OFFSET(t,0,cb,,1),oc,OFFSET(t,0,cc,,1),oe,OFFSET(t,0,ce,,1),dd,IFERROR(INDEX(t,s+oe,ca),""),c,COUNTIFS(OFFSET(t,0,ca,s,1),oa,OFFSET(t,0,cb,s,1),ob,OFFSET(t,0,cc,s,1),oc),f,FILTER(t,MOD(c,2)),fd,FILTER(dd,MOD(c,2)),sr,SEQUENCE(ROWS(f)),CHOOSE({1,2,3,4,5},INDEX(f,sr,ca+1),TEXT(INDEX(f,sr,cb+1),"$#,##0_);($#,##0)"),INDEX(f,sr,cc+1),INDEX(f,sr,cd+1),INDEX(fd,sr)))
 
Upvote 0
To format the amount:

=LET(t,A2:F9,cols,MATCH({"ID","Amount","Food","Date","Help"},A1:F1,0),ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,cd,INDEX(cols,4)-1,ce,INDEX(cols,5)-1,rt,ROWS(t),s,SEQUENCE(rt),oa,OFFSET(t,0,ca,,1),ob,OFFSET(t,0,cb,,1),oc,OFFSET(t,0,cc,,1),oe,OFFSET(t,0,ce,,1),dd,IFERROR(INDEX(t,s+oe,ca),""),c,COUNTIFS(OFFSET(t,0,ca,s,1),oa,OFFSET(t,0,cb,s,1),ob,OFFSET(t,0,cc,s,1),oc),f,FILTER(t,MOD(c,2)),fd,FILTER(dd,MOD(c,2)),sr,SEQUENCE(ROWS(f)),CHOOSE({1,2,3,4,5},INDEX(f,sr,ca+1),TEXT(INDEX(f,sr,cb+1),"$#,##0_);($#,##0)"),INDEX(f,sr,cc+1),INDEX(f,sr,cd+1),INDEX(fd,sr)))
That formula works but it's labor intensive to update if there is a new column added. What would the formula look like if there was 6 columns (if you could bold the parts that would change that would be helpful), just want to confirm my understanding? Thanks.
 
Upvote 0
I started to think that as long as I added a helper column to get the delivery date, maybe I could make it do double duty to figure out which rows to keep. And when I did, things got much simpler. Consider:

Book1
ABCDEFGHIJKL
1DateID Amount SomethingFoodDeliver DateID Amount FoodOrder DateDelivery Date
22/5/2024123$ 100.00aMilk2/7/2024123$ 100.00Milk2/5/20242/7/2024
32/7/2024123$ 100.00bMilkDelivered123$ 100.00Eggs2/5/2024
42/5/2024123$ 100.00cEggs 888$ 50.00Cookies2/5/20242/6/2024
52/5/2024888$ 50.00dCookies2/6/2024777$ 100.00Apple2/12/2024
62/6/2024888$ 50.00eCookiesDelivered123$ 100.00Milk2/10/20242/11/2024
72/12/2024777$ 100.00fApple 
82/10/2024123$ 100.00gMilk2/11/2024
92/11/2024123$ 100.00hMilkDelivered
10ID Amount FoodOrder DateDelivery Date
11123$ 100.00Milk2/5/20242/7/2024
12123$ 100.00Eggs2/5/2024
13888$ 50.00Cookies2/5/20242/6/2024
14777$ 100.00Apple2/12/2024
15123$ 100.00Milk2/10/20242/11/2024
16
Sheet9
Cell Formulas
RangeFormula
H2:L6H2=LET(t,A2:F20,cols,XMATCH({"ID","Amount","Food","Date","Deliver Date"},A1:F1,0),f,FILTER(t,(INDEX(t,0,1)<>"")*(INDEX(t,0,INDEX(cols,5))<>"Delivered")),INDEX(f,SEQUENCE(ROWS(f)),cols))
F2:F9F2=IF(MOD(COUNTIFS(B$2:B2,B2,C$2:C2,C2,E$2:E2,E2),2),XLOOKUP(B2&"|"&C2&"|"&E2,B3:B100&"|"&C3:C100&"|"&E3:E100,A3:A100,"",0),"Delivered")
H11:L15H11=LET(t,A2:F20,cols,XMATCH({"ID","Amount","Food","Date","Deliver Date"},A1:F1,0),f,FILTER(t,(INDEX(t,0,1)<>"")*(INDEX(t,0,INDEX(cols,5))<>"Delivered")),s,SEQUENCE(ROWS(f)),TEXT(INDEX(f,s,cols),{"@","$ 0.00","@","m/d/yyyy","m/d/yyyy"}))
Dynamic array formulas.


I renamed the helper column to Delivery Date. It will either be empty if the order has not been fulfilled, or a date if it has, or "Delivered" if it's a delivery row. There is an assumption that the delivery will follow the order within 100 rows, but that can be adjusted.

This makes the result formulas MUCH easier! And no OFFSET functions. The H2 formula does not format any of the output columns. The only things you would need to change are the table address (A2:F20), the header address (A1:F1), and if you want additional columns in the output, just add the header to the array constant.

The H11 formula will format the columns before displaying them. Splitting out each individual column would make the formula much longer, so I formatted all the columns. In order to change this formula, you need to change the 3 things from the previous formula, and if you add a column, add the format for it to the array constant at the end.

Let me know what you think.
 
Upvote 0
Solution

Forum statistics

Threads
1,223,234
Messages
6,170,891
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top