Extract Unique Records Based on Date Instances

legalhustler

Well-known Member
Joined
Jun 5, 2014
Messages
1,214
Office Version
  1. 365
Platform
  1. Windows
Hi,

I'm looking for a dynamic spilled array formula (and/or even a Power Query solution) to extract unique ID records based on the Date column.

The first two records for ID 123 would be considered one unique record because they ordered milk on 2/5/24 and received it on 2/7/24 (the subsequent date is always considered the received date), a complete order. The last two records for ID 123 would be considered another unique record because ID 123 ordered milk again on 2/10/24 and received it on 2/11/24. Eggs and Apple for ID 123 and 777, respectively, were ordered but not received so they would also be considered unique records (just not a complete order).

DateIDAmountFood
2/5/24123$100Milk
2/7/24123$100Milk
2/5/24123$100Eggs
2/5/24888$50Cookies
2/6/24888$50Cookies
2/12/24777$100Apple
2/10/24123$100Milk
2/11/24123$100Milk

Here is what the result should look like:

IDAmountFood
123$100Milk
123$100Eggs
888$50Cookies
123$100Milk
777$75Apple
 
Hagia_Sofia's formula still has one semicolon in it that needs to be switched to a comma:
Excel Formula:
=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,A1:D9,join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")

But I don't think this will give you what you want, since it's not filtering out the received dates. Also, I don't think you can keep the date formatting with this formula, but I could be wrong.
Used your revised formula but changed A1:D9 to B1:D9 and it worked! (see below). The former (A1:D9) returns the whole table as is. I just wished it was written in a more simpler way to understand so I could see how it works.

My other question is what if B1:D9 were not in contiguous columns? Say there was another column before and after the Amount column which I didn't need, how would I adjust the formula to account for that?

=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,B1:D9,join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")
 
Upvote 0

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
It is true both in relation to the semicolon (try: TEXTSPLIT(TEXTJOIN("|",,LET(array,B2:D9,join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")) and in the latter part as it does not return dates - but was it not the original intention?
Can it be re-written in a simpler way and also how would I account if B2:D9 were not in contiguous columns? Say there was another column before and after the Amount column which I didn't need.
 
Upvote 0
Ah, I see what hagia_sofia is doing now. I was assuming that the matching order and delivery rows might be separated, in which case I don't think the "LAMBDA(a,b, IF(a=b, "", b))" would work, since I believe it's comparing each row to another row right next to it. So long as there's no other records between an order and the respective delivery, it works fine.

I wish I could help in cases of non-contiguous columns, but I'm afraid adjusting that formula for such cases is beyond my abilities.
 
Upvote 0
As for the latter part, it could be resolved by inserting CHOOSECOLS function:

=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,CHOOSECOLS(B1:D9,1,3,5),join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")

If we suppose that there are 5 columns, i.e. ID - column_x - Amount - column_y - Food, then 1,3,5 will represent ID, Amount, and Food.

As for the former part, it probably could but by somebody else :) What it does is: a) creates an array of interest, b) joins text within individual rows, c) compares two adjacent rows, d) replaces duplicate row with empty row and preserves the rest, e) filters out empty rows, f) splits the result into columns. Like I said above, it is not elegant and probably not the best solution...
 
Upvote 0
As for the latter part, it could be resolved by inserting CHOOSECOLS function:

=TEXTSPLIT(TEXTJOIN("|",TRUE,LET(array,CHOOSECOLS(B1:D9,1,3,5),join,SCAN("",BYROW(array,LAMBDA(a,TEXTJOIN(";",1,a))),LAMBDA(a,b,IF(a=b,"",b))),FILTER(join,join<>""))),";","|")

If we suppose that there are 5 columns, i.e. ID - column_x - Amount - column_y - Food, then 1,3,5 will represent ID, Amount, and Food.

As for the former part, it probably could but by somebody else :) What it does is: a) creates an array of interest, b) joins text within individual rows, c) compares two adjacent rows, d) replaces duplicate row with empty row and preserves the rest, e) filters out empty rows, f) splits the result into columns. Like I said above, it is not elegant and probably not the best solution...
Ah! I forgot about that function. I think you meant, CHOOSECOLS(B1:F9,1,3,5) to account for all the columns. Thanks for the explanation, hopefully someone can write it in a simpler way or in Power Query (interested in seeing how that works too). For now, I will use what you gave me.
 
Upvote 0
A bit late to the party, but maybe:

Book1
ABCDEFGHI
1DateIDAmountSomethingFoodIDAmountFood
22/5/2024123100aMilk123100Milk
32/7/2024123100bMilk123100Eggs
42/5/2024123100cEggs88850Cookies
52/5/202488850dCookies777100Apple
62/6/202488850eCookies123100Milk
72/12/2024777100fApple
82/10/2024123100gMilk
92/11/2024123100hMilk
10
Sheet4
Cell Formulas
RangeFormula
G2:I6G2=LET(t,A2:E9,cols,{2,3,5},ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,s,SEQUENCE(ROWS(t)),c,COUNTIFS(OFFSET(t,0,ca,s,1),OFFSET(t,0,ca,,1),OFFSET(t,0,cb,s,1),OFFSET(t,0,cb,,1),OFFSET(t,0,cc,s,1),OFFSET(t,0,cc,,1)),f,FILTER(t,MOD(c,2)),INDEX(f,SEQUENCE(ROWS(f)),cols))
Dynamic array formulas.


Set the range and the columns you want at the start. This formula does not actually look at the dates at all, it just checks the values in the ID, amount, and food columns and pairs up those that match.
 
Upvote 0
A bit late to the party, but maybe:

Book1
ABCDEFGHI
1DateIDAmountSomethingFoodIDAmountFood
22/5/2024123100aMilk123100Milk
32/7/2024123100bMilk123100Eggs
42/5/2024123100cEggs88850Cookies
52/5/202488850dCookies777100Apple
62/6/202488850eCookies123100Milk
72/12/2024777100fApple
82/10/2024123100gMilk
92/11/2024123100hMilk
10
Sheet4
Cell Formulas
RangeFormula
G2:I6G2=LET(t,A2:E9,cols,{2,3,5},ca,INDEX(cols,1)-1,cb,INDEX(cols,2)-1,cc,INDEX(cols,3)-1,s,SEQUENCE(ROWS(t)),c,COUNTIFS(OFFSET(t,0,ca,s,1),OFFSET(t,0,ca,,1),OFFSET(t,0,cb,s,1),OFFSET(t,0,cb,,1),OFFSET(t,0,cc,s,1),OFFSET(t,0,cc,,1)),f,FILTER(t,MOD(c,2)),INDEX(f,SEQUENCE(ROWS(f)),cols))
Dynamic array formulas.


Set the range and the columns you want at the start. This formula does not actually look at the dates at all, it just checks the values in the ID, amount, and food columns and pairs up those that match.
I will check it out but just looking at it off the bat, you’re using OFFSET which is a volatile function and will slow my performance on big data, additionally you hardcoded columns 2,3,5 in the array syntax, I would like it as robust as possible.

For the prior suggestion I replaced the hard coded columns with XMATCH to get the position of the column in the function argument.
 
Upvote 0
All valid points. If you do try it, let me know how it performs. I've heard varying versions of just how bad OFFSET is, and it would be interesting to see how much difference there is. It's easy enough to use XMATCH to find the columns, but the OFFSET is integral to the functioning of the formula. I could actually rewrite the OFFSETs with INDEX:INDEX, but that is also supposed to be volatile.
 
Upvote 0
Yes, my apologies (Friday is Friday). Just test it on a large portion of data and see it it works as intended...
Is there a way we can spill the date ordered and date received as well? As mentioned previously, the date ordered is always first followed by the date received, and if an ID has ordered but not received the item (which means there was no paired date) then obviously the date received will be empty. Thank you so much!
 
Upvote 0

Forum statistics

Threads
1,223,910
Messages
6,175,318
Members
452,634
Latest member
cpostell

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top