Using FILTER/UNIQUE on text, numbers

danhendo888

Board Regular
Joined
Jul 15, 2019
Messages
167
Office Version
  1. 365
Platform
  1. Windows
I download the below spreadsheet daily from our software. It shows details of invoices paid/booked.
*It doesn't show it below but the first three columns have the little green ribbon on the top left corner of the cell and is aligned to the left of the cell and has General formatting.

Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
1111111111ABC123 y81900901203/12/202410.00100.00AUD
1111111111ABC123 y81900901203/12/202450.00500.00AUD
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
1111111111ABC123 y81900901203/12/20240.000.00AUD3/14/2024
222222222211122200081900900001/1/20240.0010,000.00AUD


Below is my own spreadsheet. I type in the invoice number and retrieve the Order Number, Order Total, and Payment Date from the downloaded file.
The first column has the little green ribbon at the top left corner also.
The last three columns is the output I wish to achieve
Company NumberInvoice NumberOrder NumberOrder TotalPayment Date
1111111111ABC12381900901206603/14/2024
2222222222111222000819009000010,000.00


Order Number:
To retrieve the Order Number, I use FILTER and UNIQUE. It retrieves it successfully when the invoice number in my spreadsheet has text formatting e.g. ABC123
But it doesn't work when the invoice number in my spreadsheet has number formatting because the invoice numbers in the downloaded file does not have number formatting. How can I make the formula work for both text and numbers.

Also, how can I retrieve the correct order number when the correct invoice number is "ABC123" but it was incorrectly booked as "ABC123 y"

Order Total:
I use two SUMIFS to add the Order Tax and Order Subtotal columns together.
Is there a more efficient way to add these? Or is this okay to use.

Payment Date:
How can I retrieve the payment date for invoice ABC123? The date does not necessarily show in all rows in the extracted data
 
Here is one idea:
Initial data on Daily worksheet...
MrExcel_20240315.xlsx
ABCDEFGH
1Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
21111111111ABC123 y81900901203/12/202410100AUD
31111111111ABC123 y81900901203/12/202450500AUD3/13/2024
41111111111ABC123 81900901443/12/2024510AUD3/14/2024
51111111111ABC123 y81900902223/12/2024620AUD
6ABC123 81900902223/12/20247100AUD3/15/2024
72222222222111222000z81900900001/1/2024333AUD
82222222222111222000 x81900922221/1/2024444AUD3/16/2024
9222222222211144000z81900901111/1/2024555AUD3/17/2024
1022222222221114400081900908881/1/2024666AUD
112222222222111552000 g81900900001/1/2024777AUD3/8/2024
Daily

Then spilling horizontal arrays with results:
MrExcel_20240315.xlsx
ABCDEFGHIJK
1Company NumberInvoice NumberOrder NumberOrder TotalPayment DateOrder NumberOrder TotalPayment DateOrder NumberOrder TotalPayment Date
21111111111ABC1238190090120660.003/13/2024819009014415.003/14/20248190090222133.003/15/2024
3111222000819009000036.00none819009222248.003/16/2024
4111552000819009000084.003/8/2024
511144000819009011160.003/17/2024819009088872.00none
6not found
MyOwn2
Cell Formulas
RangeFormula
C2:K2,C6,C5:H5,C4:E4,C3:H3C2=LET(d,Daily!$A$2:$H$11,f,IF(B2="",#N/A,FILTER(d,ISNUMBER(SEARCH(B2,INDEX(d,,2))))), uon,UNIQUE(INDEX(f,,3)), son,BYROW(uon,LAMBDA(r,SUM(CHOOSECOLS(f,5,6)*(CHOOSECOLS(f,3)=r)))), pdon,BYROW(uon,LAMBDA(r,IFERROR(1/(1/MAX(CHOOSECOLS(f,8)*(CHOOSECOLS(f,3)=r))),"none"))), res,IFERROR(HSTACK(uon,son,pdon),"not found"),TOROW(res))
Dynamic array formulas.
 
Upvote 1

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Works beautifully, thank you.

Can it also be spilled horizontally like the below? (Ignoring all other issues)

Company NumberInvoice NumberOrder NumberOrder TotalPayment Date
1111111111ABC12381900901201103/13/2024
8190090144153/14/2024
 
Upvote 0
Yes, but that becomes more complicated because we can't actually shift your input cells (Company Number and Invoice Number or your worksheet). But we can construct a new table that uses those two input columns, replicating the company number and invoice number for each Order Number subgroup associated with them. This involves spilling results across and down the page...and that's where the challenge resides.

In the above solution, we have each inputted Company Number and Invoice Number first being used to filter the Daily data to extract a subset for the inputted Invoice Number. And because the invoice number might have several order numbers associated with it, we first create a vertical array of unique order numbers (uon) for that invoice number. Then we feed the "uon" array, row-by-row, into BYROW LAMBDA functions to generate two single column spilling arrays for the sum of amounts associated with each order number (son), and for the payment dates for each order number (pdon). Those three vertical arrays are stacked together horizontally to create a 3-column array consisting of some number of rows (the number of orders numbers associated with the subject invoice number). Then the 3-column array is converted to a single horizontal spilling array using the TOROW function. This gets us the needed content in a relatively straightforward manner.

We can use this general idea down to the final steps, but instead of converting the two-dimensional array into a single horizontal spilling array, I first include the inputted company number and invoice number in each set of data and append a row delimiter "//" after the payment date. Then all results for that invoice number are joined together, each separated by a column delimiter "|". This creates a single cell output consisting of all data for the invoice number, and it looks similar to this:
1111111111|ABC123|8190090120|660|45364|//|1111111111|ABC123|8190090144|15|45365|//|1111111111|ABC123|8190090222|133|45366|//
But we want a similar result for each row (invoice number) in your input table. So everything mentioned above in this paragraph is wrapped inside a BYROW LAMBDA function, where your input table (columns A:B...company number and invoice number) are fed, row by row, into the functions that summarize the data for each of the order numbers associated with the invoice number. The end result is a single column spilling array, each row resembling the example above. I've called this single column "intres"...for intermediate results.
1111111111|ABC123|8190090120|660|45364|//|1111111111|ABC123|8190090144|15|45365|//|1111111111|ABC123|8190090222|133|45366|//
0|111222000|8190090000|36|none|//|0|111222000|8190092222|48|45367|//
3333333333|111552000|8190090000|84|45359|//
0|11144000|8190090111|60|45368|//|0|11144000|8190090888|72|none|//

This format is well-suited for spilling results down the page. For these types of problems, there are some options.

Option 1:
At times, rather than reinvent the wheel, I've relied on previously developed formulas. One that is tailor made for this problem is a LAMBDA function called ROW BYROW, or RBYROW described here: AUNIQUE
To use it, you would use the Excel Name Manager (or Advanced Formula Environment) to create the name RBYROW, and indicate that it is designed to accept two arguments (the array and some function). Then it can be used to transform the intermediate results..."intres". After the results are spilled, some cleanup and formatting is necessary to convert text to numbers where appropriate and replace 0's and errors with blanks and meaningful messages.

Option 2:
We can also condition the intermediate results, adding or deleting some of the row/column delimiters to ensure the correct structure in the next step. The entire single column array of "intres" is then combined into a single cell and then split apart using the row/column delimiters. Then the same cleanup and formatting is performed. The advantage is that the separate LAMBDA function does not need to be created. A potential disadvantage is that the CONCAT function might cause a problem should the cell size limit exceed Excel's limit of 32,767 characters. I doubt there will be so many order numbers to cause this problem, but you would know better.

One final note: I'm assuming that you may not necessarily know where the bottom of the Daily data table is...unless you look and then adjust the range in the formula. For convenience, a dynamic array is formed. We assume that we can look down the 2nd column of the Daily worksheet and the lowest non-blank cell represents the bottom of the Daily data table. That row number is assigned to the variable "lrowd"...last row daily. And a dynamic range is created of the Daily data, called "d", which assumes the data begin on row 2 of the Daily sheet...and columns A:H are where the data reside. Any of these can be adjusted to reflect reality. Similarly, I've assumed your Company Number and Invoice Number inputs are in columns A:B, and these are assigned to variable "mine"...because they are yours;). Then we can look for the lowest non-blank cell in column B, assign that row number to "lrowm"...last row mine, and create a dynamically formed array of your inputs, called "m". These assignments are done first thing in the formula, and should be the only part that needs to be adapted to your workbook. Everything below this in the formula is drawn from these arrays.

For this Daily data set:
MrExcel_20240315.xlsx
ABCDEFGH
1Company NumberInvoice NumberOrder NumberOrder DateOrder TaxOrder SubtotalCurrencyPayment Date
21111111111ABC123 y81900901203/12/202410100AUD
31111111111ABC123 y81900901203/12/202450500AUD3/13/2024
41111111111ABC123 81900901443/12/2024510AUD3/14/2024
51111111111ABC123 y81900902223/12/2024620AUD
6ABC123 81900902223/12/20247100AUD3/15/2024
72222222222111222000z81900900001/1/2024333AUD
82222222222111222000 x81900922221/1/2024444AUD3/16/2024
9222222222211144000z81900901111/1/2024555AUD3/17/2024
1022222222221114400081900908881/1/2024666AUD
113333333333111552000 g81900900001/1/2024777AUD3/8/2024
12
Daily

Here are Options 1 (cell S2) and 2 (cell Y2), with your initial query inputs in columns A:B. I think Option 2 might be easier to implement. I've left Company Number, Invoice Number, and Order Number as text. If any/all should be true numbers, the final output can be adjusted in the formula (I wasn't sure if any of these might have leading 0's that would be dropped in a number format).
MrExcel_20240315.xlsx
ABRSTUVWXYZAAABAC
1Company NumberInvoice NumberCompany NumberInvoice NumberOrder NumberOrder TotalPayment DateCompany NumberInvoice NumberOrder NumberOrder TotalPayment Date
21111111111ABC1231111111111ABC1238190090120660.003/13/20241111111111ABC1238190090120660.003/13/2024
31112220001111111111ABC123819009014415.003/14/20241111111111ABC123819009014415.003/14/2024
433333333331115520001111111111ABC1238190090222133.003/15/20241111111111ABC1238190090222133.003/15/2024
511144000111222000819009000036.00none111222000819009000036.00none
6111222000819009222248.003/16/2024111222000819009222248.003/16/2024
73333333333111552000819009000084.003/8/20243333333333111552000819009000084.003/8/2024
811144000819009011160.003/17/202411144000819009011160.003/17/2024
911144000819009088872.00none11144000819009088872.00none
10
MySummary
Cell Formulas
RangeFormula
S2:W9S2=LET(daily, Daily!A:H, lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))), d, Daily!A2:INDEX(Daily!A:H, lrowd, ), mine, A:B, lrowm, LOOKUP(2, 1 / (INDEX(mine, , 2) <> ""), ROW(INDEX(mine, , 2))), m, A2:INDEX(A:B, lrowm, ), intres, BYROW(m, LAMBDA(r, LET(cn, INDEX(r, 1), in, INDEX(r, 2), f, IF(in = "", #N/A, FILTER(d, ISNUMBER(SEARCH(in, INDEX(d, , 2))))), uon, SORT(UNIQUE(INDEX(f, , 3))), son, BYROW(uon, LAMBDA(r, SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)))), pdon, BYROW(uon, LAMBDA(r, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = r))), "none"))), res, IFERROR(HSTACK(r, uon, son, pdon, "//"), HSTACK(r, "not found", "", "", "//")), TEXTJOIN("|", , res) ) )),sr,RBYROW(intres, LAMBDA(r, TEXTSPLIT(r, "|", "//", 1, , ""))),HSTACK(IF(CHOOSECOLS(sr,1)+0=0,"",CHOOSECOLS(sr,1)), CHOOSECOLS(sr,2,3),CHOOSECOLS(sr,4)+0,IFERROR(CHOOSECOLS(sr,5)+0,"none")))
Y2:AC9Y2=LET(daily, Daily!A:H, lrowd, LOOKUP(2, 1 / (INDEX(daily, , 2) <> ""), ROW(INDEX(daily, , 2))), d, Daily!A2:INDEX(Daily!A:H, lrowd, ), mine, A:B, lrowm, LOOKUP(2, 1 / (INDEX(mine, , 2) <> ""), ROW(INDEX(mine, , 2))), m, A2:INDEX(A:B, lrowm, ), intres, BYROW(m, LAMBDA(r, LET(cn, INDEX(r, 1), in, INDEX(r, 2), f, IF(in = "", #N/A, FILTER(d, ISNUMBER(SEARCH(in, INDEX(d, , 2))))), uon, SORT(UNIQUE(INDEX(f, , 3))), son, BYROW(uon, LAMBDA(r, SUM(CHOOSECOLS(f, 5, 6) * (CHOOSECOLS(f, 3) = r)))), pdon, BYROW(uon, LAMBDA(r, IFERROR(1 / (1 / MAX(CHOOSECOLS(f, 8) * (CHOOSECOLS(f, 3) = r))), "none"))), res, IFERROR(HSTACK(r, uon, son, pdon, "//"), HSTACK(r, "not found", "", "", "//")), TEXTJOIN("|", , res) ) )),sr,TEXTSPLIT(CONCAT(SUBSTITUTE(CONCAT("|",intres),"|//","//")),"|","//",1,,""),HSTACK(IF(CHOOSECOLS(sr,1)+0=0,"",CHOOSECOLS(sr,1)), CHOOSECOLS(sr,2,3),CHOOSECOLS(sr,4)+0,IFERROR(CHOOSECOLS(sr,5)+0,"none")))
Dynamic array formulas.

I've probably made this harder than it needs to be, but either of these should work.
 
Upvote 1
Much appreciated Kirk!
I am learning so much from your posts, thank you

And sincere apologies for not including all the necessary information in the OP :cry:
 
Upvote 0
You're welcome...I'm happy to help. I don't quite have the same error trapping in the vertical spilling versions, but I'm not sure if that is a problem either. Have a look and let me know if you encounter any problems or have any questions about the formula. One thing I noted is that a blank invoice number on the input side produces an error, but the cause is fairly obvious as there is nothing to lookup.
 
Upvote 0
intres, BYROW(m, LAMBDA(r, LET(cn, INDEX(r, 1), in, INDEX(r, 2)

Regarding Option 2 in cell Y2, specifically the part above.
It looks like it's defining cn and in within m but m is the below (correct?):

1111111111ABC123
111222000
3333333333111552000
11144000


Why is the INDEX (r,1) and (r,2) targeting the rows? And not the columns?
 
Upvote 0
Also, is it possible to search not just for the invoice number but by company number AND the invoice number (for instances where the same invoice number is issued by two different companies?
I tried adding INDEX(d,,1)=cn* to f but it's not quite right
 
Upvote 0
Why is the INDEX (r,1) and (r,2) targeting the rows? And not the columns?
Good question. After trimming down the full column references A:B to include only your data in a variable called "m", we feed the "m" data into a BYROW LAMBA function. This takes "m" data one row at a time...intres, BYROW(m, LAMBDA(r,...
So within the actual LAMBDA function, this row of the "m" data is referred to as "r"...short for "row". In this case, since we have a single row of data, we can refer to the 1st element (in the 1st column index position) as INDEX(r,1)...which incidentally could also be INDEX(r,,1). Similarly INDEX(r,2) gets us the 2nd element in the array, but we could also use INDEX(r,,2). Intuitively, it might make more sense to use the column nomenclature...and I think that would be a good revision. Interestingly, Excel interprets these the same when we have a single horizontal array/range, but not when we have a single vertical array/range.
MrExcel_20240315.xlsx
ALAMANAOAP
12abcaa
13bb
14cc
15aa#REF!#REF!a
16bbb
17ccc
MySummary
Cell Formulas
RangeFormula
AO12AO12=INDEX($AL$12:$AN$12,1)
AO13AO13=INDEX($AL$12:$AN$12,2)
AO14AO14=INDEX($AL$12:$AN$12,3)
AM15:AM17AM15=INDEX($AL$15:$AL$17,,1)
AN15AN15=INDEX($AL$15:$AL$17,,2)
AO15AO15=INDEX($AL$15:$AL$17,,3)
AP12AP12=INDEX($AL$12:$AN$12,,1)
AP13AP13=INDEX($AL$12:$AN$12,,2)
AP14AP14=INDEX($AL$12:$AN$12,,3)
AP15AP15=INDEX($AL$15:$AL$17,1)
AP16AP16=INDEX($AL$15:$AL$17,2)
AP17AP17=INDEX($AL$15:$AL$17,3)
Dynamic array formulas.
 
Upvote 1
is it possible to search not just for the invoice number and but company number AND the invoice number (for instances where the same invoice number is issued by two different companies?
Yes. I was curious about this...whether an invoice was always associated with just one company. I guess not. I'm concerned about some error trapping. Is your query data in A:B always complete?...that is, you always have a company number and an invoice number on each row? Or do you sometimes have blanks? I think that's why the (INDEX(d, , 1) = cn) * inside FILTER isn't working quite right.

And on a related note...this appears in the Daily worksheet:
MrExcel_20240315.xlsx
AB
1Company NumberInvoice Number
21111111111ABC123 y
31111111111ABC123 y
41111111111ABC123
51111111111ABC123 y
6ABC123
72222222222111222000z
Daily

Are there sometimes blanks under the Company Number column? If so, those wouldn't be picked up unless we ignore the "cn" part of the filtering array for blanks and just go by invoice number.
 
Upvote 0
is it possible to search not just for the invoice number but by company number AND the invoice number (for instances where the same invoice number is issued by two different companies?
Some clarification needed here. The current formula takes an inputted invoice number and includes variations of that invoice number, provided the inputted invoice number text is found within the invoice number text on the Daily data table. And this subset of Daily data is then grouped by order number and reported back using whatever company number and invoice number you've specified in "m"...in columns A:B in your input section. So are you saying that you would like to look for an exact match with inputted company number and a somewhat fuzzy match with inputted invoice number and do the same thing?...group that subset of Daily data by order number for final outputs? This was my interpretation, but wanted to clarify.
 
Upvote 0

Forum statistics

Threads
1,225,743
Messages
6,186,773
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top