Help for Data Analyses

Newbie73

Board Regular
Joined
Feb 4, 2024
Messages
109
Office Version
  1. 365
Platform
  1. Windows
Hello,

I'm a bit of a newbie when it comes to Excel. In this last month I've tried to work on some data for a study that we're doing at my work.

The data is related to tests that certain individuals are required to do each x amount of times. First colum is the individual ID and Third colum the date of the tests.

14/11/2023 is the latest test, which the individual 3623, 3625 and 3626 have, meaning that all their data is still relevant.

3624 however doesn't have the last test, so I would like to flag it automatically with a formula and ideally in a different colum, so that then I could sort all the individuals without the lastest test to the bottom of the spreadsheet keep the "active" individuals and all their test results (including previous tests) on the top.

After days of failling to have an automatic formula to do this (even with helper colums), here I am. My Data has more colums but irrelevant to the example. It's over 10k rows with individuals having a varying number of tests, but my main thing is to sort the active ones from inactives (everything that doesn't have a test on 14/11/2023 is consider inactive). I've done this manual in another spreadsheet by having another colum and filling with a colour all the entries of all the individuals no longer active, and then sorting by no fill colour first.

Link to this example of spreadsheet:


Thank you so much in advance! Do let me know if you need any more detail or information, or if I wasn't clear enough.

3623CLEAR14/11/2023SEVERETRUE
3623CLEAR11/07/2023SEVERETRUE
3623CLEAR21/03/2023SEVERETRUE
3623CLEAR06/12/2022SEVERETRUE
3624CLEAR06/12/2022SEVEREFALSE
3625CLEAR14/11/2023SEVERETRUE
3625CLEAR11/07/2023SEVERETRUE
3625CLEAR21/03/2023SEVERETRUE
3625CLEAR06/12/2022SEVERETRUE
3626CLEAR14/11/2023SEVEREFALSE
3626CLEAR11/07/2023SEVEREFALSE
3626CLEAR21/03/2023SEVEREFALSE
3626CLEAR06/12/2022SEVEREFALSE
 
Never mind my previous question! Adding that formula on a new colum will still read the values from the array table. Sorry about the stupid question!! If only you could tell me now how did you get/generate those src and uidmd so that I could maintain the formula in the future with more data and different dates. You're a Legend!!
One more question actually, any chance that the cells with nothing in it could still be blank on the array formula instead of having a zero?
 
Upvote 0

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
I think the 0's would need to be dealt with separately, either with custom formatting or by post-processing the results array. Have a look at post #10 in the link below to see if Custom Formatting is an option. Alternatively, the entire spilled results array could be post-processed to replace 0's with "". I've also made an attempt to add a column to the results that could be used by conditional formatting. There is probably a better way, but I did it in two steps using the "cf" array to determine if each user ID matches the one above it, and then to create an array of numbers "cffin" indicating the group number. Then Conditional Formatting can be used with a formula in CF to alternate the color bands.
MrExcel_20240203.xlsx
RSTUVWXYZAAABACADAE
33623KK36233623CLEAR14/11/2023SEVERE1
43623YY36233623CLEAR6/10/2023SEVERE1
53623HH36233623CLEAR11/7/2023SEVERE1
63623U36233623CLEAR21/3/2023SEVERE1
73623CCC36233623CLEAR6/12/2022SEVERE1
83625WWW36253625CLEAR14/11/2023SEVERE2
93625OOO36253625CLEAR11/7/2023SEVERE2
103625EE36253625CLEAR21/4/2023SEVERE2
113625GG36253625CLEAR21/3/2023SEVERE2
123625GG36253625CLEAR6/12/2022SEVERE2
133626PP36263626CLEAR14/11/2023SEVERE3
143626BBB36263626CLEAR11/7/2023SEVERE3
153626JJJ36263626CLEAR21/3/2023SEVERE3
163626D36263626CLEAR6/12/2022SEVERE3
17
183624EE36243624CLEAR6/12/2022SEVERE1
193628II36283628CLEAR8/8/2023SEVERE2
20
Sheet5
Cell Formulas
RangeFormula
R3:AE19R3=LET( src, A3:M20, fsrc, FILTER(src,INDEX(src,,1)<>""), shortID, INDEX(fsrc,,1), dates, INDEX(fsrc,,12), maxdate, MAX(dates), uidmd, SORT(UNIQUE(FILTER(shortID,dates=maxdate))), uidnmd, SORT(UNIQUE(VSTACK(shortID,uidmd),,1)), upper_md, FILTER(fsrc,ISNUMBER(MATCH(shortID,uidmd,0))), lower_notmd, UNIQUE(VSTACK(fsrc,upper_md),,1), res, VSTACK( SORT(upper_md,{1,12},{1,-1}), EXPAND("",,COLUMNS(fsrc),""), VSTACK(SORT(lower_notmd,{1,12},{1,-1})) ), resid, TAKE(res,,1), cf, MAKEARRAY(ROWS(res),1,LAMBDA(r,c,IF(r=1,1,IF(INDEX(resid,r)="","",IF(INDEX(resid,r)=INDEX(resid,r-1),0,1) )))), cffin, SCAN(0,cf,LAMBDA(a,v,IF(v="","",N(a)+v ))), finfin, HSTACK(res,cffin), IF(finfin=0,"",finfin) )
Dynamic array formulas.
Cells with Conditional Formatting
CellConditionCell FormatStop If True
R1:AE24Expression=MOD($AE1,2)=1textNO

 
Upvote 0
Hello, this formula is indeed working wonders and making my life much easier! KRice, in the chance you read this new reply, is there any way that I could automatically copy all the values from B2 in your dynamic array table until the blank row that separates IDs that don't have the latest date in the formula, to for example another sheet "Lumps" starting at A2?

And the same for all the cell values still in Colum B but starting just after that blank row until the end of the data?

Maybe it's not doable, but just thought in giving it a try anyway :)
 
Upvote 0
I don't quite understand what you want to do. The formula only has a few connections to the data set that it operates on. Anywhere you see a range reference (e.g., A3:M20) or column numbers (I see some 1's and 12's) are the places that would need to be examined carefully to ensure they work on another data set. It would be helpful if you posted a small example showing the location and structure of data,

Are you saying that if we consider the existing sample data in A3:M20, you would want all of the ID's that are not associated with the "maxdate" to be listed in a "Lumps" worksheet beginning at A2? If so, that would be the "uidnmd" (User ID Not Max Date) array included in the formula. Here it is delivering a result:
MrExcel_20240203.xlsx
ABCDEFGHIJKLMNOPQR
1
2
33623KK36233623CLEAR14/11/2023SEVERETRUE3624
43623HH36233623CLEAR11/7/2023SEVERETRUE3628
53623U36233623CLEAR21/3/2023SEVERETRUE
63623CCC36233623CLEAR6/12/2022SEVERETRUE
73624EE36243624CLEAR6/12/2022SEVEREFALSE
83625WWW36253625CLEAR14/11/2023SEVERETRUE
93625OOO36253625CLEAR11/7/2023SEVERETRUE
103625GG36253625CLEAR21/3/2023SEVERETRUE
113625GG36253625CLEAR6/12/2022SEVERETRUE
123626PP36263626CLEAR14/11/2023SEVEREFALSE
133626BBB36263626CLEAR11/7/2023SEVEREFALSE
143626JJJ36263626CLEAR21/3/2023SEVEREFALSE
153626D36263626CLEAR6/12/2022SEVEREFALSE
16
173623YY36233623CLEAR6/10/2023SEVERETRUE
183628II36283628CLEAR8/8/2023SEVEREFALSE
19
203625EE36253625CLEAR21/4/2023SEVERETRUE
Sheet5
Cell Formulas
RangeFormula
R3:R4R3=LET( src, A3:M20, fsrc, FILTER(src,INDEX(src,,1)<>""), shortID, INDEX(fsrc,,1), dates, INDEX(fsrc,,12), maxdate, MAX(dates), uidmd, SORT(UNIQUE(FILTER(shortID,dates=maxdate))), uidnmd, SORT(UNIQUE(VSTACK(shortID,uidmd),,1)), upper_md, FILTER(fsrc,ISNUMBER(MATCH(shortID,uidmd,0))), lower_notmd, UNIQUE(VSTACK(fsrc,upper_md),,1), res, VSTACK( SORT(upper_md,{1,12},{1,-1}), EXPAND("",,COLUMNS(fsrc),""), VSTACK(SORT(lower_notmd,{1,12},{1,-1})) ), resid, TAKE(res,,1), cf, MAKEARRAY(ROWS(res),1,LAMBDA(r,c,IF(r=1,1,IF(INDEX(resid,r)="","",IF(INDEX(resid,r)=INDEX(resid,r-1),0,1) )))), cffin, SCAN(0,cf,LAMBDA(a,v,IF(v="","",N(a)+v ))), finfin, HSTACK(res,cffin), resres, IF(finfin=0,"",finfin), uidnmd )
A3:A15,A20,A17:A18A3=RIGHT(B3,4)
Dynamic array formulas.
Cells with Conditional Formatting
CellConditionCell FormatStop If True
R1:AE24Expression=MOD($AE1,2)=1textNO
 
Upvote 0
Sorry if I didn't explain myself correctly. The original formula itself is perfect and don't want to change anything there. Alongside your formula I have in colum Q the formula: =COUNTA(I2)-COUNTIF(I2,"="&H2) to give me some data information

Then on a different sheet, I compile all the B values that have the maxdate by copying and pasting manually (So I have to scroll down and find where the cut off is in the data between the values that have the maxdate and the oens that don't, copy paste all the values, including all the duplicates from B2 until the last value that matches the maxdate). Manually I also copy paste all the values from Colum Q matching all the values in Colum B down to the same row.

Having this two colums then I'm able to use the consolidate data button to elimitate the duplicates but still sum all the numbers from the same IDs

For example:
(Value copied from Colum B in sheet Main to sheet Lumps A2,A3 etc) (Value copied from Colum Q in sheet Main to sheet Lumps B2,B3, etc)
FF266 4
FF266 2
TT727 2
RR333 5
RR333 0
RR333 2

For then consolidating the data resulting in:

FF266 6
TT727 2
RR333 7

Just wondering if there was anyway to do this in a more automatic way rather than copying pasting from another sheet and then going to Data and doing consolidating data. But not touching your dinamic array formula, this all would have to be in a different sheet. Apologies if I still didn't make it clear enough
 
Upvote 0
Your initial follow-up question:
is there any way that I could automatically copy all the values from B2 in your dynamic array table until the blank row that separates IDs that don't have the latest date
Yes, this sounds as if you want either the "uidmd" or "upper_md" arrays. uidmd is the array of unique User IDs associated with the maxdate...no duplicates, just the IDs. upper_md is the entire upper part of the results table, so it includes all rows from the original source data that belong to any of the User IDs that have a maxdate. Which of these do you want? Or is the answer neither?

You've continued with this:
Then on a different sheet, I compile all the B values that have the maxdate by copying and pasting manually
Are you saying that you want 1) all records from the original source data (columns A:M in my mini sheet) whose User ID is associated with a maxdate, or 2) the records delivered in the upper part of the table by the earlier formula...all of those are also associated with maxdate, but they are grouped by User ID and sorted? This matters, I suspect, because of the column Q formula you've mentioned. It appears to be operating on data found in the original source data range, so results from the formula would not correspond row-by-row with the results from the earlier formula. But I believe the contents of columns H and I would be delivered in the results from the earlier formula, so the formula described in column Q formula could be included elsewhere on the Lumps worksheet.

Regarding the column Q formula, I don't understand what the COUNTA/COUNTIF formula is supposed to do:
Excel Formula:
=COUNTA(I2)-COUNTIF(I2,"="&H2)
Because COUNTA refers to a single cell, it will return either a 0 or 1, 0 if nothing is in the referenced cell I2, and 1 otherwise. And COUNTIF would be more efficiently written as COUNTIF(I2,H2)...the equals is automatically understood and doesn't need to be explicitly written. But the COUNTIF expression refers to only two cells, and will produce a result of either 0 or 1 (if the two cells are equal). So there are only four possible combinations for (H,I)...{(1,1),(0,1),(1,0),(0,0)}, and only 3 possible unique results from the subtraction {0,1,-1,0}...or 0,1,-1. Is that what you want?
 
Upvote 0
Your initial follow-up question:

Yes, this sounds as if you want either the "uidmd" or "upper_md" arrays. uidmd is the array of unique User IDs associated with the maxdate...no duplicates, just the IDs. upper_md is the entire upper part of the results table, so it includes all rows from the original source data that belong to any of the User IDs that have a maxdate. Which of these do you want? Or is the answer neither?

You've continued with this:

Are you saying that you want 1) all records from the original source data (columns A:M in my mini sheet) whose User ID is associated with a maxdate, or 2) the records delivered in the upper part of the table by the earlier formula...all of those are also associated with maxdate, but they are grouped by User ID and sorted? This matters, I suspect, because of the column Q formula you've mentioned. It appears to be operating on data found in the original source data range, so results from the formula would not correspond row-by-row with the results from the earlier formula. But I believe the contents of columns H and I would be delivered in the results from the earlier formula, so the formula described in column Q formula could be included elsewhere on the Lumps worksheet.

Regarding the column Q formula, I don't understand what the COUNTA/COUNTIF formula is supposed to do:
Excel Formula:
=COUNTA(I2)-COUNTIF(I2,"="&H2)
Because COUNTA refers to a single cell, it will return either a 0 or 1, 0 if nothing is in the referenced cell I2, and 1 otherwise. And COUNTIF would be more efficiently written as COUNTIF(I2,H2)...the equals is automatically understood and doesn't need to be explicitly written. But the COUNTIF expression refers to only two cells, and will produce a result of either 0 or 1 (if the two cells are equal). So there are only four possible combinations for (H,I)...{(1,1),(0,1),(1,0),(0,0)}, and only 3 possible unique results from the subtraction {0,1,-1,0}...or 0,1,-1. Is that what you want?
Thanks again for your reply. It would be the upper_md then! As it's what I do then manualy to copy it then to the sheet Lumps. I've also did (what I should have done straightaway) a dummy excel for you to have a look at. So on Main, it's your previously formula (in the dummy one I just copied pasted entries from the area where upper_md ends and the IDs without the max date start.


Now seprated to your dinamic array formula but in the same sheet I have:

On Colum P I have =IF(B2=B1,P1,NOT(P1)), it's not relevant to this questions just want to make a reliable copy dummy for you. It flags me when the ID changes and with conditional formatting I change the colour. Just to have a visual flag between different IDs as it can get confusing and hard to see due to the amount of data.
On Colum Q I have the =COUNTA(I2)-COUNTIF(I2,"="&H2) because I want to count as 1 if I2 is present but not the dame as H2 (so 6 6 would be zero on Q, 6 7 or any diferent it counts as 1)
On Colum R I have the =SUM(I2-H2), simply gives me the difference between I2 and H2

Now on Lumps sheet, I've done it manually for you to see. Copied from B2 to B49, which are all the upper_md you've mentioned to A2 on sheet Lumps. Then manually I've copied Colum Q2 to Q49 as well to B2. Finally I use the consolidated Data option (SUM) with the B2 to C49 Reference to give me that result, which basically deletes all the duplicates but sums the values that were from the duplicates to one single ID. This is the area that I would like to find a way to automate, as it's thousands of data and the reference is not always the same, as I use your dinamic array formula to severel different books with different amount of data.

The Main colum R formula is there just so that you know there is a second separate step, where I do exactly the same on Lumps2 sheet, same upper_md values but now with colum R values so that I consolidate the data as well, ending up with one unique ID but with all the entries from that ID number sum to a total, I've done it manually as well for you to see.

The OUTLumps and OUTLumps2 are the same think, but instead of upper_md it's with the opossite values, that ones below that blank row that don't match the maxdate

Feel free to optimise any of my formulas, I wrote them myself so I don't doubt they could be better ahah but they do the job
 
Upvote 0
Okay...so you could repeat the entire formula, and edit it to exclude parts that are not needed (such as the lower section with non-maxdate IDs and codes used for conditional formatting), or you could simply find the blank row in the results table and then "TAKE" all rows above it. I would probably opt for this second option as it doesn't require all of the earlier processing steps.

As a side note, I wondered if a situation ever arose where all User IDs were associated with the maxdate, so the lower part of the results table would be blank. This induces an error and no results will be displayed. So I've revised the earlier formula to account for this scenario. The formula is the same except for the definitions of:
  1. uidnmd (User ID Not MaxDate), where if no user IDs can be found that do not have a maxdate associated with them, the list is a single "" (empty), and for...
  2. lower_notmd (the lower part of the results table for those User ID's that do Not have a MaxDate). The previous formula has been replaced with one that will create an array of blanks (a blank row) that will later be appended to the upper part of the table. So the end result would be an upper table, a blank row to separate upper from lower, and then one more blank row representing the blank lower part of the table.
Here is the revised formula:
Excel Formula:
=LET( src, A3:M20,     fsrc, FILTER(src,INDEX(src,,1)<>""),     shortID, INDEX(fsrc,,1),     dates, INDEX(fsrc,,12),     maxdate, MAX(dates),
   uidmd, SORT(UNIQUE(FILTER(shortID,dates=maxdate))),     uidnmd, IFERROR(SORT(UNIQUE(VSTACK(shortID,uidmd),,1)),""),
   upper_md,  FILTER(fsrc,ISNUMBER(MATCH(shortID,uidmd,0))),    lower_notmd, IFERROR(UNIQUE(VSTACK(fsrc,upper_md),,1),EXPAND("",,COLUMNS(fsrc),"")),
   res,     VSTACK(  SORT(upper_md,{1,12},{1,-1}),   EXPAND("",,COLUMNS(fsrc),""),   VSTACK(SORT(lower_notmd,{1,12},{1,-1}))  ),
   resid, TAKE(res,,1),
   cf,     MAKEARRAY(ROWS(res),1,LAMBDA(r,c,IF(r=1,1,IF(INDEX(resid,r)="","",IF(INDEX(resid,r)=INDEX(resid,r-1),0,1) )))),
   cffin,  SCAN(0,cf,LAMBDA(a,v,IF(v="","",N(a)+v   ))),
   finfin, HSTACK(res,cffin),   resres, IF(finfin=0,"",finfin), resres )
You'll need to revise the A3:M20 range to cover the correct range on your worksheet (I believe you've begun on row 2, not 3 as in my example). Then the results in my example are spilled from cell R3 on the worksheet named Sheet5, so to extract all rows from the results above the first blank cell in the first column, this formula can be used:
Excel Formula:
=TAKE(Sheet5!R3#,MATCH(TRUE,TAKE(Sheet5!R3#,,1)="",0)-1)
Again, revise the two references to the correct worksheet name and cell address from which the results spill.
 
Upvote 0
Solution

Forum statistics

Threads
1,224,818
Messages
6,181,151
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top