Help for Data Analyses

Newbie73

Board Regular
Joined
Feb 4, 2024
Messages
109
Office Version
  1. 365
Platform
  1. Windows
Hello,

I'm a bit of a newbie when it comes to Excel. In this last month I've tried to work on some data for a study that we're doing at my work.

The data is related to tests that certain individuals are required to do each x amount of times. First colum is the individual ID and Third colum the date of the tests.

14/11/2023 is the latest test, which the individual 3623, 3625 and 3626 have, meaning that all their data is still relevant.

3624 however doesn't have the last test, so I would like to flag it automatically with a formula and ideally in a different colum, so that then I could sort all the individuals without the lastest test to the bottom of the spreadsheet keep the "active" individuals and all their test results (including previous tests) on the top.

After days of failling to have an automatic formula to do this (even with helper colums), here I am. My Data has more colums but irrelevant to the example. It's over 10k rows with individuals having a varying number of tests, but my main thing is to sort the active ones from inactives (everything that doesn't have a test on 14/11/2023 is consider inactive). I've done this manual in another spreadsheet by having another colum and filling with a colour all the entries of all the individuals no longer active, and then sorting by no fill colour first.

Link to this example of spreadsheet:


Thank you so much in advance! Do let me know if you need any more detail or information, or if I wasn't clear enough.

3623CLEAR14/11/2023SEVERETRUE
3623CLEAR11/07/2023SEVERETRUE
3623CLEAR21/03/2023SEVERETRUE
3623CLEAR06/12/2022SEVERETRUE
3624CLEAR06/12/2022SEVEREFALSE
3625CLEAR14/11/2023SEVERETRUE
3625CLEAR11/07/2023SEVERETRUE
3625CLEAR21/03/2023SEVERETRUE
3625CLEAR06/12/2022SEVERETRUE
3626CLEAR14/11/2023SEVEREFALSE
3626CLEAR11/07/2023SEVEREFALSE
3626CLEAR21/03/2023SEVEREFALSE
3626CLEAR06/12/2022SEVEREFALSE
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.
Welcome to the MrExcel board! Have a look at this and see if it does what you want:
MrExcel_20240203.xlsx
CDEFGHIJKLMNO
33623CLEAR14/11/2023SEVERETRUE3623CLEAR14/11/2023SEVERE0TRUE
43623CLEAR11/7/2023SEVERETRUE3623CLEAR6/10/2023SEVERE0TRUE
53623CLEAR21/3/2023SEVERETRUE3623CLEAR11/7/2023SEVERE0TRUE
63623CLEAR6/12/2022SEVERETRUE3623CLEAR21/3/2023SEVERE0TRUE
73624CLEAR6/12/2022SEVEREFALSE3623CLEAR6/12/2022SEVERE0TRUE
83625CLEAR14/11/2023SEVERETRUE3625CLEAR14/11/2023SEVERE0TRUE
93625CLEAR11/7/2023SEVERETRUE3625CLEAR11/7/2023SEVERE0TRUE
103625CLEAR21/3/2023SEVERETRUE3625CLEAR21/4/2023SEVERE0TRUE
113625CLEAR6/12/2022SEVERETRUE3625CLEAR21/3/2023SEVERE0TRUE
123626CLEAR14/11/2023SEVEREFALSE3625CLEAR6/12/2022SEVERE0TRUE
133626CLEAR11/7/2023SEVEREFALSE3626CLEAR14/11/2023SEVERE0FALSE
143626CLEAR21/3/2023SEVEREFALSE3626CLEAR11/7/2023SEVERE0FALSE
153626CLEAR6/12/2022SEVEREFALSE3626CLEAR21/3/2023SEVERE0FALSE
163626CLEAR6/12/2022SEVERE0FALSE
173623CLEAR6/10/2023SEVERETRUE
183628CLEAR8/8/2023SEVEREFALSE3624CLEAR6/12/2022SEVERE0FALSE
193628CLEAR8/8/2023SEVERE0FALSE
203625CLEAR21/4/2023SEVERETRUE
Sheet4
Cell Formulas
RangeFormula
J3:O19J3=LET(src,C3:H20,fsrc,FILTER(src,INDEX(src,,1)<>""),maxdate,MAX(INDEX(fsrc,,3)),uidmd,SORT(UNIQUE(FILTER(INDEX(fsrc,,1),INDEX(fsrc,,3)=maxdate))),uidnmd,SORT(UNIQUE(VSTACK(INDEX(fsrc,,1),uidmd),,1)),utd,FILTER(fsrc,ISNUMBER(MATCH(INDEX(fsrc,,1),uidmd,0))),nutd,FILTER(fsrc,NOT(ISNUMBER(MATCH(INDEX(fsrc,,1),uidmd,0)))),VSTACK(SORT(utd,{1,3},{1,-1}),EXPAND("",,COLUMNS(fsrc),""),VSTACK(SORT(nutd,{1,3},{1,-1}))))
Dynamic array formulas.
 
Upvote 0
why 14/11/23 - will that change

how about
=(COUNTIFS($A$2:$A$14,A2,$C$2:$C$14,$H$2)=1)
where H2 is the date

Book2
ABCDEFGHI
1
23623CLEAR11/14/23SEVERETRUE11/14/23TRUE
33623CLEAR7/11/23SEVERETRUETRUE
43623CLEAR3/21/23SEVERETRUETRUE
53623CLEAR12/6/22SEVERETRUETRUE
63624CLEAR12/6/22SEVEREFALSEFALSE
73625CLEAR11/14/23SEVERETRUETRUE
83625CLEAR7/11/23SEVERETRUETRUE
93625CLEAR3/21/23SEVERETRUETRUE
103625CLEAR12/6/22SEVERETRUETRUE
113626CLEAR11/14/23SEVEREFALSETRUE
123626CLEAR7/11/23SEVEREFALSETRUE
133626CLEAR3/21/23SEVEREFALSETRUE
143626CLEAR12/6/22SEVEREFALSETRUE
15FALSE
Sheet1
Cell Formulas
RangeFormula
I2:I15I2=(COUNTIFS($A$2:$A$14,A2,$C$2:$C$14,$H$2)=1)
Cells with Conditional Formatting
CellConditionCell FormatStop If True
A2:E14Expression=$B2="R"textNO
A2:E14Expression=$B2="IR"textNO
A2:E14Expression=COUNTIF(#REF!, #REF!)>0textNO
A2:E14Expression=$F2textNO
 
Upvote 0
Thank you for the replies!

Krice, your answer blowed my mind, and despite working on my example it didn't work on the main spreadsheet. Also perhaps if it was possible that I could edit small portions of the data plus, as etaf mentioned, it's likelly that more tests will be done and more data will be added, making other individuals inactive, don;t know if that's possible with your formula? I've created a more complete example from the main spreadsheet.

Etaf, I'm using a similar conditional format to let me know when it's a new individual as you will see in the new example, simply to make it easy to know while scrolling through the data when it's still one individual or a different one. Thing is with your format I don't think I would be able to sort it and "throw" the inactive individuals to the bottom?

The ideal is to scroll through the date and see case by case, removing the inactive individuals would make the process less paniful (as it would remove thousands of entries from the middle of the data, straight to the bottom)

Link:


Thanks so far, really appreciate it!!
 
Upvote 0
I'm curious about why the formula did not work with your real-life worksheet. The only "hooks" into the actual data are here, in this part of the expression: src,C3:H20. Did you change the range reference to match your actual data?...and are your actual data in the same order as shown, with the 1st and 3rd columns of the range reference containing the user ID and Dates, respectively?
 
Upvote 0
I'm curious about why the formula did not work with your real-life worksheet. The only "hooks" into the actual data are here, in this part of the expression: src,C3:H20. Did you change the range reference to match your actual data?...and are your actual data in the same order as shown, with the 1st and 3rd columns of the range reference containing the user ID and Dates, respectively?
Thanks for the reply again KRice, I've played a bit more with your formula and figured out why it wasn't working. Due to the ID Have numbers, a few digits and only the 4 for digits in order of sequence to sort them (for example UDH823494431 and UDH823494432 I had on A2 the formula =RIGHT(B2,4). Changing your formula to ignore the first colum worked, but I would still need to order them by the last four digits of the ID and date newest to oldest to analyse and see patterns, which I don't seem to be able to do in the array :/ or am I missing something? Sorry about this, doing my best to learn! My second link has a better example of the real spreadsheet, include formulas and conditional formatting using on the real one
 
Upvote 0
Thanks for the reply again KRice, I've played a bit more with your formula and figured out why it wasn't working. Due to the ID Have numbers, a few digits and only the 4 for digits in order of sequence to sort them (for example UDH823494431 and UDH823494432 I had on A2 the formula =RIGHT(B2,4). Changing your formula to ignore the first colum worked, but I would still need to order them by the last four digits of the ID and date newest to oldest to analyse and see patterns, which I don't seem to be able to do in the array :/ or am I missing something? Sorry about this, doing my best to learn! My second link has a better example of the real spreadsheet, include formulas and conditional formatting using on the real one
UDH823494431 and NL8535604432

Correction in the example, meant to change the other numbers and maintaining an order in the last 4
 
Upvote 0
I don't see any issues adapting the formula to your worksheet. To make things clearer, I've introduced two more variables. The full data range is specified as "src". Then we filter "src" to eliminate any rows where blanks appear in the 1st column, and we call that consolidated array "fsrc" (for filtered source). Next, we declare which columns in "fsrc" hold the "shortID" and the "dates"...in your real-life version, those are columns 1 and 12. Next we determine what the maximum (most recent) date is, and call it "maxdate". Then we determine a unique list of user ID's with an entry whose date is the maxdate (that list is found in the array named "uidmd"...user id with max date). We do something similar and find the list of user IDs that do not have an entry with the max date, and that list is found in the array "uidnmd" (user ID Not max date). Then to construct the upper part of the results table, we filter the full filtered source data "fsrc" by including only those rows where the shortID (the 4-digit code) appears in the list of "uidmd" (meaning one of that person's entries has the max date)...and that array is called "utd". A similar list is created for "fsrc" rows whose shortIDs are not associated with a max date...and that array is called "nutd". Finally, we sort both of these arrays (utd and nutd) and sort by the shortID and the data, ascending order for shortID followed by descending data order, and these are stacked together, separated by a blank row.
MrExcel_20240203.xlsx
ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAF
1
2
33623KK36233623CLEAR14/11/2023SEVERETRUE3623KK362300000003623CLEAR14/11/2023SEVERE0TRUE
43623HH36233623CLEAR11/7/2023SEVERETRUE3623YY362300000003623CLEAR6/10/2023SEVERE0TRUE
53623U36233623CLEAR21/3/2023SEVERETRUE3623HH362300000003623CLEAR11/7/2023SEVERE0TRUE
63623CCC36233623CLEAR6/12/2022SEVERETRUE3623U362300000003623CLEAR21/3/2023SEVERE0TRUE
73624EE36243624CLEAR6/12/2022SEVEREFALSE3623CCC362300000003623CLEAR6/12/2022SEVERE0TRUE
83625WWW36253625CLEAR14/11/2023SEVERETRUE3625WWW362500000003625CLEAR14/11/2023SEVERE0TRUE
93625OOO36253625CLEAR11/7/2023SEVERETRUE3625OOO362500000003625CLEAR11/7/2023SEVERE0TRUE
103625GG36253625CLEAR21/3/2023SEVERETRUE3625EE362500000003625CLEAR21/4/2023SEVERE0TRUE
113625GG36253625CLEAR6/12/2022SEVERETRUE3625GG362500000003625CLEAR21/3/2023SEVERE0TRUE
123626PP36263626CLEAR14/11/2023SEVEREFALSE3625GG362500000003625CLEAR6/12/2022SEVERE0TRUE
133626BBB36263626CLEAR11/7/2023SEVEREFALSE3626PP362600000003626CLEAR14/11/2023SEVERE0FALSE
143626JJJ36263626CLEAR21/3/2023SEVEREFALSE3626BBB362600000003626CLEAR11/7/2023SEVERE0FALSE
153626D36263626CLEAR6/12/2022SEVEREFALSE3626JJJ362600000003626CLEAR21/3/2023SEVERE0FALSE
163626D362600000003626CLEAR6/12/2022SEVERE0FALSE
173623YY36233623CLEAR6/10/2023SEVERETRUE
183628II36283628CLEAR8/8/2023SEVEREFALSE3624EE362400000003624CLEAR6/12/2022SEVERE0FALSE
193628II362800000003628CLEAR8/8/2023SEVERE0FALSE
203625EE36253625CLEAR21/4/2023SEVERETRUE
Sheet4
Cell Formulas
RangeFormula
R3:AF19R3=LET(src,A3:O20, fsrc,FILTER(src,INDEX(src,,1)<>""), shortID,INDEX(fsrc,,1), dates,INDEX(fsrc,,12), maxdate,MAX(dates), uidmd,SORT(UNIQUE(FILTER(shortID,dates=maxdate))), uidnmd,SORT(UNIQUE(VSTACK(shortID,uidmd),,1)), utd,FILTER(fsrc,ISNUMBER(MATCH(shortID,uidmd,0))), nutd,FILTER(fsrc,NOT(ISNUMBER(MATCH(shortID,uidmd,0)))), VSTACK(SORT(utd,{1,12},{1,-1}),EXPAND("",,COLUMNS(fsrc),""),VSTACK(SORT(nutd,{1,12},{1,-1}))))
A3:A15,A20,A17:A18A3=RIGHT(B3,4)
Dynamic array formulas.
 
Upvote 0
I don't see any issues adapting the formula to your worksheet. To make things clearer, I've introduced two more variables. The full data range is specified as "src". Then we filter "src" to eliminate any rows where blanks appear in the 1st column, and we call that consolidated array "fsrc" (for filtered source). Next, we declare which columns in "fsrc" hold the "shortID" and the "dates"...in your real-life version, those are columns 1 and 12. Next we determine what the maximum (most recent) date is, and call it "maxdate". Then we determine a unique list of user ID's with an entry whose date is the maxdate (that list is found in the array named "uidmd"...user id with max date). We do something similar and find the list of user IDs that do not have an entry with the max date, and that list is found in the array "uidnmd" (user ID Not max date). Then to construct the upper part of the results table, we filter the full filtered source data "fsrc" by including only those rows where the shortID (the 4-digit code) appears in the list of "uidmd" (meaning one of that person's entries has the max date)...and that array is called "utd". A similar list is created for "fsrc" rows whose shortIDs are not associated with a max date...and that array is called "nutd". Finally, we sort both of these arrays (utd and nutd) and sort by the shortID and the data, ascending order for shortID followed by descending data order, and these are stacked together, separated by a blank row.
MrExcel_20240203.xlsx
ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAF
1
2
33623KK36233623CLEAR14/11/2023SEVERETRUE3623KK362300000003623CLEAR14/11/2023SEVERE0TRUE
43623HH36233623CLEAR11/7/2023SEVERETRUE3623YY362300000003623CLEAR6/10/2023SEVERE0TRUE
53623U36233623CLEAR21/3/2023SEVERETRUE3623HH362300000003623CLEAR11/7/2023SEVERE0TRUE
63623CCC36233623CLEAR6/12/2022SEVERETRUE3623U362300000003623CLEAR21/3/2023SEVERE0TRUE
73624EE36243624CLEAR6/12/2022SEVEREFALSE3623CCC362300000003623CLEAR6/12/2022SEVERE0TRUE
83625WWW36253625CLEAR14/11/2023SEVERETRUE3625WWW362500000003625CLEAR14/11/2023SEVERE0TRUE
93625OOO36253625CLEAR11/7/2023SEVERETRUE3625OOO362500000003625CLEAR11/7/2023SEVERE0TRUE
103625GG36253625CLEAR21/3/2023SEVERETRUE3625EE362500000003625CLEAR21/4/2023SEVERE0TRUE
113625GG36253625CLEAR6/12/2022SEVERETRUE3625GG362500000003625CLEAR21/3/2023SEVERE0TRUE
123626PP36263626CLEAR14/11/2023SEVEREFALSE3625GG362500000003625CLEAR6/12/2022SEVERE0TRUE
133626BBB36263626CLEAR11/7/2023SEVEREFALSE3626PP362600000003626CLEAR14/11/2023SEVERE0FALSE
143626JJJ36263626CLEAR21/3/2023SEVEREFALSE3626BBB362600000003626CLEAR11/7/2023SEVERE0FALSE
153626D36263626CLEAR6/12/2022SEVEREFALSE3626JJJ362600000003626CLEAR21/3/2023SEVERE0FALSE
163626D362600000003626CLEAR6/12/2022SEVERE0FALSE
173623YY36233623CLEAR6/10/2023SEVERETRUE
183628II36283628CLEAR8/8/2023SEVEREFALSE3624EE362400000003624CLEAR6/12/2022SEVERE0FALSE
193628II362800000003628CLEAR8/8/2023SEVERE0FALSE
203625EE36253625CLEAR21/4/2023SEVERETRUE
Sheet4
Cell Formulas
RangeFormula
R3:AF19R3=LET(src,A3:O20, fsrc,FILTER(src,INDEX(src,,1)<>""), shortID,INDEX(fsrc,,1), dates,INDEX(fsrc,,12), maxdate,MAX(dates), uidmd,SORT(UNIQUE(FILTER(shortID,dates=maxdate))), uidnmd,SORT(UNIQUE(VSTACK(shortID,uidmd),,1)), utd,FILTER(fsrc,ISNUMBER(MATCH(shortID,uidmd,0))), nutd,FILTER(fsrc,NOT(ISNUMBER(MATCH(shortID,uidmd,0)))), VSTACK(SORT(utd,{1,12},{1,-1}),EXPAND("",,COLUMNS(fsrc),""),VSTACK(SORT(nutd,{1,12},{1,-1}))))
A3:A15,A20,A17:A18A3=RIGHT(B3,4)
Dynamic array formulas.
Thanks for the explanation and that did the trick!! There's only one little detail that perhaps you help with. Due to being a big data base and having to analyse manually, it's handy to have a limitations separating between different individuals to know visusally when it's a different one. I was using =IF(B2=B1,O1,NOT(O1)) with FALSE on O1, this would give me a colum of false and true changing whenever was a different individual and I had it highlithed through conditional formatting like on my second link. Once I apply your formula down until O9630 that is lost, and I have many TRUE or FALSES straight after another even if it's already a different individual. Or maybe there is a easier way to implement that?

As in:

2233 TRUE
2233 TRUE
2233 TRUE
3579 TRUE
3579 TRUE
3579 TRUE
4389 TRUE

Instead of:

2233 TRUE
2233 TRUE
2233 TRUE
3579 FALSE
3579 FALSE
3579 FALSE
4389 TRUE
4389 TRUE

And if I may ask, only now I've started reading about uidmd and src and what's the easiest way to generate it? Just so that I can maintain that formula for future tests (as in MAXDATE changes)

Thanks a lot KRice, this is already a big help considering I've sorted the last 10k of data manually on this subject
 
Upvote 0
Never mind my previous question! Adding that formula on a new colum will still read the values from the array table. Sorry about the stupid question!! If only you could tell me now how did you get/generate those src and uidmd so that I could maintain the formula in the future with more data and different dates. You're a Legend!!
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top