Hi,
Please see attached mini sheet. I am trying to transform a table via Power Query. The input data can be seen in the columns A to E and the desired output from Power Query are the same columns, but also including column F.
The table rows in the mini sheet are sorted (partially manually) for to better illustrate how rows are related. However, in real data sets, the rows are not sorted. I have formulas in column K and L which create the output I want to see in column F. These formulas work fine with small data samples, but the workbook becomes too slow with larger data sets.
Some explanation of the principle behind the data generated in the table:
The table contains database entries. I have only included those columns which are relevant for solving the problem at hand. When a new row is entered into the database, it will get the value "false" in column C. The value "false" simply means that the entry hasn’t (yet) been revised; that all columns (many of which haven’t been included in this sample) retain its original entry. This will also be evident in that the date/timestamp in column E, named Changed, is equal to the date/timestamp in column D, named Created.
However, sometimes a database entry will be revised. What happens then is that the originally entered row will be kept as it is, except that the value in column C will be changed to "true", signifying that the row has been changed. Also, the date/timestamp in column E will be changed to the date/timestamp that the change happened.
Changing a row like this will automatically create a new row in the database, which contains the altered information. This new row will have the value "false" in column C, like the originally entered row once had, and the date/timestamp in column D will equal the date/timestamp in column E. Notably, the date/timestamp in column D named Created for this row will be equal to the date/timestamp in column E named Changed for the row that was altered, since the new row was created in the same instant that the original row was altered. This date/timestamp relationship makes it possible to identify that these two rows in the table are indeed related and should be grouped together.
If there are several revisions for a specific database entry, it is the latest row for this entry, containing the value "false" in column C and the most recent date/timestamp in columns D and E, that will be revised, leaving behind yet another row with the value "true" in column C, as explained earlier. Thus, there may be several rows with the value "true" in column C that are chronologically linked to the final entry, for which the value in column C will always be "false".
I am trying to use Power Query to group those items from column A and column B which have an unbroken relationship through date/timestamps as explained above, find the earliest related date/timestamp for these, and copy that date/timestamp to each row for the group in a new column.
Here is the formula that, for each row within a specific group, identifies the row containing the earliest date/timestamp for that group:
Here is the formula for retrieving the relevant date/timestamp based on the rows identified with the previous formula:
I hope someone will be able to help me out with this!
Best regards,
Lexer
Please see attached mini sheet. I am trying to transform a table via Power Query. The input data can be seen in the columns A to E and the desired output from Power Query are the same columns, but also including column F.
The table rows in the mini sheet are sorted (partially manually) for to better illustrate how rows are related. However, in real data sets, the rows are not sorted. I have formulas in column K and L which create the output I want to see in column F. These formulas work fine with small data samples, but the workbook becomes too slow with larger data sets.
Some explanation of the principle behind the data generated in the table:
The table contains database entries. I have only included those columns which are relevant for solving the problem at hand. When a new row is entered into the database, it will get the value "false" in column C. The value "false" simply means that the entry hasn’t (yet) been revised; that all columns (many of which haven’t been included in this sample) retain its original entry. This will also be evident in that the date/timestamp in column E, named Changed, is equal to the date/timestamp in column D, named Created.
However, sometimes a database entry will be revised. What happens then is that the originally entered row will be kept as it is, except that the value in column C will be changed to "true", signifying that the row has been changed. Also, the date/timestamp in column E will be changed to the date/timestamp that the change happened.
Changing a row like this will automatically create a new row in the database, which contains the altered information. This new row will have the value "false" in column C, like the originally entered row once had, and the date/timestamp in column D will equal the date/timestamp in column E. Notably, the date/timestamp in column D named Created for this row will be equal to the date/timestamp in column E named Changed for the row that was altered, since the new row was created in the same instant that the original row was altered. This date/timestamp relationship makes it possible to identify that these two rows in the table are indeed related and should be grouped together.
If there are several revisions for a specific database entry, it is the latest row for this entry, containing the value "false" in column C and the most recent date/timestamp in columns D and E, that will be revised, leaving behind yet another row with the value "true" in column C, as explained earlier. Thus, there may be several rows with the value "true" in column C that are chronologically linked to the final entry, for which the value in column C will always be "false".
I am trying to use Power Query to group those items from column A and column B which have an unbroken relationship through date/timestamps as explained above, find the earliest related date/timestamp for these, and copy that date/timestamp to each row for the group in a new column.
Here is the formula that, for each row within a specific group, identifies the row containing the earliest date/timestamp for that group:
Excel Formula:
=IF(SUMPRODUCT(($A$2:$A$42=$A2)*($B$2:$B$42=$B2)*($C$2:$C$42="True")*(E$2:E$42=$D2)*(ROW($C$2:$C$42)<>ROW()))<>0,INDEX($K$1:$K$42,AGGREGATE(14,6,ROW($C$2:$C$42)/(($A$2:$A$42=$A2)*($B$2:$B$42=$B2)*(C$2:$C$42="True")*($E$2:$E$42=$D2)),1)),ROW()-ROW($K$1))
Here is the formula for retrieving the relevant date/timestamp based on the rows identified with the previous formula:
Excel Formula:
=INDEX($D$2:$D$42,$K2,)
Power Query timestamp lookup.xlsx | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | B | C | D | E | F | G | H | I | J | K | L | M | |||
1 | Batch | Code | Row changed (true = line has one or more successors and may have one or more predeccessors. false = line has no successors, but may have one ore more predeccessors). | Created (timestamp will be equal to immediate predecessor's (if any) "Changed" timestamp) | Changed (timestamp will be equal to immediate successor's (if any) "Created" timestamp) | Desired result from Power Query operations | Table rows | Table rows which contain correct results in "Created" column (reflected in column "Desired result from Power Query operations" | Hierarchy for colored example rows (1 = oldest/ original) | FORMULA for traced back rows | FORMULA for traced back dates | Test | |||
2 | 123456 | AA | false | 12.11.2018 09:38:20 | 12.11.2018 09:38:20 | 12.11.2018 09:38:20 | 1 | 1 | 1 | 12.11.2018 09:38:20 | TRUE | ||||
3 | 123456 | AA | false | 12.11.2018 09:38:21 | 12.11.2018 09:38:21 | 12.11.2018 09:38:21 | 2 | 2 | 2 | 12.11.2018 09:38:21 | TRUE | ||||
4 | 123456 | BB | false | 12.11.2018 09:38:59 | 12.11.2018 09:38:59 | 12.11.2018 09:38:59 | 3 | 3 | 3 | 12.11.2018 09:38:59 | TRUE | ||||
5 | 123456 | BB | false | 12.11.2018 09:38:59 | 12.11.2018 09:38:59 | 12.11.2018 09:38:59 | 5 | 5 | 4 | 12.11.2018 09:38:59 | TRUE | ||||
6 | 123457 | AA | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 6 | 6 | 5 | 12.11.2018 09:41:25 | TRUE | ||||
7 | 123457 | AA | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 37 | 37 | 6 | 12.11.2018 09:41:25 | TRUE | ||||
8 | 123457 | BB | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 7 | 7 | 7 | 12.11.2018 09:41:25 | TRUE | ||||
9 | 123457 | BB | true | 12.11.2018 09:41:25 | 12.11.2018 09:47:27 | 12.11.2018 09:41:25 | 8 | 8 | I_1 | 8 | 12.11.2018 09:41:25 | TRUE | |||
10 | 123457 | BB | false | 12.11.2018 09:47:27 | 12.11.2018 09:47:27 | 12.11.2018 09:41:25 | 9 | 8 | I_2 | 8 | 12.11.2018 09:41:25 | TRUE | |||
11 | 123458 | CC | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 10 | 10 | 10 | 12.11.2018 09:41:25 | TRUE | ||||
12 | 123458 | CC | false | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 16 | 16 | 11 | 12.11.2018 10:48:44 | TRUE | ||||
13 | 123458 | DD | false | 12.11.2018 10:44:54 | 12.11.2018 10:44:54 | 12.11.2018 10:44:54 | 12 | 12 | 12 | 12.11.2018 10:44:54 | TRUE | ||||
14 | 123458 | DD | false | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 18 | 18 | 13 | 12.11.2018 10:48:44 | TRUE | ||||
15 | 123459 | CC | false | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 39 | 39 | 14 | 12.11.2018 10:48:44 | TRUE | ||||
16 | 123459 | CC | false | 12.11.2018 10:50:29 | 12.11.2018 10:50:29 | 12.11.2018 10:50:29 | 21 | 21 | 15 | 12.11.2018 10:50:29 | TRUE | ||||
17 | 123459 | DD | true | 12.11.2018 10:46:00 | 12.11.2018 10:47:00 | 12.11.2018 10:46:00 | 15 | 15 | III_1 | 16 | 12.11.2018 10:46:00 | TRUE | |||
18 | 123459 | DD | true | 12.11.2018 10:47:00 | 12.11.2018 10:47:01 | 12.11.2018 10:46:00 | 29 | 15 | III_2 | 16 | 12.11.2018 10:46:00 | TRUE | |||
19 | 123459 | DD | true | 12.11.2018 10:47:01 | 12.11.2018 10:51:36 | 12.11.2018 10:46:00 | 19 | 15 | III_3 | 16 | 12.11.2018 10:46:00 | TRUE | |||
20 | 123459 | DD | true | 12.11.2018 10:51:36 | 12.11.2018 10:58:41 | 12.11.2018 10:46:00 | 22 | 15 | III_4 | 16 | 12.11.2018 10:46:00 | TRUE | |||
21 | 123459 | DD | false | 12.11.2018 10:58:41 | 12.11.2018 10:58:41 | 12.11.2018 10:46:00 | 26 | 15 | III_5 | 16 | 12.11.2018 10:46:00 | TRUE | |||
22 | 123459 | DD | false | 12.11.2018 10:52:22 | 12.11.2018 10:52:22 | 12.11.2018 10:52:22 | 25 | 25 | 21 | 12.11.2018 10:52:22 | TRUE | ||||
23 | 123459 | DD | true | 12.11.2018 11:00:36 | 12.11.2018 11:01:41 | 12.11.2018 11:00:36 | 23 | 23 | IV_1 | 22 | 12.11.2018 11:00:36 | TRUE | |||
24 | 123459 | DD | false | 12.11.2018 11:01:41 | 12.11.2018 11:01:41 | 12.11.2018 11:00:36 | 24 | 23 | IV_2 | 22 | 12.11.2018 11:00:36 | TRUE | |||
25 | 123459 | DD | false | 12.11.2018 11:58:43 | 12.11.2018 11:58:43 | 12.11.2018 11:58:43 | 14 | 14 | 24 | 12.11.2018 11:58:43 | TRUE | ||||
26 | 123460 | EE | true | 12.11.2018 10:36:22 | 12.11.2018 10:36:56 | 12.11.2018 10:36:22 | 34 | 34 | 25 | 12.11.2018 10:36:22 | TRUE | ||||
27 | 123460 | EE | false | 12.11.2018 10:50:52 | 12.11.2018 10:50:52 | 12.11.2018 10:50:52 | 28 | 28 | 26 | 12.11.2018 10:50:52 | TRUE | ||||
28 | 123460 | EE | false | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 35 | 35 | 27 | 12.11.2018 10:51:36 | TRUE | ||||
29 | 123460 | FF | false | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 31 | 31 | 28 | 12.11.2018 10:51:36 | TRUE | ||||
30 | 123460 | FF | false | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 32 | 32 | 29 | 12.11.2018 10:51:36 | TRUE | ||||
31 | 123461 | EE | true | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 20 | 20 | II_1 | 30 | 12.11.2018 10:51:36 | TRUE | |||
32 | 123461 | EE | false | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 13 | 20 | II_2 | 30 | 12.11.2018 10:51:36 | TRUE | |||
33 | 123461 | GG | false | 12.11.2018 10:51:47 | 12.11.2018 10:51:47 | 12.11.2018 10:51:47 | 36 | 36 | 32 | 12.11.2018 10:51:47 | TRUE | ||||
34 | 123461 | HH | false | 13.11.2018 07:26:18 | 13.11.2018 07:26:18 | 13.11.2018 07:26:18 | 38 | 38 | 33 | 13.11.2018 07:26:18 | TRUE | ||||
35 | 123461 | II | false | 12.11.2018 09:15:13 | 12.11.2018 09:15:13 | 12.11.2018 09:15:13 | 40 | 40 | 34 | 12.11.2018 09:15:13 | TRUE | ||||
36 | 123461 | JJ | false | 12.11.2018 09:15:13 | 12.11.2018 09:15:13 | 12.11.2018 09:15:13 | 11 | 11 | 35 | 12.11.2018 09:15:13 | TRUE | ||||
37 | 123462 | AA | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 30 | 30 | 36 | 12.11.2018 09:41:25 | TRUE | ||||
38 | 123463 | GG | false | 12.11.2018 10:44:54 | 12.11.2018 10:44:54 | 12.11.2018 10:44:54 | 17 | 17 | 37 | 12.11.2018 10:44:54 | TRUE | ||||
39 | 123466 | PP | false | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 12.11.2018 09:41:25 | 4 | 4 | 38 | 12.11.2018 09:41:25 | TRUE | ||||
40 | 123467 | KK | false | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 12.11.2018 10:48:44 | 27 | 27 | 39 | 12.11.2018 10:48:44 | TRUE | ||||
41 | 123475 | EE | false | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 12.11.2018 10:51:36 | 41 | 41 | 40 | 12.11.2018 10:51:36 | TRUE | ||||
42 | 123475 | MM | false | 12.11.2018 10:51:47 | 12.11.2018 10:51:47 | 12.11.2018 10:51:47 | 33 | 33 | 41 | 12.11.2018 10:51:47 | TRUE | ||||
Example2 |
Cell Formulas | ||
---|---|---|
Range | Formula | |
K2:K42 | K2 | =IF(SUMPRODUCT(($A$2:$A$42=$A2)*($B$2:$B$42=$B2)*($C$2:$C$42="True")*(E$2:E$42=$D2)*(ROW($C$2:$C$42)<>ROW()))<>0,INDEX($K$1:$K$42,AGGREGATE(14,6,ROW($C$2:$C$42)/(($A$2:$A$42=$A2)*($B$2:$B$42=$B2)*(C$2:$C$42="True")*($E$2:$E$42=$D2)),1)),ROW()-ROW($K$1)) |
L2:L42 | L2 | =INDEX($D$2:$D$42,$K2,) |
M2:M42 | M2 | =L2=F2 |
I hope someone will be able to help me out with this!
Best regards,
Lexer