I'm stumped with a requirement and can't seem to get past it (I've tried to scour the internet, and I'm unable to fix it myself, given my lesser experience with Power Query). I could accomplish this using VBA, but the number of records, that need to be processed are shy of half a million records!
So here's the simplified task at hand. There are two columns - "Substring" and "Main String", as shown below [Input]:
[TABLE="width: 235"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[/TR]
</tbody>[/TABLE]
The Substring column has mini substrings separated by a delimiter (|). These Mini Substrings may or may not be found within the Main string. The output at the end should look like this [Expected Output]:
[TABLE="width: 764"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[TD]MSS1[/TD]
[TD]ML1[/TD]
[TD]MSS2[/TD]
[TD]ML2[/TD]
[TD]MSS3[/TD]
[TD]ML3[/TD]
[TD]MSS4[/TD]
[TD]ML4[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[TD]CDE[/TD]
[TD]3[/TD]
[TD]AB[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[TD]ABC[/TD]
[TD]3[/TD]
[TD]DE[/TD]
[TD]2[/TD]
[TD]GH[/TD]
[TD]2[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[TD]BC[/TD]
[TD]2[/TD]
[TD]YZ[/TD]
[TD]2[/TD]
[TD]A[/TD]
[TD]1[/TD]
[TD]X[/TD]
[TD]0[/TD]
[/TR]
</tbody>[/TABLE]
(Note: MSS and ML in the column headers stand for Mini Substring and Match Length respectively)
The Mini Substrings can vary in number, so the row that has the maximum number of Mini Substrings would define the number of MSS and ML columns. In this case, the third row has four of them, so there were four MSS and ML columns each.
Not just that, the sequence of these Mini Substrings has to be such that those with highest matching lengths should be placed on the leftmost MSS and ML pair of columns. And then those with the least matching lengths should go on the rightmost pair of MSS and ML columns. And everything in between should progress in that order.
I got until the stage where the current output looks like this [Interim Output]:
[TABLE="width: 421"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[TD]MSS1[/TD]
[TD]MSS2[/TD]
[TD]MSS3[/TD]
[TD]MSS4[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[TD]AB[/TD]
[TD]CDE[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[TD]ABC[/TD]
[TD]DE[/TD]
[TD]GH[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[TD]A[/TD]
[TD]BC[/TD]
[TD]X[/TD]
[TD]YZ[/TD]
[/TR]
</tbody>[/TABLE]
Dynamically inserting the ML columns just after the respective MSS columns is the challenge I'm unable to get past by. And I'm sure sorting them horizontally by the rank of matching lengths (i.e. lengths of those mini substrings that found a match in the main string) would also be another challenge. Here's the Power Query that I could come up with that generated the [Interim Output].
Could you please guide me out of this? You can easily tell I am new to Power Query, so I very much appreciate any help that I can get. Thank you!
Note: I've also posted this on a different forum, but awaiting a response there.
So here's the simplified task at hand. There are two columns - "Substring" and "Main String", as shown below [Input]:
[TABLE="width: 235"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[/TR]
</tbody>[/TABLE]
The Substring column has mini substrings separated by a delimiter (|). These Mini Substrings may or may not be found within the Main string. The output at the end should look like this [Expected Output]:
[TABLE="width: 764"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[TD]MSS1[/TD]
[TD]ML1[/TD]
[TD]MSS2[/TD]
[TD]ML2[/TD]
[TD]MSS3[/TD]
[TD]ML3[/TD]
[TD]MSS4[/TD]
[TD]ML4[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[TD]CDE[/TD]
[TD]3[/TD]
[TD]AB[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[TD]ABC[/TD]
[TD]3[/TD]
[TD]DE[/TD]
[TD]2[/TD]
[TD]GH[/TD]
[TD]2[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[TD]BC[/TD]
[TD]2[/TD]
[TD]YZ[/TD]
[TD]2[/TD]
[TD]A[/TD]
[TD]1[/TD]
[TD]X[/TD]
[TD]0[/TD]
[/TR]
</tbody>[/TABLE]
(Note: MSS and ML in the column headers stand for Mini Substring and Match Length respectively)
The Mini Substrings can vary in number, so the row that has the maximum number of Mini Substrings would define the number of MSS and ML columns. In this case, the third row has four of them, so there were four MSS and ML columns each.
Not just that, the sequence of these Mini Substrings has to be such that those with highest matching lengths should be placed on the leftmost MSS and ML pair of columns. And then those with the least matching lengths should go on the rightmost pair of MSS and ML columns. And everything in between should progress in that order.
I got until the stage where the current output looks like this [Interim Output]:
[TABLE="width: 421"]
<tbody>[TR]
[TD]Substring[/TD]
[TD]Main String[/TD]
[TD]MSS1[/TD]
[TD]MSS2[/TD]
[TD]MSS3[/TD]
[TD]MSS4[/TD]
[/TR]
[TR]
[TD]AB|CDE[/TD]
[TD]ABCDEF[/TD]
[TD]AB[/TD]
[TD]CDE[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]ABC|DE|GH[/TD]
[TD]ABCDEFGHI[/TD]
[TD]ABC[/TD]
[TD]DE[/TD]
[TD]GH[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]A|BC|X|YZ[/TD]
[TD]ABCDYZ[/TD]
[TD]A[/TD]
[TD]BC[/TD]
[TD]X[/TD]
[TD]YZ[/TD]
[/TR]
</tbody>[/TABLE]
Dynamically inserting the ML columns just after the respective MSS columns is the challenge I'm unable to get past by. And I'm sure sorting them horizontally by the rank of matching lengths (i.e. lengths of those mini substrings that found a match in the main string) would also be another challenge. Here's the Power Query that I could come up with that generated the [Interim Output].
Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sub String", type text}, {"Main String", type text}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Sub String", "For Split"),
#"Added Custom" = Table.AddColumn(#"Duplicated Column", "Split Count", each List.Count(Text.Split([Sub String],"|"))),
#"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Split Count", "Sub String", "Main String", "For Split"}),
#"Max No Of Splittable Columns" = List.Max(#"Reordered Columns"[Split Count]),
#"List Of MSS Columns" = List.Transform({1..#"Max No Of Splittable Columns"}, each "MSS"&Text.From(_)),
#"Split Columns By Delimiter" = Table.SplitColumn(#"Reordered Columns","For Split",Splitter.SplitTextByDelimiter("|"), #"List Of MSS Columns"),
#"Removed Columns" = Table.RemoveColumns(#"Split Columns By Delimiter",{"Split Count"})
in
#"Removed Columns"
Could you please guide me out of this? You can easily tell I am new to Power Query, so I very much appreciate any help that I can get. Thank you!
Note: I've also posted this on a different forum, but awaiting a response there.