Power Query Split To Rows For 2 Similar Columns - 2520

GeertD · Oct 16, 2022

Below is a quick transcription of the Mr.Excel M-code.
Remarks:

"Compiled" it into a single query,
Streamlined it a bit,
Buffered the Source table.

It is remarkably compact and quite efficient in its execution: a Result of a good 1 Million rows takes about 22.5 seconds.
(it runs markedly faster than Suat Ozgur's solution)

Power Query:

let
    Source = Table.Buffer(Excel.CurrentWorkbook(){[Name="fInput"]}[Content]),
    Products = Table.AddIndexColumn(Table.ExpandListColumn(Table.TransformColumns(Table.RemoveColumns(Source,{"Quantities"}), {{"Products", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Products"), "Index", 0, 1, Int64.Type),
    Quantities = Table.AddIndexColumn(Table.ExpandListColumn(Table.TransformColumns(Table.RemoveColumns(Source,{"Products"}), {{"Quantities", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Quantities"), "Index", 0, 1, Int64.Type),
    Result = Table.RemoveColumns(Table.ExpandTableColumn(Table.NestedJoin(Products, {"Index"}, Quantities, {"Index"}, "Quantities", JoinKind.LeftOuter), "Quantities", {"Quantities"}),{"Index"})
in
    Result

PS: The "FUNDES (Functional Design)" (I like to call it that) is in the video.

GeertD · Oct 16, 2022

The onset/context: Suat Ozgur's solution is exceedingly beautiful, but:

My first impression: very elaborate
Surely, this little challenge can be solved in an easier way – or else PQ has failed!
(I hadn't looked at Mr.Excel's solution at that point – see above)

Here's my first solution (of a total of 2). It was inspired by the "Excel Classic"-solution as explained in the Mr.Excel video.
It's efficient in its execution: a Result of a good 1 Million rows takes about 8.4 seconds.
FUNDES:

Source: get the fInput table,
ColumnNames: get them,
ConvertToLists: the payload columns look like lists, but they're actually just text strings – do what the name says,
PairwiseCombineColumns: pairwise combine the payload columns and expand to rows accordingly – actually not that difficult if you "transpose" the List-of-Lists using the ZIP-function,
(remember that: "transpose (of sorts) a List-of-lists using the ZIP function". Once you realize this and the power it implies, I'm sure you're gonna use that enigmatic ZIP function more often)
SplitColumnByDelimiter: split the payload columns in to separate columns.
// buffering in the above actually slows it down by about 1 second – so, don't.

Here's the M-code:

Power Query:

let
    Source =Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = List.Skip(Table.ColumnNames(Source),1),
    ConvertToLists = Table.TransformColumns(Source, {{ColumnNames{0}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}},{ColumnNames{1}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
    PairwiseCombineColumns = Table.TransformColumns(Table.ExpandListColumn(Table.RemoveColumns(Table.AddColumn(ConvertToLists, "Combined", each List.Zip({[Products],[Quantities]})),ColumnNames), "Combined"), {{"Combined", each Text.Combine(_,"-")}}),
    SplitColumnByDelimiter = Table.SplitColumn(PairwiseCombineColumns, "Combined", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), ColumnNames)
in
    SplitColumnByDelimiter

GeertD · Oct 16, 2022

Here's my second solution. It was inspired by the realization that –independently– both payload columns expand identically and therefore there is a direct relationship between those records; no need to lookup, merge, or whatever. Just do those independent expansions, take the columns you need and "slap" 'm together – done! (that's the FUNDES)
Remark: this is the only solution where I was able to make it completely independent of the actual names of the payload columns.
It's efficient in its execution: a Result of a good 1 Million rows takes about 8.0 seconds. This is my best solution.

Here's the M-code:

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = Table.ColumnNames(Source),
    ExpandColumn1 = Table.ExpandListColumn(Table.TransformColumns(Source, {{ColumnNames{1}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), ColumnNames{1}),
    ExpandColumn2 = Table.ExpandListColumn(Table.TransformColumns(Source, {{ColumnNames{2}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), ColumnNames{2}),
    Result = Table.FromColumns({Table.Column(ExpandColumn1,ColumnNames{0}),Table.Column(ExpandColumn1,ColumnNames{1}),Table.Column(ExpandColumn2,ColumnNames{2})},ColumnNames)
in
    Result

GeertD · Oct 16, 2022

Here's Henry Tam's solution as per his YT-comment and my slight completion of it.
FUNDES:

it's an independent variation of my ZIP solution where nested (and thus variable) local variables are used to get the inner lists required in the List-in-List technique.
Things are Text.Combined twice...
...so I SplitByDelimiter twice, too (in a single step for added efficiency).

It's efficient in its execution: a Result of a good 1 Million rows takes about 9.0 seconds.
Here's the M-code:

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = List.Skip(Table.ColumnNames(Source),1),
    PairwiseCombine = Table.RemoveColumns(Table.AddColumn(Source,"Combined",each let a=Text.Split([Products],","),b=Text.Split([Quantities],",") in Text.Combine(List.Transform(List.Zip({a,b}),each Text.Combine(_,"-")),"|")),ColumnNames),
    SplitByDelimiter = Table.SplitColumn(Table.ExpandListColumn(Table.TransformColumns(PairwiseCombine, {{"Combined", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Combined"), "Combined", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),ColumnNames)
in
    SplitByDelimiter

GeertD · Oct 16, 2022

Here's Oyekunle SOPEJU's solution as per his YT-comment (unmodified).
FUNDES:

extract the record (field)values as items in a list (the outer list), keep the cardinality of the fInput table,
Split the Text in all these items (and thus create the inner lists), keep the cardinality of the fInput table,
Construct the Resulting fOutput table on a per orig. record by record basis (the first column will be incomplete by construction),
Combine all those mini-fOutput tables in one big one, increasing the cardinality to its final value,
FillDown the first column and set the data types right (initially they all had to be set to type text of this solution to work).

The beauty of this solution is:

You do most of the transformations where the outermost table keeps its cardinality as long as possible,
No need to ZIP, a simple FillDown suffices.

It's efficient in its execution: a Result of a good 1 Million rows takes about 11.0 seconds.
Here's the M-code:

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ChangeType = Table.TransformColumnTypes(Source,{{"Invoice", type text}, {"Products", type text}, {"Quantities", type text}}),
    ColNames = Table.ColumnNames( ChangeType ),
    Row2list = Table.AddColumn(ChangeType, "step1", each Record.FieldValues( _ )),
    SplitText = Table.AddColumn(Row2list, "step2", each List.Transform( [step1] , each Text.Split( _ , ",") )),
    TblFromCol = Table.AddColumn(SplitText, "step3", each Table.FromColumns( [step2] , ColNames )),
    CmbnTbls = Table.Combine( TblFromCol[step3] ),
    FillDownFirstCol = Table.FillDown(CmbnTbls,{ ColNames{0} }),
    ChangeType1 = Table.TransformColumnTypes(FillDownFirstCol,{{ColNames{0}, Int64.Type}, {ColNames{1}, type text}, {ColNames{2}, Int64.Type}})
in
    ChangeType1

But I feel the actual M-code can be streamlined somewhat. Will that yield any further increased efficiency?...

GeertD · Oct 16, 2022

Here's a modified version of Oyekunle SOPEJU's solution:
FUNDES:

eliminate intermediate steps if you don't need to branch off or re-use.
Don't drag along columns you don't need further down the flow.
I think the name of the 4th step is rather appropriate: it's rather hard to understand 'as is' – but it works!
(and who needs readable code, right? lol)

It's efficient in its execution: a Result of a good 1 Million rows takes about 10.6 seconds. The efficiency increase is quite small.
Here's the M-code:

Power Query:

let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ChangeType = Table.TransformColumnTypes(Source,{{"Invoice", type text}, {"Products", type text}, {"Quantities", type text}}),
    ColNames = Table.ColumnNames(ChangeType),
    WorkTheMagic = Table.Combine(Table.TransformColumns(Table.AddColumn(ChangeType, "step", each Record.FieldValues(_))[[step]],{{"step", each Table.FromColumns(List.Transform(_, each Text.Split(_,",")),ColNames)}})[step]),
    FillDownFirstCol = Table.FillDown(WorkTheMagic,{ColNames{0}}),
    Result = Table.TransformColumnTypes(FillDownFirstCol,{{ColNames{0}, Int64.Type}, {ColNames{1}, type text}, {ColNames{2}, Int64.Type}})
in
    Result

mgirvin · Oct 16, 2022

Love the dialog and cool solutions : ) : ) : ) Go Team!!!

montyfern · May 22, 2023

Hi SME's,
Trying to use this code to fix a .csv that lists everything in one column. They want it split by No, Authors, Title, Journal, Year. It's all in one cell. Have tried numerous splits using Mr. Excel's |, or by , into rows, transposing, blah blah am stumped. Am getting a token error on your "in." It looks like this, in column A, where I renamed it to "Everything." There's tons of blank rows. I can't figure this out. There's two different data types per column. It's a mess. Thanks much!

No

Authors, title, journal, year

1

Person K, Person A, Preston AC, Person SA, Person S, Person D, Person E, Person H, Person I, Person K, Person L, Person M, Pharm.D, Person N, Person MB, Person B, Person O, Person P, Person Q, Person R, Person S, Person T, Person U, Person V, Person W, Person X, Person Y, Person Z. Safety and Efficacy of a Antibody against Malaria. The new england journal of medicine. 2021 Oct. 31 DOI: 11.1111/NEJMoa2206966

montyfern · May 22, 2023

montyfern said:
Hi SME's,
Trying to use this code to fix a .csv that lists everything in one column. They want it split by No, Authors, Title, Journal, Year. It's all in one cell. Have tried numerous splits using Mr. Excel's |, or by , into rows, transposing, blah blah am stumped. Am getting a token error on your "in." It looks like this, in column A, where I renamed it to "Everything." There's tons of blank rows. I can't figure this out. There's two different data types per column. It's a mess. Thanks much!

No
Authors, title, journal, year
1 Person K, Person A, Preston AC, Person SA, Person S, Person D, Person E, Person H, Person I, Person K, Person L, Person M, Pharm.D, Person N, Person MB, Person B, Person O, Person P, Person Q, Person R, Person S, Person T, Person U, Person V, Person W, Person X, Person Y, Person Z. Safety and Efficacy of a Antibody against Malaria. The new england journal of medicine. 2021 Oct. 31 DOI: 11.1111/NEJMoa2206966

ignore the "Everything" comment. It looks like above in cell A1, if that makes sense.

Power Query Split To Rows For 2 Similar Columns - 2520

GeertD

Board Regular

GeertD

Board Regular

GeertD

Board Regular

GeertD

Board Regular

GeertD

Board Regular

GeertD

Board Regular

mgirvin

Well-known Member

montyfern

Board Regular

montyfern

Board Regular

Forum statistics

Share this page

Power Query Split To Rows For 2 Similar Columns - 2520

Board Regular

Board Regular

Board Regular

Board Regular

Board Regular

Board Regular

Well-known Member

Board Regular

Board Regular

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock