Power Query Split To Rows For 2 Similar Columns - 2520

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Oct 13, 2022.
How to split delimited into rows.
I tried using Power Query.
With Power Query I can do it only for 1 column.
If I select more than 1 column, the Split Column is disabled in the Power Query Editor.
Is there any other solution for splitting delimited into rows for more than one column?

This video offers three solutions.
Solution 1 is a formula such as =TEXTJOIN("|",,TEXTSPLIT(C2,",")&"-"&TEXTSPLIT(B2,",")) and then Power Query.

The 2nd solution is M code written by Suat Ozgur.
Power Query:
let
    fnSplit = (row as record) =>
        let
            Invoice = row[Invoice],
            Product = row[Products],
            Qty = row[Quantities],

            ProductList = Text.Split(Product, "," ),
            QtyList = Text.Split(Qty, ","),
            
            TotalProducts = List.Count(ProductList),

            Result = List.Generate(
                () => [i = 0],
                each [i] < TotalProducts,
                each [i = [i] + 1],
                each [
                    Invoice = Invoice,
                    Product = ProductList{[i]},
                    Quantity = QtyList{[i]}
                ]
            )
        in
            Result, 

    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    TotalRows = Table.RowCount(Source),

    SplitRows = List.Generate(
        () => [i = 0],
        each [i] < TotalRows,
        each [i = [i] + 1],
        each fnSplit(Source{[i]})
    ),

    ListToTable = Table.FromList(SplitRows, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandTable = Table.ExpandListColumn(ListToTable, "Column1"),
    Result = Table.ExpandRecordColumn(ExpandTable, "Column1", {"Invoice", "Product", "Quantity"})
in
    Result

The 3rd solution is two intermediate queries in Power Query and then a merge query.

Table of Contents
(0:00) How to Split to Rows
(0:11) TEXTSPLIT formula
(1:30) M solution
(2:44) 3-Query solution
maxresdefault.jpg


Transcript of the video:
Power Query: the Split to Rows is amazing.
But what if we have to do that for two columns? If you try and select two columns and then split, it gets disabled. So here's what we have.
For each invoice, we have a list of products - In this case, there's six.
And a list of quantities. We want to figure out that there's three of T, 18 of B and so on. Now, today in Office Insiders, Microsoft 365, we have this great new function where we can split C by the commas.
Ampersand with a dash. Split B by the commas.
And then join that whole thing with a Pipe character in between.
And we would get this. That's just beautiful.
Ctrl+T to make that all into a table, and we're home free.
Data, from Table or Range. We're going to take this and split it at the Pipe.
Split column by delimiter. Advanced, into rows.
Click okay. Beautiful.
We don't need this, we don't need this. Then here, this is just a straight split column by delimiter at the dash.
Okay, done. Gorgeous.
Close & Load. Let's assume that you're not in Office Insiders and you don't yet have TEXTSPLIT. Look, while I could solve this in the Power Query editor, at this point, I just know that there has to be a better way to write this code.
I turned to Suat Ozgur, author of the soon-to-be upcoming "You Wouldn't Write Professional VBA Using the Macro Recorder, Would You - A Programmer's Guide to M".
And we'll put a link down to this code down in the YouTube description.
Here, this data – we will convert to table. Ctrl+T, click okay.
I'm going to call it Table1. That's perfect because this code is looking for Table1. Data, Get Data, From Other Sources.
Blank Query. View, Advanced editor.
Get rid of what's there. And we'll paste.
And click Done. And Home, Close & Load. That's the way to do it if you're a hot-shot M programmer, but unfortunately I'm not. We're going to turn to my way to do it using the Power Query editor. Which is a horrible set of steps. We're making this into a Table, click okay.
We'll call it OrigData. Data, From Table or Range.
You see all those commas out there so we have to get rid of the Change Type.
This first query is going to get invoice and products, but not quantities.
We take the Products - Split Column, By Delimiter. Split into Rows, click Okay.
We don't need quantities anymore. And on the Add Column, we will add an Index Column, From Zero. This is going to just be called the products. Home. Close & Load, Close & Load To, Only Create a Connection. Then we come back to the exact same table.
From Table or Range. This time we can get rid of the products. Get rid of the Change Type, that way our commas come back here. Split Column, By Delimiter, Comma, into Rows.
Click Okay, good. Add Column, add an Index Column, From Zero.
Make sure that you choose the same index both times.
These will be called the quantity. Home, Close & Load, Close & Load to, Only Create a Connection. Then just out here to the side somewhere, Get data, Combine Queries, Merge. We'll take the Products, the Quantity, Index, Index. Click Okay.
Expand this so we can get the Quantity out. Click Okay.
We don't need the second index. Invoice, Products, Quantities.
Home, Close & Load. There you go.
All right. Hey, I want to thank you for stopping by. We'll see you next time for another netcast from MrExcel.
 
Below is a quick transcription of the Mr.Excel M-code.
Remarks:
  • "Compiled" it into a single query,
  • Streamlined it a bit,
  • Buffered the Source table.
It is remarkably compact and quite efficient in its execution: a Result of a good 1 Million rows takes about 22.5 seconds.
(it runs markedly faster than Suat Ozgur's solution)

Power Query:
let
    Source = Table.Buffer(Excel.CurrentWorkbook(){[Name="fInput"]}[Content]),
    Products = Table.AddIndexColumn(Table.ExpandListColumn(Table.TransformColumns(Table.RemoveColumns(Source,{"Quantities"}), {{"Products", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Products"), "Index", 0, 1, Int64.Type),
    Quantities = Table.AddIndexColumn(Table.ExpandListColumn(Table.TransformColumns(Table.RemoveColumns(Source,{"Products"}), {{"Quantities", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Quantities"), "Index", 0, 1, Int64.Type),
    Result = Table.RemoveColumns(Table.ExpandTableColumn(Table.NestedJoin(Products, {"Index"}, Quantities, {"Index"}, "Quantities", JoinKind.LeftOuter), "Quantities", {"Quantities"}),{"Index"})
in
    Result

PS: The "FUNDES (Functional Design)" (I like to call it that) is in the video.
 
The onset/context: Suat Ozgur's solution is exceedingly beautiful, but:
  • My first impression: very elaborate
  • Surely, this little challenge can be solved in an easier way – or else PQ has failed!🤨
    (I hadn't looked at Mr.Excel's solution at that point – see above)
Here's my first solution (of a total of 2). It was inspired by the "Excel Classic"-solution as explained in the Mr.Excel video.
It's efficient in its execution: a Result of a good 1 Million rows takes about 8.4 seconds.
FUNDES:
  • Source: get the fInput table,
  • ColumnNames: get them,
  • ConvertToLists: the payload columns look like lists, but they're actually just text strings – do what the name says,
  • PairwiseCombineColumns: pairwise combine the payload columns and expand to rows accordingly – actually not that difficult if you "transpose" the List-of-Lists using the ZIP-function,
    (remember that: "transpose (of sorts) a List-of-lists using the ZIP function". Once you realize this and the power it implies, I'm sure you're gonna use that enigmatic ZIP function more often)
  • SplitColumnByDelimiter: split the payload columns in to separate columns.
  • // buffering in the above actually slows it down by about 1 second – so, don't.
Here's the M-code:
Power Query:
let
    Source =Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = List.Skip(Table.ColumnNames(Source),1),
    ConvertToLists = Table.TransformColumns(Source, {{ColumnNames{0}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}},{ColumnNames{1}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
    PairwiseCombineColumns = Table.TransformColumns(Table.ExpandListColumn(Table.RemoveColumns(Table.AddColumn(ConvertToLists, "Combined", each List.Zip({[Products],[Quantities]})),ColumnNames), "Combined"), {{"Combined", each Text.Combine(_,"-")}}),
    SplitColumnByDelimiter = Table.SplitColumn(PairwiseCombineColumns, "Combined", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), ColumnNames)
in
    SplitColumnByDelimiter
 
Here's my second solution. It was inspired by the realization that –independently– both payload columns expand identically and therefore there is a direct relationship between those records; no need to lookup, merge, or whatever. Just do those independent expansions, take the columns you need and "slap" 'm together – done! (that's the FUNDES)
Remark: this is the only solution where I was able to make it completely independent of the actual names of the payload columns.
It's efficient in its execution: a Result of a good 1 Million rows takes about 8.0 seconds. This is my best solution.

Here's the M-code:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = Table.ColumnNames(Source),
    ExpandColumn1 = Table.ExpandListColumn(Table.TransformColumns(Source, {{ColumnNames{1}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), ColumnNames{1}),
    ExpandColumn2 = Table.ExpandListColumn(Table.TransformColumns(Source, {{ColumnNames{2}, Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), ColumnNames{2}),
    Result = Table.FromColumns({Table.Column(ExpandColumn1,ColumnNames{0}),Table.Column(ExpandColumn1,ColumnNames{1}),Table.Column(ExpandColumn2,ColumnNames{2})},ColumnNames)
in
    Result
 
Here's Henry Tam's solution as per his YT-comment and my slight completion of it.
FUNDES:
  • it's an independent variation of my ZIP solution where nested (and thus variable) local variables are used to get the inner lists required in the List-in-List technique.
  • Things are Text.Combined twice...
  • ...so I SplitByDelimiter twice, too (in a single step for added efficiency).
It's efficient in its execution: a Result of a good 1 Million rows takes about 9.0 seconds.
Here's the M-code:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ColumnNames = List.Skip(Table.ColumnNames(Source),1),
    PairwiseCombine = Table.RemoveColumns(Table.AddColumn(Source,"Combined",each let a=Text.Split([Products],","),b=Text.Split([Quantities],",") in Text.Combine(List.Transform(List.Zip({a,b}),each Text.Combine(_,"-")),"|")),ColumnNames),
    SplitByDelimiter = Table.SplitColumn(Table.ExpandListColumn(Table.TransformColumns(PairwiseCombine, {{"Combined", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Combined"), "Combined", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),ColumnNames)
in
    SplitByDelimiter
 
Here's Oyekunle SOPEJU's solution as per his YT-comment (unmodified).
FUNDES:
  • extract the record (field)values as items in a list (the outer list), keep the cardinality of the fInput table,
  • Split the Text in all these items (and thus create the inner lists), keep the cardinality of the fInput table,
  • Construct the Resulting fOutput table on a per orig. record by record basis (the first column will be incomplete by construction),
  • Combine all those mini-fOutput tables in one big one, increasing the cardinality to its final value,
  • FillDown the first column and set the data types right (initially they all had to be set to type text of this solution to work).
The beauty of this solution is:
  • You do most of the transformations where the outermost table keeps its cardinality as long as possible,
  • No need to ZIP, a simple FillDown suffices.
It's efficient in its execution: a Result of a good 1 Million rows takes about 11.0 seconds.
Here's the M-code:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ChangeType = Table.TransformColumnTypes(Source,{{"Invoice", type text}, {"Products", type text}, {"Quantities", type text}}),
    ColNames = Table.ColumnNames( ChangeType ),
    Row2list = Table.AddColumn(ChangeType, "step1", each Record.FieldValues( _ )),
    SplitText = Table.AddColumn(Row2list, "step2", each List.Transform( [step1] , each Text.Split( _ , ",") )),
    TblFromCol = Table.AddColumn(SplitText, "step3", each Table.FromColumns( [step2] , ColNames )),
    CmbnTbls = Table.Combine( TblFromCol[step3] ),
    FillDownFirstCol = Table.FillDown(CmbnTbls,{ ColNames{0} }),
    ChangeType1 = Table.TransformColumnTypes(FillDownFirstCol,{{ColNames{0}, Int64.Type}, {ColNames{1}, type text}, {ColNames{2}, Int64.Type}})
in
    ChangeType1

But I feel the actual M-code can be streamlined somewhat. Will that yield any further increased efficiency?...
 
Here's a modified version of Oyekunle SOPEJU's solution:
FUNDES:
  • eliminate intermediate steps if you don't need to branch off or re-use.
  • Don't drag along columns you don't need further down the flow.
  • I think the name of the 4th step is rather appropriate: it's rather hard to understand 'as is' – but it works!
    (and who needs readable code, right? lol)
It's efficient in its execution: a Result of a good 1 Million rows takes about 10.6 seconds. The efficiency increase is quite small.
Here's the M-code:
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="fInput"]}[Content],
    ChangeType = Table.TransformColumnTypes(Source,{{"Invoice", type text}, {"Products", type text}, {"Quantities", type text}}),
    ColNames = Table.ColumnNames(ChangeType),
    WorkTheMagic = Table.Combine(Table.TransformColumns(Table.AddColumn(ChangeType, "step", each Record.FieldValues(_))[[step]],{{"step", each Table.FromColumns(List.Transform(_, each Text.Split(_,",")),ColNames)}})[step]),
    FillDownFirstCol = Table.FillDown(WorkTheMagic,{ColNames{0}}),
    Result = Table.TransformColumnTypes(FillDownFirstCol,{{ColNames{0}, Int64.Type}, {ColNames{1}, type text}, {ColNames{2}, Int64.Type}})
in
    Result
 
Love the dialog and cool solutions : ) : ) : ) Go Team!!!
 
Hi SME's,
Trying to use this code to fix a .csv that lists everything in one column. They want it split by No, Authors, Title, Journal, Year. It's all in one cell. Have tried numerous splits using Mr. Excel's |, or by , into rows, transposing, blah blah am stumped. Am getting a token error on your "in." It looks like this, in column A, where I renamed it to "Everything." There's tons of blank rows. I can't figure this out. There's two different data types per column. It's a mess. Thanks much!
No
Authors, title, journal, year
1Person K, Person A, Preston AC, Person SA, Person S, Person D, Person E, Person H, Person I, Person K, Person L, Person M, Pharm.D, Person N, Person MB, Person B, Person O, Person P, Person Q, Person R, Person S, Person T, Person U, Person V, Person W, Person X, Person Y, Person Z. Safety and Efficacy of a Antibody against Malaria. The new england journal of medicine. 2021 Oct. 31 DOI: 11.1111/NEJMoa2206966
 
Hi SME's,
Trying to use this code to fix a .csv that lists everything in one column. They want it split by No, Authors, Title, Journal, Year. It's all in one cell. Have tried numerous splits using Mr. Excel's |, or by , into rows, transposing, blah blah am stumped. Am getting a token error on your "in." It looks like this, in column A, where I renamed it to "Everything." There's tons of blank rows. I can't figure this out. There's two different data types per column. It's a mess. Thanks much!
No
Authors, title, journal, year
1Person K, Person A, Preston AC, Person SA, Person S, Person D, Person E, Person H, Person I, Person K, Person L, Person M, Pharm.D, Person N, Person MB, Person B, Person O, Person P, Person Q, Person R, Person S, Person T, Person U, Person V, Person W, Person X, Person Y, Person Z. Safety and Efficacy of a Antibody against Malaria. The new england journal of medicine. 2021 Oct. 31 DOI: 11.1111/NEJMoa2206966
ignore the "Everything" comment. It looks like above in cell A1, if that makes sense.
 

Forum statistics

Threads
1,224,847
Messages
6,181,325
Members
453,032
Latest member
Pauh

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top