[Power query] Transpose uneven data and combine multiple occurences by delimiter

Dutsj

New Member
Joined
Jan 10, 2022
Messages
14
Office Version
  1. 365
Platform
  1. Windows
So i got some rough data from an SQL, it is in JSON and it's poorly formatted. I've somehow managed to format it almost how i want it, now i'm just limited by my skills and i can't seem to find anyone with a similar issue. I've tried several things, split by list, group, transposing you name it, i just can't seem to get it right.

I've visualized some sample data because i think that explains it better. Basically i want to transpose the first columns rows into columns, and add the values. Everytime it gets to a new column 1, that means it's a new row. It's not identical, sometimes column4name will be missing, sometimes there's only 5 columnnames, other times there's duplicate columnnames that needs to be combined with a delimiter (comma and space ", "). The author and link columns are identical until the next column1, so they only need to be added once as a new column after the transposed columns.
1641824847504.png

Basically i want this data to be transformed into the table below it if possible.
ColumnNamesValuesAuthor (same value until next column1)Link (same value until next column1)
Column1Name (column1 will always be start of a new row)
1​
Author1http:/Link1.com
Column2Name
2​
Author1http:/Link1.com
Column3Name
3​
Author1http:/Link1.com
Column3Name (There can be duplicate column names per row, join by comma)
4​
Author1http:/Link1.com
Column5Name
5​
Author1http:/Link1.com
Column1Name (column1 will always be start of a new row)aAuthor2http:/Link2.com
Column2NamebAuthor2http:/Link2.com
Column3NamecAuthor2http:/Link2.com
Column4NamedAuthor2http:/Link2.com
Column5NameeAuthor2http:/Link2.com
Column6NamefAuthor2http:/Link2.com
Column7NamegAuthor2http:/Link2.com
Column7NamehAuthor2http:/Link2.com
Column7Name (There can be duplicate column names per row, join by comma)iAuthor2http:/Link2.com
Column1Name (column1 will always be start of a new row)
101​
Author3http:/Link3.com
Column2Name
102​
Author3http:/Link3.com
Column3Name
103​
Author3http:/Link3.com
Column3Name (There can be duplicate column names per row, join by comma)
104​
Author3http:/Link3.com
Column5Name
105​
Author3http:/Link3.com
Column6Name
106​
Author3http:/Link3.com
Column7Name
107​
Author3http:/Link3.com
Column8Name
108​
Author3http:/Link3.com
So from the above table to this table below
Column1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8NameAuthorLink
1​
2​
3, 4null
5​
nullnullnullAuthor1http:/Link1.com
abcdefg, h, inullAuthor2http:/Link2.com
101​
102​
103, 104null
105​
106​
107​
108​
Author3http:/Link3.com
 

Attachments

  • 1641824816509.png
    1641824816509.png
    64.6 KB · Views: 34

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
Can be accomplished with Power Pivot. Need to create a measure for the values
Measure called "Values" and has DAX formula =CONCATENATEX(Range,Range[Values],", ")

Here is what the PT looks like and the layout

Book13
ABCDEFGHIJ
3LinksColumnNames
4AuthorLinkColumn1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8Name
5Author1http:/Link1.com123, 45
6Author2http:/Link2.comabcdefg, h, i
7Author3http:/Link3.com101102103, 104105106107108
Sheet2
 

Attachments

  • PT!jpg.jpg
    PT!jpg.jpg
    40.9 KB · Views: 22
Upvote 0
In power query:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], 
    tbl = Table.AddColumn(Source, "Mult", each try let x = Table.PositionOf(Source,_) in if _[ColumnNames] = Source[ColumnNames]{x+1} then true else false otherwise false),
    tbl1 = Table.AddColumn( tbl, "Count", each if [Mult] then let x = Table.PositionOf(tbl,_) in 
         List.Sum(try List.Generate(()=>x, (y)=> tbl[Mult]{y} = true, (y)=>y+1, (y)=>1) otherwise {1}) else 0 ),
    tbl2 = Table.AddColumn( tbl1, "Values1", each if [Mult] then let x = Table.PositionOf(tbl1,_) in 
        List.Range(tbl1[Values],x, [Count]+1) else {[Values]}),
    tbl3 = Table.AddColumn( tbl2, "Keep", each let x = Table.PositionOf(tbl2,_) in
        try if tbl2[Mult]{x-1} then false else true otherwise true),
    tbl4 = Table.SelectRows(tbl3, each [Keep]),
    tbl5 = Table.RemoveColumns(tbl4,{"Values", "Mult", "Count", "Keep"}),
    tbl6 = Table.ReorderColumns(tbl5,{"ColumnNames", "Values1", "Author", "Link"}),
    tbl7 = Table.RenameColumns(tbl6,{{"Values1", "Values"}}),
    tbl8 = Table.TransformColumns(tbl7, {"Values", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
    tbl9 = Table.Pivot(tbl8, List.Distinct(tbl8[ColumnNames]), "ColumnNames", "Values"),
    Result = let tcn = Table.ColumnNames(tbl9) in Table.ReorderColumns(tbl9,List.Sort(List.Skip(tcn,2)) & List.FirstN(tcn,2))
in
    Result

Book2
ABCDEFGHIJKLMNOP
1ColumnNamesValuesAuthorLinkColumn1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8NameAuthorLink
2Column1Name1Author1http:/Link1.com123, 45Author1http:/Link1.com
3Column2Name2Author1http:/Link1.comabcdefg, h, iAuthor2http:/Link2.com
4Column3Name3Author1http:/Link1.com101102103, 104105106107108Author3http:/Link3.com
5Column3Name4Author1http:/Link1.com
6Column5Name5Author1http:/Link1.com
7Column1NameaAuthor2http:/Link2.com
8Column2NamebAuthor2http:/Link2.com
9Column3NamecAuthor2http:/Link2.com
10Column4NamedAuthor2http:/Link2.com
11Column5NameeAuthor2http:/Link2.com
12Column6NamefAuthor2http:/Link2.com
13Column7NamegAuthor2http:/Link2.com
14Column7NamehAuthor2http:/Link2.com
15Column7NameiAuthor2http:/Link2.com
16Column1Name101Author3http:/Link3.com
17Column2Name102Author3http:/Link3.com
18Column3Name103Author3http:/Link3.com
19Column3Name104Author3http:/Link3.com
20Column5Name105Author3http:/Link3.com
21Column6Name106Author3http:/Link3.com
22Column7Name107Author3http:/Link3.com
23Column8Name108Author3http:/Link3.com
24
Sheet1
 
Upvote 0
Solution
In power query:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tbl = Table.AddColumn(Source, "Mult", each try let x = Table.PositionOf(Source,_) in if _[ColumnNames] = Source[ColumnNames]{x+1} then true else false otherwise false),
    tbl1 = Table.AddColumn( tbl, "Count", each if [Mult] then let x = Table.PositionOf(tbl,_) in
         List.Sum(try List.Generate(()=>x, (y)=> tbl[Mult]{y} = true, (y)=>y+1, (y)=>1) otherwise {1}) else 0 ),
    tbl2 = Table.AddColumn( tbl1, "Values1", each if [Mult] then let x = Table.PositionOf(tbl1,_) in
        List.Range(tbl1[Values],x, [Count]+1) else {[Values]}),
    tbl3 = Table.AddColumn( tbl2, "Keep", each let x = Table.PositionOf(tbl2,_) in
        try if tbl2[Mult]{x-1} then false else true otherwise true),
    tbl4 = Table.SelectRows(tbl3, each [Keep]),
    tbl5 = Table.RemoveColumns(tbl4,{"Values", "Mult", "Count", "Keep"}),
    tbl6 = Table.ReorderColumns(tbl5,{"ColumnNames", "Values1", "Author", "Link"}),
    tbl7 = Table.RenameColumns(tbl6,{{"Values1", "Values"}}),
    tbl8 = Table.TransformColumns(tbl7, {"Values", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
    tbl9 = Table.Pivot(tbl8, List.Distinct(tbl8[ColumnNames]), "ColumnNames", "Values"),
    Result = let tcn = Table.ColumnNames(tbl9) in Table.ReorderColumns(tbl9,List.Sort(List.Skip(tcn,2)) & List.FirstN(tcn,2))
in
    Result

Book2
ABCDEFGHIJKLMNOP
1ColumnNamesValuesAuthorLinkColumn1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8NameAuthorLink
2Column1Name1Author1http:/Link1.com123, 45Author1http:/Link1.com
3Column2Name2Author1http:/Link1.comabcdefg, h, iAuthor2http:/Link2.com
4Column3Name3Author1http:/Link1.com101102103, 104105106107108Author3http:/Link3.com
5Column3Name4Author1http:/Link1.com
6Column5Name5Author1http:/Link1.com
7Column1NameaAuthor2http:/Link2.com
8Column2NamebAuthor2http:/Link2.com
9Column3NamecAuthor2http:/Link2.com
10Column4NamedAuthor2http:/Link2.com
11Column5NameeAuthor2http:/Link2.com
12Column6NamefAuthor2http:/Link2.com
13Column7NamegAuthor2http:/Link2.com
14Column7NamehAuthor2http:/Link2.com
15Column7NameiAuthor2http:/Link2.com
16Column1Name101Author3http:/Link3.com
17Column2Name102Author3http:/Link3.com
18Column3Name103Author3http:/Link3.com
19Column3Name104Author3http:/Link3.com
20Column5Name105Author3http:/Link3.com
21Column6Name106Author3http:/Link3.com
22Column7Name107Author3http:/Link3.com
23Column8Name108Author3http:/Link3.com
24
Sheet1
Amazing, thank you so much! That first function is genius, i would've never thought of that.

If anyone has issues implementing this, it's probably because you have some steps before implementing the lines, i forgot to replace all the instances of "Source" in the tbl line to the name of the previous step, took me a while to figure out why my sample data was working but not my actual data... ?
 
Upvote 0
Can be accomplished with Power Pivot. Need to create a measure for the values
Measure called "Values" and has DAX formula =CONCATENATEX(Range,Range[Values],", ")

Here is what the PT looks like and the layout

Book13
ABCDEFGHIJ
3LinksColumnNames
4AuthorLinkColumn1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8Name
5Author1http:/Link1.com123, 45
6Author2http:/Link2.comabcdefg, h, i
7Author3http:/Link3.com101102103, 104105106107108
Sheet2
Thanks for the solution, I should've mentioned that i'm using Power BI, but it's always interesting to see what you can do in other tools, works like a charm ?
 
Upvote 0
In power query:

Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    tbl = Table.AddColumn(Source, "Mult", each try let x = Table.PositionOf(Source,_) in if _[ColumnNames] = Source[ColumnNames]{x+1} then true else false otherwise false),
    tbl1 = Table.AddColumn( tbl, "Count", each if [Mult] then let x = Table.PositionOf(tbl,_) in
         List.Sum(try List.Generate(()=>x, (y)=> tbl[Mult]{y} = true, (y)=>y+1, (y)=>1) otherwise {1}) else 0 ),
    tbl2 = Table.AddColumn( tbl1, "Values1", each if [Mult] then let x = Table.PositionOf(tbl1,_) in
        List.Range(tbl1[Values],x, [Count]+1) else {[Values]}),
    tbl3 = Table.AddColumn( tbl2, "Keep", each let x = Table.PositionOf(tbl2,_) in
        try if tbl2[Mult]{x-1} then false else true otherwise true),
    tbl4 = Table.SelectRows(tbl3, each [Keep]),
    tbl5 = Table.RemoveColumns(tbl4,{"Values", "Mult", "Count", "Keep"}),
    tbl6 = Table.ReorderColumns(tbl5,{"ColumnNames", "Values1", "Author", "Link"}),
    tbl7 = Table.RenameColumns(tbl6,{{"Values1", "Values"}}),
    tbl8 = Table.TransformColumns(tbl7, {"Values", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
    tbl9 = Table.Pivot(tbl8, List.Distinct(tbl8[ColumnNames]), "ColumnNames", "Values"),
    Result = let tcn = Table.ColumnNames(tbl9) in Table.ReorderColumns(tbl9,List.Sort(List.Skip(tcn,2)) & List.FirstN(tcn,2))
in
    Result

Book2
ABCDEFGHIJKLMNOP
1ColumnNamesValuesAuthorLinkColumn1NameColumn2NameColumn3NameColumn4NameColumn5NameColumn6NameColumn7NameColumn8NameAuthorLink
2Column1Name1Author1http:/Link1.com123, 45Author1http:/Link1.com
3Column2Name2Author1http:/Link1.comabcdefg, h, iAuthor2http:/Link2.com
4Column3Name3Author1http:/Link1.com101102103, 104105106107108Author3http:/Link3.com
5Column3Name4Author1http:/Link1.com
6Column5Name5Author1http:/Link1.com
7Column1NameaAuthor2http:/Link2.com
8Column2NamebAuthor2http:/Link2.com
9Column3NamecAuthor2http:/Link2.com
10Column4NamedAuthor2http:/Link2.com
11Column5NameeAuthor2http:/Link2.com
12Column6NamefAuthor2http:/Link2.com
13Column7NamegAuthor2http:/Link2.com
14Column7NamehAuthor2http:/Link2.com
15Column7NameiAuthor2http:/Link2.com
16Column1Name101Author3http:/Link3.com
17Column2Name102Author3http:/Link3.com
18Column3Name103Author3http:/Link3.com
19Column3Name104Author3http:/Link3.com
20Column5Name105Author3http:/Link3.com
21Column6Name106Author3http:/Link3.com
22Column7Name107Author3http:/Link3.com
23Column8Name108Author3http:/Link3.com
24
Sheet1
I'm afraid i'm back again, it worked perfectly for all of my datasets in the beginning, but once i got to the main dataset wich is 3000 lines it freezes and stops after loading for 10 minutes at tbl9. Any idea how you could handle this with a more lightweight solution? Or any settings in power BI i can tinker with to make it work? As far as i can tell the data is identical to the smaller sheets, except for some fields having a large amount of text in them, could this affect the results? I can't upload the data here I'm afraid, but i was just wondering if you had some quick idea as to why it freezes?
 
Upvote 0
So if I just add about three thousand lines to your sample table, will that mimic your main data set? About how many column names should you end up with?
 
Upvote 0
So if I just add about three thousand lines to your sample table, will that mimic your main data set? About how many column names should you end up with?
This dataset has around 12 different column names, you can try to mimic the dataset and see if it helps, maybe add some text fields with long text to 1-2 fields per 12 column names. But if you don't have any obvious answer i'll try tinkering a bit more myself, i've just tried for hours now without any results, so if you can't see any obvious reason as to why it fails, i'll try to make some accurate test data tomorrow. I also just saw that some fields were empty, maybe that messes it up, i'll keep you posted :)
 
Upvote 0
So if I just add about three thousand lines to your sample table, will that mimic your main data set? About how many column names should you end up with?
So i just tried reducing the amount of data, it worked when i run around 100 lines with this dataset although it's very slow still, (tbl9->result took around 8 minutes) i assume it's due to it having very large amounts of data in each cell (several lines). I don't have time right now to make sample data, i might have time monday, but i just thought i should let you know that it works with around 100 lines, but if i add anymore than that and it'll exceed 10 minutes and fail, so i probably need to find a more lightweight solution somehow.
 
Upvote 0
The query becomes much simpler and I would assume much faster if you add four helper columns to the original table as in the example below:

Book3
ABCDEFGHIJKLMNOPQRST
1ColumnNamesValuesAuthorLinkMultipleNumberJoinedKeepColumn1NameColumn2NameColumn3NameColumn5NameColumn4NameColumn6NameColumn7NameColumn8NameAuthorLink
2Column1Name1Author1http:/Link1.comFALSE01TRUE123, 45Author1http:/Link1.com
3Column2Name2Author1http:/Link1.comFALSE02TRUEabcedfg, h, iAuthor2http:/Link2.com
4Column3Name3Author1http:/Link1.comTRUE13, 4TRUE101102103, 104105106107108Author3http:/Link3.com
5Column3Name4Author1http:/Link1.comFALSE04FALSE
6Column5Name5Author1http:/Link1.comFALSE05TRUE
7Column1NameaAuthor2http:/Link2.comFALSE0aTRUE
8Column2NamebAuthor2http:/Link2.comFALSE0bTRUE
9Column3NamecAuthor2http:/Link2.comFALSE0cTRUE
10Column4NamedAuthor2http:/Link2.comFALSE0dTRUE
11Column5NameeAuthor2http:/Link2.comFALSE0eTRUE
12Column6NamefAuthor2http:/Link2.comFALSE0fTRUE
13Column7NamegAuthor2http:/Link2.comTRUE2g, h, iTRUE
14Column7NamehAuthor2http:/Link2.comTRUE1h, iFALSE
15Column7NameiAuthor2http:/Link2.comFALSE0iFALSE
16Column1Name101Author3http:/Link3.comFALSE0101TRUE
17Column2Name102Author3http:/Link3.comFALSE0102TRUE
18Column3Name103Author3http:/Link3.comTRUE1103, 104TRUE
19Column3Name104Author3http:/Link3.comFALSE0104FALSE
20Column5Name105Author3http:/Link3.comFALSE0105TRUE
21Column6Name106Author3http:/Link3.comFALSE0106TRUE
22Column7Name107Author3http:/Link3.comFALSE0107TRUE
23Column8Name108Author3http:/Link3.comFALSE0108TRUE
Sheet1
Cell Formulas
RangeFormula
E2:E23E2=A2=A3
F2:F22F2=IF(E2,MATCH(FALSE,E3:E$23,0),0)
G2:G23G2=TEXTJOIN(", ",1,OFFSET(B2,0,0,F2+1))
H3:H23H3=IF(F2,FALSE,TRUE)
F23F23=IF(E23,MATCH(FALSE,E$23:E24,0),0)


Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    FilterKeep = Table.SelectRows(Source, each [Keep]),
    RemoveCols = Table.RemoveColumns(FilterKeep,{"Values", "Multiple", "Number","Keep"}),
    Pivot = Table.Pivot(RemoveCols, List.Distinct(RemoveCols[ColumnNames]), "ColumnNames", "Joined"),
    lstReorder = let tcn = Table.ColumnNames(Pivot) in List.Skip(tcn,2) & List.FirstN(tcn,2),
    ReorderCols = Table.ReorderColumns(Pivot,lstReorder)
in
    ReorderCols
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top