Power Query to transform data ready for pivot

kcmuppet

Active Member
Joined
Nov 2, 2005
Messages
439
Office Version
  1. 365
Platform
  1. Windows
Hello, I'm trying to set-up a power query to transform my data ready to pivot. My starting table has data in this shape/layout:


UserQ1QuestionQ1CategoryQ1AnswerQ2QuestionQ2CategoryQ2AnswerQ3QuestionQ3CategoryQ3AnswerQ4QuestionQ4CategoryQ4AnswerQ58QuestionQ58CategoryQ58Answer
User 1Age?PhysicalHighHeight?PhysicalLowShoe Size?PhysicalMediumPreferred food?PreferenceHighPreferred Colour?PreferenceHigh
User 2Age?PhysicalHighHeight?PhysicalHighShoe Size?PhysicalMediumPreferred food?PreferenceLowPreferred Colour?PreferenceLow
User 3Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalMediumPreferred food?PreferenceLowPreferred Colour?PreferenceLow
User 4Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalHighPreferred food?PreferenceLowPreferred Colour?PreferenceMedium
User 5Age?PhysicalMediumHeight?PhysicalHighShoe Size?PhysicalHighPreferred food?PreferenceMediumPreferred Colour?PreferenceMedium
User 6Age?PhysicalLowHeight?PhysicalMediumShoe Size?PhysicalLowPreferred food?PreferenceLowPreferred Colour?PreferenceMedium


...and I'm trying to get into into a format that will allow me to generate a a pivot table that will be in this format:

Final Output (Excel pivot table)
HighMediumLow
Physical
Age?231
Height?411
Shoe Size?231
Preference
Preferred food?114
Preferred Colour?132


i.e. I want to group by 'Category' and then by 'Question' and count the 'Answer' responses. I have thousands of rows and 58x Question, Question Category and Answer and the fields are not contiguous. It seems to me that I need to find a way to first stack the entries in all the fields ending 'Category', and 'Answer', but I'm struggling to know how to start?

Here is a mock-up of the starting table:

Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdmegaISgXSJWhyPvnlQDI4Iz9VITizCl2nb2pKZmkuSKwoNS21qCg1RSEtPz/FHi6SmpecirABoco5Pye/tAi7ulgdqGuNSHYtVI5S50I8TdC1IGVwxxpjdSzcxsHmXBPqOxcjkilxLNQlcPeaDpR7sUQCkU42w+pkiP+xuhduFS4Xo4cdNQI4FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q1Answer = _t, Q2Question = _t, Q2Category = _t, Q2Answer = _t, Q3Question = _t, Q3Category = _t, Q3Answer = _t, Q4Question = _t, Q4Category = _t, Q4Answer = _t, Q58Question = _t, Q58Category = _t, Q58Answer = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Q1Question", type text}, {"Q1Category", type text}, {"Q1Answer", type text}, {"Q2Question", type text}, {"Q2Category", type text}, {"Q2Answer", type text}, {"Q3Question", type text}, {"Q3Category", type text}, {"Q3Answer", type text}, {"Q4Question", type text}, {"Q4Category", type text}, {"Q4Answer", type text}, {"Q58Question", type text}, {"Q58Category", type text}, {"Q58Answer", type text}})
in
    #"Changed Type"

(cross-posted on the MS powerbi forum)
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
So... inspired by this video from howtoexcel.org, I managed to get a solution by first merging the columns for each question, and then selecting all the merged columns, unpivoting, then splitting the columns again by the delimiter.

Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdmegaISgXSJWhyPvnlQDI4Iz9VITizCl2nb2pKZmkuSKwoNS21qCg1RSEtPz/FHi6SmpecirABoco5Pye/tAi7ulgdqGuNSHYtVI5S50I8TdC1IGVwxxpjdSzcxsHmXBPqOxcjkilxLNQlcPeaDpR7sUQCkU42w+pkiP+xuhduFS4Xo4cdNQI4FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q1Answer = _t, Q2Question = _t, Q2Category = _t, Q2Answer = _t, Q3Question = _t, Q3Category = _t, Q3Answer = _t, Q4Question = _t, Q4Category = _t, Q4Answer = _t, Q58Question = _t, Q58Category = _t, Q58Answer = _t]),
       #"Merged Columns" = Table.CombineColumns(Source,{"Q1Question", "Q1Category", "Q1Answer"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Q1"),
    #"Merged Columns1" = Table.CombineColumns(#"Merged Columns",{"Q2Question", "Q2Category", "Q2Answer"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Q2"),
    #"Merged Columns2" = Table.CombineColumns(#"Merged Columns1",{"Q3Question", "Q3Category", "Q3Answer"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Q3"),
    #"Merged Columns3" = Table.CombineColumns(#"Merged Columns2",{"Q4Question", "Q4Category", "Q4Answer"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Q4"),
    #"Merged Columns4" = Table.CombineColumns(#"Merged Columns3",{"Q58Question", "Q58Category", "Q58Answer"},Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"Q58"),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Merged Columns4", {"User"}, "Attribute", "Value"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Columns", "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Value.1", "Value.2", "Value.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Value.1", type text}, {"Value.2", type text}, {"Value.3", type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Value.1", "Question"}, {"Value.2", "Category"}, {"Value.3", "Answer"}})
in
    #"Renamed Columns"



But, that will be very cumbersome for 60-odd columns. Does anyone know how to do that in M / Power Query so that it will merge all columns beginning with Qsomething, where Qsomething can be a single or double-digit number?
 
Upvote 0
Power Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdmegaISgXSJWhyPvnlQDI4Iz9VITizCl2nb2pKZmkuSKwoNS21qCg1RSEtPz/FHi6SmpecirABoco5Pye/tAi7ulgdqGuNSHYtVI5S50I8TdC1IGVwxxpjdSzcxsHmXBPqOxcjkilxLNQlcPeaDpR7sUQCkU42w+pkiP+xuhduFS4Xo4cdNQI4FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q1Answer = _t, Q2Question = _t, Q2Category = _t, Q2Answer = _t, Q3Question = _t, Q3Category = _t, Q3Answer = _t, Q4Question = _t, Q4Category = _t, Q4Answer = _t, Q58Question = _t, Q58Category = _t, Q58Answer = _t]),
    tcn = Table.ColumnNames(Source),
    tcnQ = List.Select(tcn, each Text.Start(_,1)="Q"),
    tcngroup = List.Accumulate({0..List.Count(tcnQ)/3 -1 }, {}, (s,c)=> s & {List.Range(tcnQ, c*3, 3)}),
    tbl = List.Accumulate(tcngroup, Source, (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
    tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
    tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Question", "Category", "Answer"}),
    digits = List.Transform({0..9}, Text.From),
    Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))})
in
    Result
 
Upvote 1
As far as I can work out...
Power Query:
  tcn = Table.ColumnNames(Source),
...lists the column names
Power Query:
      tcnQ = List.Select(tcn, each Text.Start(_,1)="Q"),
...selects columns beginning with Q
Power Query:
      tcngroup = List.Accumulate({0..List.Count(tcnQ)/3 -1 }, {}, (s,c)=> s & {List.Range(tcnQ, c*3, 3)}),
... not sure, but seems to break column names into groups of 3?
Power Query:
      tbl = List.Accumulate(tcngroup, Source, (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
... combines the groups with semicolon delimeter
Power Query:
      tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
...unpivots columns other than 'User'
Power Query:
      tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Question", "Category", "Answer"}),
...re-splits the combined columns at each semicolon delimeter
Power Query:
      digits = List.Transform({0..9}, Text.From),
...not sure?
Power Query:
      Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))})
...removes "Question No."from each question number

The issue I have applying it to my real data is that the Table.ColumnNames step seems to list only the first 50 column names, whereas in my real data set I have over 230.

Is there a way round this?
 
Upvote 0
Actually I realised it's not failing because of the Table.ColumnNames step. It's because of the column order in my real data

It actually goes Q1Question, Q1Category, Q1Question, Q1Category etc. and the after all the Question and Category come all the answers, Q1Answer, Q2Answer, etc.. so it's the step after grouping that doesn't work.

After:
Power Query:
tcnQ = List.Select(tcn, each Text.Start(_,1)="Q"),
...I tried adding:
Power Query:
sortqs=List.Sort(tcnQ,Order.Ascending),
, but this didn't work because the original column names have single and double digits, so Q10Something came before Q1Something and so on
 
Upvote 0
If your Q columns are in groups of three then sorting as you did should result in the correct grouping in tcngroup, but they will be in alphabetical order as you said. That's ok, just sort based on Question No. after the Result step to put the table into the correct order
 
Upvote 0
I
... just sort based on Question No. after the Result step to put the table into the correct order
It doesn't get that far because it groups likes this:

Q1Question, Q1Category, Q2Question
Q2Category, Q3Question, Q3Category
Q4Question, Q4Category,Q58Question
Q58Category,Q1Answer,Q2Answer
Q3Answer,Q4Answer,Q58Answer
 
Upvote 0
you didn't do the sort.

Sort tcnQ first

Power Query:
tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),
 
Upvote 0
This:
Power Query:
tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),

...doesn't seem to work because the lack of leading zeroes in the original column names in my actual data set i.e. Q10Question comes before Q1Question, etc.

(I've been trying work out if I can convert the list of column names into a table, add the leading zeros after the Q, then sort, but I'm getting stuck trying to turn it back into a list for the tcngroup to work with, so probably I'm barking up the wrong tree...)
 
Upvote 0

Forum statistics

Threads
1,224,824
Messages
6,181,186
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top