Power Query to transform data ready for pivot

kcmuppet · Mar 1, 2023

Hello, I'm trying to set up a Power Query query to transform my data so that it is ready to pivot. My starting table has data in this shape/layout:



User	Q1Question	Q1Category	Q1Answer	Q2Question	Q2Category	Q2Answer	Q3Question	Q3Category	Q3Answer	Q4Question	Q4Category	Q4Answer	Q58Question	Q58Category	Q58Answer
User 1	Age?	Physical	High	Height?	Physical	Low	Shoe Size?	Physical	Medium	Preferred food?	Preference	High	Preferred Colour?	Preference	High
User 2	Age?	Physical	High	Height?	Physical	High	Shoe Size?	Physical	Medium	Preferred food?	Preference	Low	Preferred Colour?	Preference	Low
User 3	Age?	Physical	Medium	Height?	Physical	High	Shoe Size?	Physical	Medium	Preferred food?	Preference	Low	Preferred Colour?	Preference	Low
User 4	Age?	Physical	Medium	Height?	Physical	High	Shoe Size?	Physical	High	Preferred food?	Preference	Low	Preferred Colour?	Preference	Medium
User 5	Age?	Physical	Medium	Height?	Physical	High	Shoe Size?	Physical	High	Preferred food?	Preference	Medium	Preferred Colour?	Preference	Medium
User 6	Age?	Physical	Low	Height?	Physical	Medium	Shoe Size?	Physical	Low	Preferred food?	Preference	Low	Preferred Colour?	Preference	Medium

...and I'm trying to get it into a format that will allow me to generate a pivot table like this:



Final Output (Excel pivot table)
	High	Medium	Low
Physical
Age?	2	3	1
Height?	4	1	1
Shoe Size?	2	3	1
Preference
Preferred food?	1	1	4
Preferred Colour?	1	3	2

i.e. I want to group by 'Category' and then by 'Question' and count the 'Answer' responses. I have thousands of rows and 58 sets of Question, Category and Answer fields, and the fields are not contiguous. It seems to me that I first need to find a way to stack the entries in all the fields ending in 'Category' and 'Answer', but I don't know where to start.

Here is a mock-up of the starting table:

Power Query:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdmegaISgXSJWhyPvnlQDI4Iz9VITizCl2nb2pKZmkuSKwoNS21qCg1RSEtPz/FHi6SmpecirABoco5Pye/tAi7ulgdqGuNSHYtVI5S50I8TdC1IGVwxxpjdSzcxsHmXBPqOxcjkilxLNQlcPeaDpR7sUQCkU42w+pkiP+xuhduFS4Xo4cdNQI4FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q1Answer = _t, Q2Question = _t, Q2Category = _t, Q2Answer = _t, Q3Question = _t, Q3Category = _t, Q3Answer = _t, Q4Question = _t, Q4Category = _t, Q4Answer = _t, Q58Question = _t, Q58Category = _t, Q58Answer = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"User", type text}, {"Q1Question", type text}, {"Q1Category", type text}, {"Q1Answer", type text}, {"Q2Question", type text}, {"Q2Category", type text}, {"Q2Answer", type text}, {"Q3Question", type text}, {"Q3Category", type text}, {"Q3Answer", type text}, {"Q4Question", type text}, {"Q4Category", type text}, {"Q4Answer", type text}, {"Q58Question", type text}, {"Q58Category", type text}, {"Q58Answer", type text}})
in
    #"Changed Type"

(cross-posted on the MS powerbi forum)
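
One possible way to start on the stacking described above, offered as a minimal sketch and not as the approach the thread settles on: unpivot every column except User, split each original column name into its question number and its field name, then pivot the field names back into columns. The two-user inline table is a cut-down stand-in for the mock-up, the step names (Unpivoted, Split, Pivoted) are illustrative, and the sketch assumes every column name has the form Q<number><Field>.

```powerquery
let
    // Cut-down stand-in for the mock-up table: two users, two questions
    Source = #table(
        {"User", "Q1Question", "Q1Category", "Q1Answer", "Q2Question", "Q2Category", "Q2Answer"},
        {
            {"User 1", "Age?", "Physical", "High", "Height?", "Physical", "Low"},
            {"User 2", "Age?", "Physical", "High", "Height?", "Physical", "High"}
        }
    ),
    // One row per User and original column; the old column name goes into "Attribute"
    Unpivoted = Table.UnpivotOtherColumns(Source, {"User"}, "Attribute", "Value"),
    // Split e.g. "Q1Question" into "Q1" and "Question" where a digit is followed by a capital letter
    Split = Table.SplitColumn(
        Unpivoted,
        "Attribute",
        Splitter.SplitTextByCharacterTransition({"0".."9"}, {"A".."Z"}),
        {"Question No.", "Field"}
    ),
    // Turn the field names back into columns: one row per User and question
    Pivoted = Table.Pivot(Split, List.Distinct(Split[Field]), "Field", "Value")
in
    Pivoted
```

The result should have one row per user and question, with Question, Category and Answer columns, which an Excel PivotTable can then count by Category and Question.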

JGordon11 · Mar 3, 2023

But doesn't the grouping still come out correct?

{
{Q10Question, Q10Category, Q10Answer},
{Q1Question, Q1Category, Q1Answer}
...}

If so, you can sort the table at the end by the Question No. column to correct the order.

kcmuppet · Mar 3, 2023

JGordon11 said:
But doesn't the grouping still come out correct?

{
{Q10Question, Q10Category, Q10Answer},
{Q1Question, Q1Category, Q1Answer}
...}

If so, you can sort the table at the end by the Question No. column to correct the order.

No, unfortunately it doesn't. It goes:

Q1Question,Q1Category,Q3Question
Q3Question,Q3Category,Q4Question
Q4Question,Q4Category,Q5Question
Q5Question,Q5Category,Q6Question
Q6Question,Q6Category,Q7Question
Q7Question,Q7Category,Q8Question
Q8Question,Q8Category,Q9Question
Q9Question,Q9Category,Q10Question
Q10Question,Q10Category,Q11Question
Q11Question,Q11Category,Q1Answer
Q3Answer,Q4Answer,Q5Answer
Q4Answer,Q5Answer,Q6Answer
Q5Answer,Q6Answer,Q7Answer
Q6Answer,Q7Answer,Q8Answer
Q7Answer,Q8Answer,Q9Answer
Q8Answer,Q9Answer,Q10Answer
Q9Answer,Q10Answer,Q11Answer

etc. to Q58 because the actual data layout wasn't exactly as in the original post. It's more like this, except with 58 Question types and 2 answer types:

Power Query:

    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdqZnpGCZpgcEZ+qkJwZhW64oCi1LTUoqLUFIW0/PwUe7hIal5yKoq0c35OfmkRhgIPoF1Ayie/HEj6pqZkluYiRFEkUcVQlcbqQH1kNGh8hO5OiCeweQXN7yAu3D/GA+8f9EjB5yM0Oex+Mhl8fsKbCtE8hSoK95XpoPUVXBinx3BIwP1mNvB+w5KgMASxyqKoiY0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q2Question = _t, Q2Category = _t, Q3Question = _t, Q3Category = _t, Q4Question = _t, Q4Category = _t, Q58Question = _t, Q58Category = _t, Q1ORAnswer = _t, Q2ORAnswer = _t, Q3ORAnswer = _t, Q4ORAnswer = _t, Q58ORAnswer = _t, Q1CRAnswer = _t, Q2CRAnswer = _t, Q3CRAnswer = _t, Q4CRAnswer = _t, Q58CRAnswer = _t]),

JGordon11 · Mar 3, 2023

Did you change the tcnQ step to sort as suggested in post #9?

Power Query:

tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),

If you did, there is no way the order is coming out like this:

Q1Question, Q1Category, Q2Question
Q2Category, Q3Question, Q3Category
Q4Question, Q4Category,Q58Question
Q58Category,Q1Answer,Q2Answer
Q3Answer,Q4Answer,Q58Answer
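
To see why the sort matters, here is a small sketch with made-up column names in the three-field layout of the first post. Text is compared character by character, so the Q10 names land before the Q1 names, but the three columns of each question still end up next to each other, which is all the grouping step needs.

```powerquery
let
    // Hypothetical column names, listed in a scattered order
    tcn = {"User", "Q1Question", "Q1Category", "Q2Question", "Q2Category",
           "Q10Question", "Q10Category", "Q1Answer", "Q2Answer", "Q10Answer"},
    // Keep the names starting with "Q", then sort them
    tcnQ = List.Sort(List.Select(tcn, each Text.Start(_, 1) = "Q"))
    // tcnQ = {"Q10Answer", "Q10Category", "Q10Question",
    //         "Q1Answer",  "Q1Category",  "Q1Question",
    //         "Q2Answer",  "Q2Category",  "Q2Question"}
in
    tcnQ
```

Without List.Sort the names stay in table order, so consecutive chunks mix columns from different questions.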

kcmuppet · Mar 3, 2023

JGordon11 said:
Did you change the tcnQ step to sort as suggested in post #9?

Power Query:

tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),

You're right, I made a mistake with that line! I had simply put List.Select(tcn, each Text.Start(_,1)="Q"), without wrapping it in List.Sort.

Thank you for your patience.

The working version is like this:

Power Query:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi1OLVIwVNJRckxPtQdSARmVxZnJiTlApkdqZnpGCZpgcEZ+qkJwZhW64oCi1LTUoqLUFIW0/PwUe7hIal5yKoq0c35OfmkRhgIPoF1Ayie/HEj6pqZkluYiRFEkUcVQlcbqQH1kNGh8hO5OiCeweQXN7yAu3D/GA+8f9EjB5yM0Oex+Mhl8fsKbCtE8hSoK95XpoPUVXBinx3BIwP1mNvB+w5KgMASxyqKoiY0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [User = _t, Q1Question = _t, Q1Category = _t, Q2Question = _t, Q2Category = _t, Q3Question = _t, Q3Category = _t, Q4Question = _t, Q4Category = _t, Q58Question = _t, Q58Category = _t, Q1ORAnswer = _t, Q2ORAnswer = _t, Q3ORAnswer = _t, Q4ORAnswer = _t, Q58ORAnswer = _t, Q1CRAnswer = _t, Q2CRAnswer = _t, Q3CRAnswer = _t, Q4CRAnswer = _t, Q58CRAnswer = _t]),
    #"Renamed Columns" = Table.RenameColumns(Source,{{"Q4Question", "Q10Question"}, {"Q4Category", "Q10Category"}, {"Q4ORAnswer", "Q10ORAnswer"}, {"Q4CRAnswer", "Q10CRAnswer"}}),
    tcn = Table.ColumnNames(#"Renamed Columns"),
    tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),
    tcngroup = List.Accumulate({0..List.Count(tcnQ)/4 -1 }, {}, (s,c)=> s & {List.Range(tcnQ, c*4, 4)}),
    tbl = List.Accumulate(tcngroup, #"Renamed Columns", (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
    tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
    tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"CRAnswer", "Category", "ORAnswer","Question"}),
    digits = List.Transform({0..9}, Text.From),
    Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))}),
    #"Reordered Columns" = Table.ReorderColumns(Result,{"User", "Question No.", "Category", "Question", "CRAnswer", "ORAnswer"})
in
    #"Reordered Columns"

kcmuppet · Mar 10, 2023

@JGordon11 is there a way to use List.Buffer to speed it up? (with my actual data it's taking 90 minutes to run the query on a table with 2,600 rows). I've been reading around on List.Buffer but can't work out where to put it in your M code.

JGordon11 · Mar 10, 2023

I extended your sample data to 5000 rows and this code runs in a few seconds. Not sure how slow it will be on your real data that has a lot more columns. I buffered the source table (i.e. buffer whatever table you reference in the tbl step) and the tcngroup list. I changed the tcngroup step to a more efficient function (List.Split).

Power Query:

let
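    // Buffer the source so the repeated column merges below work on an in-memory copy
    Source = Table.Buffer(Excel.CurrentWorkbook(){[Name = "Table1"]}[Content]),
    tcn = Table.ColumnNames(Source),
    // Sort the Q* column names so the four columns of each question sit together
    tcnQ = List.Sort(List.Select(tcn, each Text.Start(_,1)="Q")),
    // Chunk the sorted names into groups of four, one group per question
    tcngroup = List.Buffer(List.Split(tcnQ,4)),
    // Merge each group into a single ";"-delimited column named after the group's first column
    tbl = List.Accumulate(tcngroup, Source, (s,c)=> Table.CombineColumns(s, c, Combiner.CombineTextByDelimiter(";", QuoteStyle.None), c{0})),
    // One row per User and question
    tbl1 = Table.UnpivotOtherColumns(tbl, {"User"}, "Question No.", "Value"),
    // Split the merged value back into its four parts, named in the sorted order of each group
    tbl2 = Table.SplitColumn(tbl1, "Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"CRAnswer", "Category", "ORAnswer","Question"}),
    digits = List.Transform({0..9}, Text.From),
    // Keep only the digits of the merged column name as the question number
    Result = Table.TransformColumns(tbl2, {"Question No.", each Number.From(Text.Select(_, digits))}),
    #"Reordered Columns" = Table.ReorderColumns(Result,{"User", "Question No.", "Category", "Question", "CRAnswer", "ORAnswer"})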
in
    #"Reordered Columns"
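
The change to the tcngroup step can be checked on a small list. This sketch uses eight made-up column names that are already sorted, builds the groups both ways and compares them. It only shows that the two steps produce the same groups; it does not measure how much faster List.Split is.

```powerquery
let
    // Hypothetical, already-sorted column names for two questions
    tcnQ = {"Q1CRAnswer", "Q1Category", "Q1ORAnswer", "Q1Question",
            "Q2CRAnswer", "Q2Category", "Q2ORAnswer", "Q2Question"},
    // Earlier version: append one List.Range slice of four names per pass
    ViaAccumulate = List.Accumulate(
        {0 .. List.Count(tcnQ) / 4 - 1},
        {},
        (s, c) => s & {List.Range(tcnQ, c * 4, 4)}
    ),
    // Later version: List.Split cuts the list into chunks of four directly
    ViaSplit = List.Split(tcnQ, 4),
    // Lists compare item by item, so this is true when both give the same groups
    Same = ViaAccumulate = ViaSplit
in
    Same
```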

jaeiow · Mar 13, 2023

I don't think you would buffer these sources as they are Excel - they don't query fold.

kcmuppet · Mar 13, 2023

JGordon11 said:
I ... I buffered the source table ... a more efficient function ....

Thanks, I'll try it when I get back next week.

kcmuppet · Mar 13, 2023

jaeiow said:
I don't think you would buffer these sources as they are Excel - they don't query fold.

Not sure what that means. The original source is a CSV file, if that makes a difference?
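
For context, query folding is Power Query translating steps into a query that the source itself runs, as a database can. File sources such as CSV do not fold, so every step is evaluated by Power Query itself. A minimal sketch, assuming a CSV source, of where Table.Buffer would sit: directly around the parsed table that the later steps reference. The inline CsvText is a hypothetical stand-in for the real file; with a file, File.Contents and the file's path would be passed to Csv.Document instead.

```powerquery
let
    // Hypothetical inline CSV text standing in for the real file
    CsvText = "User,Q1Question,Q1Category,Q1Answer#(cr)#(lf)User 1,Age?,Physical,High#(cr)#(lf)User 2,Age?,Physical,Low",
    // Parse the text and use the first row as column names
    Parsed = Csv.Document(CsvText),
    Promoted = Table.PromoteHeaders(Parsed),
    // Hold the parsed table in memory so later steps read from the buffered copy
    Source = Table.Buffer(Promoted)
in
    Source
```

Whether buffering helps depends on the data and the steps that follow, so it is worth timing the query both with and without it.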

kcmuppet · Mar 20, 2023

JGordon11 said:
I extended your sample data to 5000 rows and this code runs in a few seconds. Not sure how slow it will be on your real data that has a lot more columns. I buffered the source table (i.e. buffer whatever table you reference in the tbl step) and the tcngroup list. I changed the tcngroup step to a more efficient function (List.Split).

Thanks - this approach reduced the refresh time with my data set from 90 minutes to 30 minutes (2,600 rows, 400 columns).
