Adding New Fields After Power Query - General Theory

dcbuzzell

New Member
Joined
Jan 22, 2013
Messages
36
How do you use Power Query to aggregate and clean your data, then add new columns to the table that are updated outside PQ, without having those columns/fields and their data wiped out on a refresh.

In the mockup table below, the first four columns are the data generated/cleaned from PQ, and the status and completion date columns are unknown when the record is added to the incoming data, so added outside PQ as the information becomes known. The idea is that the list of items is compiled and then the list distributed to various users to update the status (sadly, manually) and given back for compiliation. As that is happening, more records are expected thru PQ, requiring a refresh of the queries involved.

JobTypeDescriptionVendorStatusCompletionDate
1stn1003RoughballoonH.I.Pending
1brk4007FinalroundEdwinaStarted
1shd1001RoughcircularGaleComplete15-Jan
1amt1748FinalfunnyEvelleUnknown


My actual solution has many more columns and extracts data from multiple sources and formats to get it together. There is a substantial amount of cleaning required as our antiquated system duplicates data entry on fields, often with mismatched data, etc., so its a chore getting it to a singular output that can be distributed for updating. But that's done - what I can't get my head around is how I can distribute it and get it back to incorporate the status information.

I'm sorry if this has been answered elsewhere, as I feel its a general/basic use issue, but I'll be darned if I can find the answer clearly explained somewhere. I'm basically looking for general theory on how this can be handled, not any specifics, and as I take that to develop a solution, I expect to have more specific questions related to PQ. Thanks for any assistance that you can provide.
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.
The trick is to load the result table again, in a second query. This gives you the code keep the earlier result. Then you move this code to the first query and wrap it inside a table.buffer.
Then somewhere down your clean-up code, you merge the result with this buffered query. And you extract the added columns.
It it's simplest form, imagine you have new rows in the source data.
PQ_Mr_Excel_KeepDataFromEarlierOutput.xlsx
ABCDEFGH
1
2JobTypeDescriptionVendorStatus
31shd1001RoughcircularGaleComplete
41stn1003RoughballoonH.I.Pending
51brk4007FinalroundEdwinaStarted
61amt1748FinalfunnyEvelleUnknown
7
8JobTypeDescriptionVendorStatusCompletionDate
91stn1003RoughballoonH.I.Pending
101shd1001RoughcircularGaleComplete15/01/2022
11
12
Sheet1


After the refresh of the query the result is:
PQ_Mr_Excel_KeepDataFromEarlierOutput.xlsx
ABCDEFGH
8JobTypeDescriptionVendorStatusCompletionDate
91shd1001RoughcircularGaleComplete15/01/2022
101stn1003RoughballoonH.I.Pending
111brk4007FinalroundEdwinaStarted
121amt1748FinalfunnyEvelleUnknown
13
Sheet1


Power Query:
let
    Earlier = Table.Buffer(Excel.CurrentWorkbook(){[Name="DataOut"]}[Content]),
    Source = Excel.CurrentWorkbook(){[Name="DataIn"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Job", type text}, {"Type", type text}, {"Description", type text}, {"Vendor", type text}, {"Status", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Job"}, Earlier, {"Job"}, "DataOut", JoinKind.LeftOuter),
    #"Expanded DataOut" = Table.ExpandTableColumn(#"Merged Queries", "DataOut", {"CompletionDate"}, {"CompletionDate"})
in
    #"Expanded DataOut"

PS: only considered the date column as new. Principle is the same.
 
Upvote 1
The trick is to load the result table again, in a second query. This gives you the code keep the earlier result. Then you move this code to the first query and wrap it inside a table.buffer.
Then somewhere down your clean-up code, you merge the result with this buffered query. And you extract the added columns.
It it's simplest form, imagine you have new rows in the source data.
PQ_Mr_Excel_KeepDataFromEarlierOutput.xlsx
ABCDEFGH
1
2JobTypeDescriptionVendorStatus
31shd1001RoughcircularGaleComplete
41stn1003RoughballoonH.I.Pending
51brk4007FinalroundEdwinaStarted
61amt1748FinalfunnyEvelleUnknown
7
8JobTypeDescriptionVendorStatusCompletionDate
91stn1003RoughballoonH.I.Pending
101shd1001RoughcircularGaleComplete15/01/2022
11
12
Sheet1


After the refresh of the query the result is:
PQ_Mr_Excel_KeepDataFromEarlierOutput.xlsx
ABCDEFGH
8JobTypeDescriptionVendorStatusCompletionDate
91shd1001RoughcircularGaleComplete15/01/2022
101stn1003RoughballoonH.I.Pending
111brk4007FinalroundEdwinaStarted
121amt1748FinalfunnyEvelleUnknown
13
Sheet1


Power Query:
let
    Earlier = Table.Buffer(Excel.CurrentWorkbook(){[Name="DataOut"]}[Content]),
    Source = Excel.CurrentWorkbook(){[Name="DataIn"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Job", type text}, {"Type", type text}, {"Description", type text}, {"Vendor", type text}, {"Status", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Job"}, Earlier, {"Job"}, "DataOut", JoinKind.LeftOuter),
    #"Expanded DataOut" = Table.ExpandTableColumn(#"Merged Queries", "DataOut", {"CompletionDate"}, {"CompletionDate"})
in
    #"Expanded DataOut"

PS: only considered the date column as new. Principle is the same.
Thank you, G. I especially appreciate the code on my basic example, which really helps explain it. I'll give this a try in the next week and see how it goes!
 
Upvote 0

Forum statistics

Threads
1,225,502
Messages
6,185,349
Members
453,287
Latest member
Emeister

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top