apply column headers to another table in Power Query

LittleJenny

New Member
Joined
Jan 19, 2023
Messages
1
Office Version
  1. 2021
Platform
  1. Windows
I am trying to import data from a PDF file which has 7 pages of transcation data. so in power query all the data were splitted to 7 tables. But the column headers were only in the first table as what was in the PDF. Now I want to combine all the tables. I did research and understand I need to make all the other tables have same column headers as table 1 so that Power Query can identify and append accordingly. so my question is - instead of manually type in header names one by one, how can i quickly copy and paste table 1 column headers to all other tables?
1674185041762.png

1674185078313.png
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
Instead of selecting one table at a time, select the entire file:
1674226951560.png

All of the pages and tables will be pulled in and look much like it does when you use Get Data From Folder. From there you can easily filter out what you don't need like pages.

I use PDFs a lot. Be aware that they have a tendency to "morph" over time depending on where they come from. Mine are printed using a Chrome browser, and have gone through several iterations of being tweaked due to the PDFs changing, but they've always been able to be fixed and now almost never need adjusting. I've had to use a calculated column to tell me how many rows each table had in the file and then filter out any small tables, and then Transform to flip the table and then a custom formula to merge none, 1, 2, 3, and even 4 columns to get the right headers and then Transform back, and add THEN promote headers.

If the PDF tables are as carefully laid out as they appear, you shouldn't have any problems.

BTW, assuming you're not doing a simple Append where the column headers wouldn't be an issue, you can set up a manual Query to pull the Column Names into a list for the one table that has column headers, and then use that to make the column headers for the other columns. There are those much better with PQ than I am that could help with that if needed!
 
Upvote 0
Solution
This can work
Power Query:
 = Table.RenameColumns(#"Previous Step", List.Zip({Table.ColumnNames(#"Previous Step"), Table.ColumnNames(#"Table003 (Page 1)")}))

But tips of @jdellasala to read full document of to append the tables are the way to go.
 
Upvote 0
Instead of selecting one table at a time, select the entire file:
View attachment 83302
All of the pages and tables will be pulled in and look much like it does when you use Get Data From Folder. From there you can easily filter out what you don't need like pages.

I use PDFs a lot. Be aware that they have a tendency to "morph" over time depending on where they come from. Mine are printed using a Chrome browser, and have gone through several iterations of being tweaked due to the PDFs changing, but they've always been able to be fixed and now almost never need adjusting. I've had to use a calculated column to tell me how many rows each table had in the file and then filter out any small tables, and then Transform to flip the table and then a custom formula to merge none, 1, 2, 3, and even 4 columns to get the right headers and then Transform back, and add THEN promote headers.

If the PDF tables are as carefully laid out as they appear, you shouldn't have any problems.

BTW, assuming you're not doing a simple Append where the column headers wouldn't be an issue, you can set up a manual Query to pull the Column Names into a list for the one table that has column headers, and then use that to make the column headers for the other columns. There are those much better with PQ than I am that could help with that if needed!
@jdellasala .. I read your reply to this post mentioning your experiences in converting PDF. I'm facing the same challenges you had before. I wonder how you were able to get the "PDF files fixed and never need adjusting." I wish I can find a way to instruct the person who prepare the PDF file to have a cleaner PDF file so that I don't need to find different functions to transform dynamically.
Thanks.
 
Upvote 0
@jdellasala .. I read your reply to this post mentioning your experiences in converting PDF. I'm facing the same challenges you had before. I wonder how you were able to get the "PDF files fixed and never need adjusting." I wish I can find a way to instruct the person who prepare the PDF file to have a cleaner PDF file so that I don't need to find different functions to transform dynamically.
Thanks.
So one PDF I got by printing from a Web page would come in with the headers nicely set up in the top row, sometimes split by a Line Feed "#(lf)", and sometimes in a different "row" when selecting the Folder (entire file) instead of a specific Table/Page which was inconsistent. To overcome these problems, you need to look for something consistant.

To determine which "row" of the folder was the one needed, after filtering the Kind column to have only Tables, I added a column that gave the Row Count of the tables
Power Query:
= Table.AddColumn(FilteredTablesOnly, "Rows", each Table.RowCount([Data]), type number)
The table I wanted always had more than 15 rows, so getting the table I needed meant filtering out rows where the # of rows in the table was less than 15
Power Query:
= Table.SelectRows(AddedRowsColumn, each [Rows] > 15)

Once I had the right table, I needed to get the header row set up properly. In its raw form the value in the first row of the first column was ALMOST always "AAP" although sometimes it was "AES", so after removing all Line Feeds from the document replacing them with a space, I transposed the entire table. That meant that anything before the COLUMN that held "AAP" in the first row was a header, and those columns needed to be merged. The columns all still had generic Column# headers, so writing a long conditional step to determine which columns IF ANY needed to be merged. This is that step:
Power Query:
= if TransposedTable0{0}[Column5] = "AAP" or TransposedTable0{0}[Column4] = "AES" then 
        Table.CombineColumns(TransposedTable0,{"Column1", "Column2", "Column3", "Column4"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
    else if TransposedTable0{0}[Column4] = "AAP" or TransposedTable0{0}[Column4] = "AES" then 
        Table.CombineColumns(TransposedTable0,{"Column1", "Column2", "Column3"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
    else if TransposedTable0{0}[Column3] = "AAP" or TransposedTable0{0}[Column4] = "AES" then 
        Table.CombineColumns(TransposedTable0,{"Column1", "Column2"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged") 
    else Table.RenameColumns(TransposedTable0,{{"Column1", "Merged"}})
After that I was able to Transpose the table back to its original layout and finish any other transformations needed. This isn't necessarily a "set it and forget it" solution, but typically when the code breaks, it's VERY OBVIOUS! It took several days to make sure that the code was working reliably, but even then an update to the Browser's Print routine (I'm using a Chrome browser and even though I have a full Acrobat DC license, the Acrobat Printer is TERRIBLE to work with for PDFs and I find the native Print to PDF to be a LOT more reliable) could break the query. However I've been using the query for well over a year, and don't think I've had to change it in the last six months.

Your mileage will vary, but you can always get help here!
 
Upvote 0

Forum statistics

Threads
1,225,761
Messages
6,186,893
Members
453,383
Latest member
SSXP

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top