Combine multiple PDFs with multiple pages into excel.

bunny1122

New Member
Joined
Jul 8, 2022
Messages
21
Office Version
  1. 365
Platform
  1. Windows
Hi guys, hope someone can help me out.

I have hundreds of PDFs in a folder, each have multiple pages containing tables with similar structure. It's a daily report for each PDF and I need to combine all the PDFs into one excel.

There is no option to select multiple pages in the "import from folder" data query as far as I am aware. I am not familiar with power query so a step-by-step guide is greatly appreciated.

Thank you!
 

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
Once you've selected the folder, select the Transform Data button.
Power Query:
let
    Source = Folder.Files("C:\Users\USER\FOLDER\Excel\Stuff\ReceiptPDFs"),
    LowercasedText = Table.TransformColumns(Source,{{"Extension", Text.Lower, type text}}),
    FilteredRows = Table.SelectRows(LowercasedText, each [Extension] = ".pdf"),
    ReplacedValue = Table.ReplaceValue(FilteredRows,".pdf","",Replacer.ReplaceText,{"Name"}),
    RemovedOtherColumns = Table.SelectColumns(ReplacedValue,{"Name", "Content"})
in
    RemovedOtherColumns
Now Expand the Content column, but select the Folder, not an individual file:
1671540569615.png

Power Query will create three Helper Queries, and one Sample Query. Before proceeding, if the PDFs have Tables, you will have to decide whether you want to retain Pages or Tables. This is easily done in the Kind column.
1671540777820.png

If you need data not contained within a Table, select Page, but Table is preferable. Once you've done that, expand the Content Column. PQ will generate three Helper queries and a Transform Sample File query.
Power Query:
let
    Source = Folder.Files("C:\Users\USER\FOLDER\Excel\Stuff\ReceiptPDFs"),
    // To make sure all Extensions are Lower Case
    LowercasedText = Table.TransformColumns(Source,{{"Extension", Text.Lower, type text}}),
    // Select oonly file types needed.
    FilteredRows = Table.SelectRows(LowercasedText, each [Extension] = ".pdf"),
    // Removed Filename Extension from Name column.
    ReplacedValue = Table.ReplaceValue(FilteredRows,".pdf","",Replacer.ReplaceText,{"Name"}),
    RemovedOtherColumns = Table.SelectColumns(ReplacedValue,{"Name", "Content"}),
    // To prevent Column Name conflict.
    RenamedColumns = Table.RenameColumns(RemovedOtherColumns,{{"Name", "FileName"}}),
    #"Filtered Hidden Files1" = Table.SelectRows(RenamedColumns, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", { "FileName", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Table Column1", each ([Kind] = "Table"))
in
    #"Filtered Rows"
I removed the space from previous steps, but left them in on the steps automatically generated.
NOTE that I had to manually add the "FileName" column to the #"Removed Other Columns1" step. At this point you can remove Other Columns - KEEP FileName and Data columns, and expand the Data column.
You don't need to keep the Name column, but typically it's needed to determine where the data came from.
Good luck!
 
Upvote 0
Solution

Forum statistics

Threads
1,223,397
Messages
6,171,878
Members
452,427
Latest member
samk379

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top