Power Query super slow while getting tables from pdf and merging

BalloutMoe

Board Regular
Joined
Jun 4, 2021
Messages
137
Office Version
  1. 365
Platform
  1. Windows
Below I have a code that gets pdf tables and filters and merges however it is taking a lot of time to get this done, anyway to help speed this up?

Power Query:
let
    Source = Folder.Files(FolderPath),
    #"FilteredCSV-ReturnFldr" = Table.AddColumn( Table.SelectRows( Table.SelectRows(Source, each ([Extension] = ".pdf")), each not Text.Contains([Folder Path], "Return")), "Custom", each Pdf.Tables([Content])),
    #"Expanded Custom" = Table.SelectRows( Table.ExpandTableColumn(#"FilteredCSV-ReturnFldr", "Custom", {"Kind", "Data"}, {"Custom.Kind", "Custom.Data"}),each ([Custom.Kind] = "Table")),
    #"Added Custom1" = Table.SelectRows( Table.AddColumn(#"Expanded Custom", "Custom", each Table.Contains( [Custom.Data], [Column2 = "QTY"])), each ([Custom] = true)),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.1", each Table.RemoveFirstN(  [Custom.Data], 1)),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.2", each Table.PromoteHeaders([Custom.1])),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Custom.2", "Name"}),
    #"Expanded Custom.2" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom.2", ExpandColumns, ExpandColumns),
    #"Merged Columns" = Table.CombineColumns(#"Expanded Custom.2",ColumnNames,Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Description"),
    #"Filtered Rows1" = Table.SelectRows(#"Merged Columns", each ([QTY] <> null))
in
    #"Filtered Rows1"
 

Excel Facts

Get help while writing formula
Click the italics "fx" icon to the left of the formula bar to open the Functions Arguments dialog. Help is displayed for each argument.
Where are your files located? if they are on SharePoint or a server, move them to your local drive. This tends to be one of the biggest negative factors.
 
Upvote 0
its this line the merge
#"Merged Columns" = Table.CombineColumns(#"Expanded Custom.2",ColumnNames,Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Description"),
When I use ColumnNames variable it takes a while but when i do it manually its faster
 
Upvote 0
it will never be super fast with PDF as the file source, and your expectation of what "fast" is may be to high. If you are in Excel and refresh the query, how long does it take? How many PDF files?
 
Upvote 0
it will never be super fast with PDF as the file source, and your expectation of what "fast" is may be to high. If you are in Excel and refresh the query, how long does it take? How many PDF files?
About 80 Files each have a single page. It is taking above 30 minutes
 
Upvote 0
Well if it were me, I would do the following
  1. combine and load the data in the tables with as little manipulation as possible first. Certainly not using the Table.CombineColumns step
  2. potentially save these data in a flat Excel file as a first step. It still takes 30 mins but is done once.
  3. process the file from 2 above. This should then be fast.
  4. If you have new files coming in the future, archive all the 80 old files and use your existing query for just the new files
  5. append the 80 history files already preprocessed from 3 as "history" data
  6. over time as the new list of files grows and it gets slow, rebuild the history file.
 
Upvote 0

Forum statistics

Threads
1,223,948
Messages
6,175,567
Members
452,652
Latest member
eduedu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top