pdf-open 1 file, extract each table separately

bobaol

Board Regular
Joined
Jun 3, 2002
Messages
227
Office Version
  1. 365
  2. 2003 or older
Platform
  1. Windows
Hello, I have a few thousand pdf files. each file has 14 tables to 444 tables, it varies. Each table may have 1 column to 144 columns, it varies. I want to import each pdf file (using PQ, macro, or otherwise), then save it. The import I want is each file, each table be in its own sheet, then save the file, do this again for the next file. In the import process, it is okay to assume every file has 444 tables, and each table has 144 columns. and assume all data types are text. I will manipulate it further after all the data is imported. How do I import each file, break up each table into its own sheet, then save the file as .xlsx or .xlsm? any help is appreciated. Thanks in advance.

Here is a sample of one file. But how do I tell what is table001 or table002?
Table001
= Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}})

Table002
= Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}})

Table003
= Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}})

Table004
= Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}})
 

Excel Facts

Excel motto
Not everything I do at work revolves around Excel. Only the fun parts.
Is there any way at all that you can get the data another way, I highly highly recommend you do that. If you get told no, try harder to get a yes. Of course, it may not be possible but I can almost guarantee that doing what you describe with thousands of PDFs will be a nightmare. PDFs are notorious for changing structure over time.

If you want to/have to proceed with PDFs, I recommend doing a pilot loading maybe the first table only. You should be able to process a single file manually, turn it into a function, then run it over the list of files you have. I cover the principle in this article (it’s a bit different but the principle is the same Combining Multiple Sheets from Multiple Workbooks)

I am almost certain you will have problems somewhere in the list of files.
 
Upvote 0

Forum statistics

Threads
1,225,502
Messages
6,185,349
Members
453,287
Latest member
Emeister

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top