How to add a custom column that identifies the page number that data was extracted from within a PDF source, e.g. "Page 4 of 200", "Page 5 of 200", etc.

Clonk92

Office Version: 365
Hi there.


I am building a database from PDF sources. Each PDF contains around 250 pages, and I am amalgamating them into one big database, with each PDF page representing one row in the database.

The data comes from PDFs, each containing roughly 200-500 pages of identically formatted data, so my query extracts the data from a folder using this formula:

Power Query:
 = Pdf.Tables(File.Contents("Source"))

I am wondering whether there is a formula that can identify which page of the PDF the data is coming from, so that when I validate the completeness of the data during review I can see that a given row came from, say, page 18 or page 240 of the source PDF.


I was hoping the source column would include this, but it only includes the file name and not the page number from the PDF, which is useful but not as useful as I'd hoped.


Additionally, is it possible to add this custom column at the end of my query so I don't risk breaking my query when I insert the step? Thanks in advance.
 

The File.Contents function expects a PDF file path as its parameter, so I assume you wrote "Source" as a generic placeholder for that path.

Do you see the following result for the Pdf.Tables step in your query? I understand you are also using the Folder.Files function, but even when you pass it the binary content, Pdf.Tables should still return a table like the one below, with Name, Kind, and Data columns, and the Name column should give the page number.

1719618379843.png
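
As a rough sketch of how the Name column could be turned into a "Page 4 of 200" style label for a single PDF: the file path below is hypothetical, and the code assumes that the Pdf.Tables result contains rows whose Kind is "Page" with names such as "Page004". If your names look different, the digit extraction will need adjusting.

```powerquery
let
    // Hypothetical file path; replace with one of your own PDFs.
    Source = Pdf.Tables(File.Contents("C:\PDFs\Example.pdf")),
    // Assumption: page-level rows are marked with Kind = "Page".
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Total number of page rows found in this PDF.
    PageCount = Table.RowCount(Pages),
    // Assumption: page names look like "Page004"; keep the digits and convert them to a number.
    AddNumber = Table.AddColumn(
        Pages,
        "Page Number",
        each Number.FromText(Text.Select([Name], {"0".."9"})),
        Int64.Type
    ),
    // Build the "Page N of Total" label as text.
    AddLabel = Table.AddColumn(
        AddNumber,
        "Page Label",
        each "Page " & Text.From([Page Number]) & " of " & Text.From(PageCount),
        type text
    )
in
    AddLabel
```

The label is built before the Data column is expanded, because the Name column is the only place the page information appears.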


Another sample from a different PDF file on my computer:

1719618228299.png
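
Because the names can differ between PDFs, a small helper that pulls the page number out of the name may be safer than relying on one fixed pattern. This is only a sketch: PageFromName is a hypothetical helper name, and the two sample names ("Page018" and "Table001 (Page 18)") are assumptions about what the Name column can contain, not values taken from your file.

```powerquery
let
    // Hypothetical helper: returns the digits that follow the last occurrence of "Page" in the name.
    PageFromName = (itemName as text) as nullable number =>
        let
            // Text after the last "Page" in the name (empty text if "Page" does not occur).
            afterPage = Text.AfterDelimiter(itemName, "Page", {0, RelativePosition.FromEnd}),
            // Keep only the digit characters.
            digits = Text.Select(afterPage, {"0".."9"})
        in
            if digits = "" then null else Number.FromText(digits),
    // Assumed sample names, for illustration only.
    Samples = {"Page018", "Table001 (Page 18)"},
    Result = List.Transform(Samples, PageFromName)
in
    Result
```

Both sample names should resolve to 18. In a real query the helper would be called from Table.AddColumn against the Name column.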

In any case, I'll need to see more of your query to be able to provide more help.
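
In the meantime, here is a sketch of how the folder version might carry the page name through to the final rows. The folder path, the StampFile helper, and the "Source File" and "Source Page" column names are all hypothetical, and it again assumes that the page-level rows have Kind = "Page". The page name is written onto every row of each nested Data table before the tables are combined, since that information is no longer available once Data has been expanded and the Name column is gone. That is why the column cannot simply be added as the very last step of an existing query.

```powerquery
let
    // Hypothetical folder path; replace with the folder that holds your PDFs.
    Source = Folder.Files("C:\PDFs"),
    PdfFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),
    // Hypothetical helper: stamps the file name and page name onto every data row of one PDF.
    StampFile = (fileName as text, content as binary) as table =>
        let
            // Assumption: page-level rows are marked with Kind = "Page".
            Items = Table.SelectRows(Pdf.Tables(content), each [Kind] = "Page"),
            WithSource = Table.AddColumn(
                Items,
                "DataWithSource",
                each
                    let
                        pageName = [Name],
                        withFile = Table.AddColumn([Data], "Source File", (row) => fileName, type text),
                        withPage = Table.AddColumn(withFile, "Source Page", (row) => pageName, type text)
                    in
                        withPage
            )
        in
            Table.Combine(WithSource[DataWithSource]),
    // Apply the helper to every PDF in the folder and append the results.
    PerFile = List.Transform(Table.ToRecords(PdfFiles), each StampFile([Name], [Content])),
    Combined = Table.Combine(PerFile)
in
    Combined
```

Any transformation steps you already have can then follow Combined, and the two source columns will travel with each row.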
 