Hi there.
I am building a database from PDF sources. Each PDF contains roughly 200 to 500 pages of identically formatted data, and I am amalgamating them into one big database, with each PDF page representing one row.
My query extracts the data from a folder using the following formula.
Power Query:
= Pdf.Tables(File.Contents("Source"))
Is there a formula that can identify which page of the PDF each row of data comes from? That way, when I validate the completeness of the data during review, I can see that a given row came from, say, page 18 or page 240 of the source PDF.
I was hoping the source column would include this, but it only contains the file name and not the page number from the PDF, which is useful but not as useful as I'd hoped.
Additionally, is it possible to add this custom column at the end of my query, so I don't risk breaking the query when I insert the step? Thanks in advance.