A LITTLE BACKGROUND INFO
We are currently converting mortgage statements that come to us as scanned PDFs. These statements come from over 100 municipalities across Canada. There are over 70 variations as the statements are created using different pieces of software. As a result, a unique PDF2XL page layout has to be created for each one.
Essentially a page layout properly identifies all the tables and fields of a scanned PDF document. Each statement is comprised of core information that needs to be interpreted as columns of a single table.
Roll No. | Owner | Address | Mortgage No. | Installments | and so on...
If the source PDF is well structured, it's not too bad, even though there are so many variations. There are also fields that reside outside the main table information, like dates, titles, and other information that would stand on its own.
Once a PDF2XL is established for a certain variation, we can apply that same layout to any scanned PDFs that follow that exact format. PDF2XL then allows us to convert the document to an Excel file. Each table, and/or field grouping can be converted to individual worksheets within an Excel workbook.
---
THE CHALLENGE
The 70+ variations have to map to only three (3) standard Excel layouts. We are trying to automate the process as much as possible because we have 50,000 records to process (with 5-20 per 8.5x11 PDF). Is it possible to populate a target Excel document with the converted data from PDF2XL?
We have limited control over how the converted tables and/or fields are positioned on a worksheet. Basically they start from A1. The three (3) standard Excel layouts however, have their corresponding columns starting further down on the worksheet, leaving space above for standalone field information.
It would be great if there was a way append the PDF2XL converted worksheet(s)/workbook to the one of the three (3) standard Excel layouts. At which point, lookups, dynamic ranges, etc could be established to automagically have that converted data from the source worksheet(s) flow into the main worksheet containing the standard layout.
Mortgage No. column starts on A1 - converted PDF2XL Excel worksheet
Mortgage No. column starts on B6 - standard Excel layout
Can I have it so the Mortgage No. column and it's underlying data flow into the standard Excel layout at B6?
---
I'M SCARED!
I only mentioned the lookups and dynamic ranges because it seems that that is the direction to go. That said, I don't even know where to start. I don't really delve into Excel and my formula expertise is limited to the SUM function... hehe
If anyone can help us, that would be amazing. If I can be directed to some more specific resources, that would also be great. There is just so much information out there, I don't know what to filter out and where to focus in on. For all I know, I could be way off base with my thinking...
Thanks in advance!
--
desperate to automate such a mundane time-consuming task,
kcjones76
We are currently converting mortgage statements that come to us as scanned PDFs. These statements come from over 100 municipalities across Canada. There are over 70 variations as the statements are created using different pieces of software. As a result, a unique PDF2XL page layout has to be created for each one.
Essentially a page layout properly identifies all the tables and fields of a scanned PDF document. Each statement is comprised of core information that needs to be interpreted as columns of a single table.
Roll No. | Owner | Address | Mortgage No. | Installments | and so on...
If the source PDF is well structured, it's not too bad, even though there are so many variations. There are also fields that reside outside the main table information, like dates, titles, and other information that would stand on its own.
Once a PDF2XL is established for a certain variation, we can apply that same layout to any scanned PDFs that follow that exact format. PDF2XL then allows us to convert the document to an Excel file. Each table, and/or field grouping can be converted to individual worksheets within an Excel workbook.
---
THE CHALLENGE
The 70+ variations have to map to only three (3) standard Excel layouts. We are trying to automate the process as much as possible because we have 50,000 records to process (with 5-20 per 8.5x11 PDF). Is it possible to populate a target Excel document with the converted data from PDF2XL?
We have limited control over how the converted tables and/or fields are positioned on a worksheet. Basically they start from A1. The three (3) standard Excel layouts however, have their corresponding columns starting further down on the worksheet, leaving space above for standalone field information.
It would be great if there was a way append the PDF2XL converted worksheet(s)/workbook to the one of the three (3) standard Excel layouts. At which point, lookups, dynamic ranges, etc could be established to automagically have that converted data from the source worksheet(s) flow into the main worksheet containing the standard layout.
Mortgage No. column starts on A1 - converted PDF2XL Excel worksheet
Mortgage No. column starts on B6 - standard Excel layout
Can I have it so the Mortgage No. column and it's underlying data flow into the standard Excel layout at B6?
---
I'M SCARED!
I only mentioned the lookups and dynamic ranges because it seems that that is the direction to go. That said, I don't even know where to start. I don't really delve into Excel and my formula expertise is limited to the SUM function... hehe
If anyone can help us, that would be amazing. If I can be directed to some more specific resources, that would also be great. There is just so much information out there, I don't know what to filter out and where to focus in on. For all I know, I could be way off base with my thinking...
Thanks in advance!
--
desperate to automate such a mundane time-consuming task,
kcjones76