Get specific data from nonuniform PDFs and place in a single table in Excel

jkingston (New Member, joined Oct 25, 2024)
Office Version: 365
Platform: Windows
Hello,

I need to pull specific data from nonuniform PDFs into a single table. For example, say I have 100 clients, each with a different mix of tax forms: Schedules A, B, E, F, and H. Some clients have only Schedules A, E, and H; others have B, F, and H. The schedules vary in page count, so I can't simply point Power Query at, say, Table 20 (Page 10): the data I need from Schedule H sits on page 8 in one file and on page 12 in another. Both files contain Schedule H, though, and Schedule H is always the same identical 4 pages.

How can I use Power Query or another tool to pull a specific data item from Schedule H, regardless of where it sits in each client's PDF? I have thousands of PDFs, and the entry I want always sits within Schedule H but lands on different pages, as described above.

Thank you so much,

Jared
 

I am currently working on getting completely nonuniform PDFs into Excel tables through Power Query. At this VERY moment. HA

Because the PDFs are so inconsistent in which forms they include, in what order, and even in whether they use the right template, I've had to NOT use the Table objects and use the Page objects instead. Expand all of the data out to the final column. I add a few extra columns, because a special character can split a value into an additional column in my files.

Anyway. Merge all of the expanded columns; I use "-@-" (no quotes) as my delimiter. From there, you can begin creating helper columns, the first one being "Schedule": if Text.Contains([Merged Column], "Schedule") then [Merged Column] else null. Tailor it until it narrows down to which Schedule is named in the Merged Column, clearing out anything else that was merged in with it. Then Fill Down your "Schedule" column. After that, you can create a template for what you need from each individual Schedule, splitting the values back out by the "-@-".
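If it helps to see the logic outside of Power Query, here is a rough Python sketch of the same steps: merge each row with "-@-", tag the rows that start a Schedule, fill that tag down, then split the rows back out and pick the value you want. This is not Power Query code and not necessarily how the steps above were built. It assumes the PDF pages have already been extracted into rows of cells, and the sample rows, the "Line 1" label, and the function names are all made up for illustration.

```python
import re

DELIM = "-@-"  # delimiter suggested in the post

# Assumption: a schedule starts on a row whose merged text begins with
# "Schedule <letter>". Real forms will likely need a stricter pattern.
SCHEDULE_RE = re.compile(r"Schedule\s+([A-Z])\b")


def merge_row(cells):
    """Join one row's cells into a single string, skipping empty cells."""
    return DELIM.join(c.strip() for c in cells if c and c.strip())


def tag_schedule(merged):
    """Return e.g. 'Schedule H' if the merged row starts a schedule, else None."""
    m = SCHEDULE_RE.match(merged)
    return f"Schedule {m.group(1)}" if m else None


def fill_down(values):
    """Replace each None with the most recent non-None value above it."""
    last, out = None, []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def extract(rows, schedule, label):
    """Return the last cell of the row labelled `label` inside `schedule`."""
    merged = [merge_row(r) for r in rows]
    schedules = fill_down([tag_schedule(m) for m in merged])
    for text, sched in zip(merged, schedules):
        parts = text.split(DELIM)
        if sched == schedule and parts[0] == label and len(parts) > 1:
            return parts[-1]
    return None


# Hypothetical extracted rows: Schedule H sits at a different position per client.
client_1 = [
    ["Schedule A", None, "sample header"],
    ["Line 1", "Total", "100"],
    ["Schedule H", "sample header", None],
    ["Line 1", "Total", "250"],
]
client_2 = [
    ["Schedule B", "sample header"],
    ["Line 1", "Total", "40"],
    ["Line 2", "Other", "5"],
    ["Schedule F", None],
    ["Line 1", "Total", "900"],
    ["Schedule H", None],
    ["Line 1", "Total", "75"],
]

# One output row per client, regardless of where Schedule H appeared.
table = [
    (name, extract(rows, "Schedule H", "Line 1"))
    for name, rows in [("client_1", client_1), ("client_2", client_2)]
]
for row in table:
    print(row)
```

The Fill Down step is what makes the page number irrelevant: every row carries the name of the Schedule it belongs to, so you filter on that instead of on a table or page index.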

It's incredibly difficult for me to explain any further than that (at least in a first reply), but hopefully that helps get your brain moving down the path you need.
 