Trying to report differences between two .pdf files using Excel

justtryingtolearn

New Member
Joined
Mar 9, 2018
Messages
11
Office Version
  1. 365
Platform
  1. Windows
Hi all,

Introduction:
I am trying to find text differences b/w two PDFs via Excel. The PDFs I am comparing are older and newer versions of the Preferred Drug List (PDL) that state Medicaid programs publish 2-4 times a year. My team's current process is to compare the older and newer versioned PDLs side-by-side and line-by-line by eye. This leaves us with a lot of room for error and takes A LOT of time. I have been trying to automate this process but so far to no avail...

A bit of background on me:
I am a fairly intermediate Excel user who is familiar with some of the more "advanced" features of Excel ranging from a solid understanding of VLOOKUP formulae and pivot tables to just a tiny bit of experience recording macros and writing VBA. I am working hard to learn as much as I can about Excel. I've purchased and am currently enrolled in Udemy courses taught by Chris Dutton and Kyle Pew, am listening to John Michaloudis' My Excel Online podcast, and am figuring out which Excel books to purchase to further organize my Excel education.

Back to the problem at hand:
The best solution I could come up with was to convert the .pdf PDLs to .txt files using the pdfttotext.com free online converter tool so that I could then import them to Excel in a clean, formula-friendly format with the end goal of comparing the imported data using some kind of conditional formatting in concert with some kind of Match or Vlookup formula.

However, upon importing to Excel I realized that converting my .pdfs to .txt took away all of the .pdf formatting, which carries great meaning for the PDL (see for yourself at http://healthandwelfare.idaho.gov/Portals/0/Medical/PrescriptionDrugs/IDMPDL.pdf).

**If you're interested, please use the above link to follow along: let's say the previous version of this PDL lists EXELON under the Non-Preferred Agents section, while this version lists EXELON under the Preferred Agents section. I would need to track that! Furthermore, there may be different Prior Authorization/Class Criteria listed on the old PDL than what is listed on the link above. I would need to track that too. Finally, I would need to keep all data's relationships with their class. In this case, EXELON belongs to the Cholinesterase Inhibitors subclass, rolled up under the Alzheimer's Drugs main class. The bulleted criteria would need to follow these same rules.

I've played around with a few different settings in Excel's Text Import Wizard but have not been able to meaningfully categorize my data like it is in the original .pdf. No matter which tool I've used, I'm left with one long list of data in Column A that no longer has any meaning.

Since my company is heavily reliant on Excel, and pitching the purchase of a 100% reliable .pdf comparison software would likely wind up dead-on-arrival to my boss's ears, I am really hoping someone could help me out.

I've heard great things about this forum and will be checking back daily to reply with any additional resources anyone may need. Furthermore, I open myself up completely to harsh criticism--if you see that I am thinking about any of the aforementioned methods wrong, please tell me and provide your alternative! I'd love to learn from you. :)

Thanks!

**If the link provided does not show EXELON on the first page listed under Preferred Agents, then the state of Idaho has likely updated their PDL. Please substitute EXELON with some other product under the Preferred Agents section to follow along.
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
Hi,

sorry can't help with the Excel comparison.
However I do use the software from this company (versions 2 and 3) to compare text files, for which it is indeed, Beyond Compare.
It also does an excellent job with PDFs, Office documents and various other data formats.
The comparison is side-by-side and changes easily visible.

Give it a go - it's peanuts for any company to have a single license.

https://www.scootersoftware.com/index.php

And I doubt just anyone can access that site you have linked.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,234
Messages
6,170,891
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top