justtryingtolearn
New Member
- Joined
- Mar 9, 2018
- Messages
- 11
- Office Version
- 365
- Platform
- Windows
Hi all,
Introduction:
I am trying to find text differences b/w two PDFs via Excel. The PDFs I am comparing are older and newer versions of the Preferred Drug List (PDL) that state Medicaid programs publish 2-4 times a year. My team's current process is to compare the older and newer versioned PDLs side-by-side and line-by-line by eye. This leaves us with a lot of room for error and takes A LOT of time. I have been trying to automate this process but so far to no avail...
A bit of background on me:
I am a fairly intermediate Excel user who is familiar with some of the more "advanced" features of Excel ranging from a solid understanding of VLOOKUP formulae and pivot tables to just a tiny bit of experience recording macros and writing VBA. I am working hard to learn as much as I can about Excel. I've purchased and am currently enrolled in Udemy courses taught by Chris Dutton and Kyle Pew, am listening to John Michaloudis' My Excel Online podcast, and am figuring out which Excel books to purchase to further organize my Excel education.
Back to the problem at hand:
The best solution I could come up with was to convert the .pdf PDLs to .txt files using the pdfttotext.com free online converter tool so that I could then import them to Excel in a clean, formula-friendly format with the end goal of comparing the imported data using some kind of conditional formatting in concert with some kind of Match or Vlookup formula.
However, upon importing to Excel I realized that converting my .pdfs to .txt took away all of the .pdf formatting, which carries great meaning for the PDL (see for yourself at http://healthandwelfare.idaho.gov/Portals/0/Medical/PrescriptionDrugs/IDMPDL.pdf).
**If you're interested, please use the above link to follow along: let's say the previous version of this PDL lists EXELON under the Non-Preferred Agents section, while this version lists EXELON under the Preferred Agents section. I would need to track that! Furthermore, there may be different Prior Authorization/Class Criteria listed on the old PDL than what is listed on the link above. I would need to track that too. Finally, I would need to keep all data's relationships with their class. In this case, EXELON belongs to the Cholinesterase Inhibitors subclass, rolled up under the Alzheimer's Drugs main class. The bulleted criteria would need to follow these same rules.
I've played around with a few different settings in Excel's Text Import Wizard but have not been able to meaningfully categorize my data like it is in the original .pdf. No matter which tool I've used, I'm left with one long list of data in Column A that no longer has any meaning.
Since my company is heavily reliant on Excel, and pitching the purchase of a 100% reliable .pdf comparison software would likely wind up dead-on-arrival to my boss's ears, I am really hoping someone could help me out.
I've heard great things about this forum and will be checking back daily to reply with any additional resources anyone may need. Furthermore, I open myself up completely to harsh criticism--if you see that I am thinking about any of the aforementioned methods wrong, please tell me and provide your alternative! I'd love to learn from you.
Thanks!
**If the link provided does not show EXELON on the first page listed under Preferred Agents, then the state of Idaho has likely updated their PDL. Please substitute EXELON with some other product under the Preferred Agents section to follow along.
Introduction:
I am trying to find text differences b/w two PDFs via Excel. The PDFs I am comparing are older and newer versions of the Preferred Drug List (PDL) that state Medicaid programs publish 2-4 times a year. My team's current process is to compare the older and newer versioned PDLs side-by-side and line-by-line by eye. This leaves us with a lot of room for error and takes A LOT of time. I have been trying to automate this process but so far to no avail...
A bit of background on me:
I am a fairly intermediate Excel user who is familiar with some of the more "advanced" features of Excel ranging from a solid understanding of VLOOKUP formulae and pivot tables to just a tiny bit of experience recording macros and writing VBA. I am working hard to learn as much as I can about Excel. I've purchased and am currently enrolled in Udemy courses taught by Chris Dutton and Kyle Pew, am listening to John Michaloudis' My Excel Online podcast, and am figuring out which Excel books to purchase to further organize my Excel education.
Back to the problem at hand:
The best solution I could come up with was to convert the .pdf PDLs to .txt files using the pdfttotext.com free online converter tool so that I could then import them to Excel in a clean, formula-friendly format with the end goal of comparing the imported data using some kind of conditional formatting in concert with some kind of Match or Vlookup formula.
However, upon importing to Excel I realized that converting my .pdfs to .txt took away all of the .pdf formatting, which carries great meaning for the PDL (see for yourself at http://healthandwelfare.idaho.gov/Portals/0/Medical/PrescriptionDrugs/IDMPDL.pdf).
**If you're interested, please use the above link to follow along: let's say the previous version of this PDL lists EXELON under the Non-Preferred Agents section, while this version lists EXELON under the Preferred Agents section. I would need to track that! Furthermore, there may be different Prior Authorization/Class Criteria listed on the old PDL than what is listed on the link above. I would need to track that too. Finally, I would need to keep all data's relationships with their class. In this case, EXELON belongs to the Cholinesterase Inhibitors subclass, rolled up under the Alzheimer's Drugs main class. The bulleted criteria would need to follow these same rules.
I've played around with a few different settings in Excel's Text Import Wizard but have not been able to meaningfully categorize my data like it is in the original .pdf. No matter which tool I've used, I'm left with one long list of data in Column A that no longer has any meaning.
Since my company is heavily reliant on Excel, and pitching the purchase of a 100% reliable .pdf comparison software would likely wind up dead-on-arrival to my boss's ears, I am really hoping someone could help me out.
I've heard great things about this forum and will be checking back daily to reply with any additional resources anyone may need. Furthermore, I open myself up completely to harsh criticism--if you see that I am thinking about any of the aforementioned methods wrong, please tell me and provide your alternative! I'd love to learn from you.
Thanks!
**If the link provided does not show EXELON on the first page listed under Preferred Agents, then the state of Idaho has likely updated their PDL. Please substitute EXELON with some other product under the Preferred Agents section to follow along.