Hi guys!
I know there are at least 5-6 threads on how to extract certain parts of text from Word to Excel, but I'd kindly ask you to look over my request, as my case has a few particularities. I will try to be brief:
- I have over 5000 PDF and Word files that I need to extract data from. Since most of them are Word files, I extracted the first page of each PDF (done) and converted that page to .doc (done), so now all files are in the same format.
- What I need to do:
  - Loop through each Word file (1-page document)
  - Extract only specific parts of the data (the fields highlighted in yellow, see below)
  - Insert that data into an Excel sheet for further analysis
- The entire page looks like the one in the image (the black boxes are not relevant)
- The information that I need to extract is highlighted in yellow
- The problem is that in the last section, the number of Secondary diagnostics varies from 2-3 to over 20 entries, depending on the patient. I was considering using wildcards, but I am not sure that is a good approach. This patient-to-patient variability is the main problem; otherwise I could (possibly) have extracted the data with an automated tool.
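Since the Secondary diagnostics are the last section on the page, one option that avoids wildcards entirely is to find the section heading and treat every non-empty line after it as one diagnosis, however many there are. Below is a minimal Python sketch of that idea. It assumes the page text has already been pulled out of the Word file (for example by saving it as plain text) and that the heading sits on its own line as "Secondary diagnostics"; the numbering/bullet pattern it strips is a guess at how the entries look.

```python
import re

# Assumption: the heading is on its own line, optionally followed by a colon.
HEADER = re.compile(r"^\s*secondary diagnostics\s*:?\s*$", re.IGNORECASE | re.MULTILINE)
# Hypothetical list markers ("1.", "12)", "-", "•") followed by whitespace.
# Requiring whitespace keeps codes such as "401.9" intact.
MARKER = re.compile(r"^\s*(?:\d+[.)]|[-•])\s+")


def extract_secondary_diagnoses(page_text: str) -> list[str]:
    """Return every entry after the 'Secondary diagnostics' heading.

    Because it is the last section on the page, everything after the
    heading belongs to it, so no fixed number of positions is needed.
    """
    match = HEADER.search(page_text)
    if match is None:
        return []  # heading not found on this page
    entries = []
    for line in page_text[match.end():].splitlines():
        cleaned = MARKER.sub("", line).strip()
        if cleaned:  # skip blank lines between entries
            entries.append(cleaned)
    return entries
```

The same function works for 2 entries or 25, which is the property a fixed wildcard pattern struggles with. If another section ever follows the diagnostics, the loop would need a stop condition at that next heading.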
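On the Excel side, the varying number of diagnoses means choosing between one row per diagnosis or one row per file padded to the longest list. The sketch below takes the second ("wide") approach: it loops over a folder of plain-text exports, calls a parsing function you supply for each page, and writes a CSV that Excel opens directly. The `.txt` exports, the folder layout and the `secondary_N` column names are assumptions for illustration; note that python-docx can read `.docx` but not the legacy `.doc` format, so the files would need to be saved as `.docx` or `.txt` first.

```python
import csv
from pathlib import Path
from typing import Callable


def write_wide_csv(folder: str, out_csv: str, parse: Callable[[str], list[str]]) -> int:
    """Parse every .txt file in `folder` and write one CSV row per file.

    The number of diagnosis columns is set by the longest list; shorter
    rows are padded with empty cells. Returns that column count.
    """
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        records.append((path.name, parse(text)))

    width = max((len(diags) for _, diags in records), default=0)
    header = ["file"] + [f"secondary_{i}" for i in range(1, width + 1)]

    # newline="" is what the csv docs recommend (avoids an extra \r per row
    # on Windows); utf-8-sig adds a BOM so Excel detects UTF-8 text.
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for name, diags in records:
            writer.writerow([name] + diags + [""] * (width - len(diags)))
    return width
```

A long layout (one row per file and diagnosis) is often easier for pivot tables; it only changes the inner loop to write `[name, diag]` for each diagnosis.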
Many thanks for taking the time to read this! I really appreciate it!