Macro: Convert pdf into Excel and extract certain data to put together with file name

HeinrichPaul

New Member
Joined
Jul 8, 2020
Messages
14
Office Version
  1. 365
  2. 2016
  3. 2010
Platform
  1. Windows
Hello All,

I have about 500 pdf's with as to what I found so far up to 91 pages, maybe more. I have a macro for reading out pdf's a colleague created for me years ago to list the names of the pdf's in a folder into Excel and read out specified data beside the file name. Unfortunately for the new task and that he left the company I am trying to find a solution.
As the new pdf's aren't as 'clean' as the old ones, the tweaking that I tried doesn't work. Also the new files have several different listings as they are delivery notes as there can be one delivery date with one or more customers after and also one customer can have one or more dispatches the same day. I would only need the order number read out beside the file name, which means it would need to count how orders are in one pdf, list them and put the order numbers beside. Ideally the customer should be beside as well, not that necessary for the beginning.

So I have literally three macros, the old one which could open the pdf's get their name with the path and read out and found two ones to convert pdf into Excel. The one seems old as it converts via word, but I couldn't get it working so far even though I could change with my limited knowledge some things. The other converter doesn't work either so far, not under Office 2010 Excel nor under Office 365
VBA Code:
Sub Pdf_to_Excel()


Dim setting_sh As Worksheet
Set setting_sh = ThisWorkbook.Sheets("Setting")

Dim pdf_path As String
Dim excel_path As String

pdf_path = setting_sh.Range("E11").Value
excel_path = setting_sh.Range("E12").Value

Dim fso As New FileSystemObject
Dim fo As Folder
Dim f As File

Set fo = fso.GetFolder(pdf_path)

Dim wa As Object
Dim doc As Object
Dim wr As Object

Set wa = CreateObject("word.application")



'Dim wa As New Word.Application
wa.Visible = True
'Dim doc As Word.Document

Dim nwb As Workbook
Dim nsh As Worksheet
'Dim wr As Word.Range

For Each f In fo.Files
    Set doc = wa.documents.Open(f.Path, False, Format:="PDF Files")
    Set wr = doc.Paragraphs(1).Range
    wr.WholeStory
    
    
    
    Set nwb = Workbooks.Add
    Set nsh = nwb.Sheets(1)
    wr.Copy
    
    nsh.Paste
    nwb.SaveAs (excel_path & "\" & Replace(f.Name, ".pdf", ".xlsx"))
    
    doc.Close False
    nwb.Close False
Next

wa.Quit

MsgBox "Done"


End Sub

VBA Code:
Sub PDF_Excel_to_Adobe()



Dim myWorksheet As Worksheet

Dim adobeReaderPath As String

Dim pathAndFileName As String

Dim shellPathName As String



Set myWorksheet = ActiveWorkbook.Worksheets("Adobe Reader")



adobeReaderPath = "C:\Program Files\Adobe\Acrobat DC\Acrobat\Acrobat.exe"

pathAndFileName = "C:\Users\xxxxxxx.xxxxx\OneDrive - XX\Desktop\Job\pdf Read-Out"

shellPathName = adobeReaderPath & " """ & pathAndFileName & """"



Call Shell( _

pathname:=shellPathName, _

windowstyle:=vbNormalFocus)



Application.Wait Now + TimeValue("0:00:03")

SendKeys "%vpc"

SendKeys "^a"

SendKeys "^c"

Application.Wait Now + TimeValue("0:00:30")

With myWorksheet

.Range("B4").Select

.PasteSpecial Format:="Text"

End With

Call Shell("TaskKill /F /IM Acrobat.exe", vbHide)

End Sub

VBA Code:
Sub openingpdfs()
    
    If Cells(ROW_FIRST, 1) = "" Then End
        
    Range(Cells(ROW_FIRST, "c"), Cells(Rows.Count, "z")).Clear
    
    'Application.ScreenUpdating = False

    lastrow = ActiveSheet.Cells.Find(What:="*", After:=Range("A1"), SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row
    
    
    Application.DisplayAlerts = False
            
            
    Dim adobefile As String
    Dim zz, y1, y2 As Variant
            
            
    For i = ROW_FIRST To lastrow
        
        Application.StatusBar = "Reading pdf files. Please be patient! " & i - ROW_FIRST + 1 & "/" & lastrow - ROW_FIRST + 1
            
           adobefile = Cells(i, 1)
            
            x = ReadAcrobatDocument(adobefile)
            
            y = "NO MRN number was found!!!"
            On Error Resume Next
                y = Trim(Mid(x, InStr(x, "MRN"), 22))
            On Error GoTo 0
            Cells(i, "c") = y
            
            Cells(i, "e") = "NO tax number was found!!!"
            On Error Resume Next
                y1 = Split(Trim(Mid(x, InStr(x, "MRN"), 100)), Chr(10))
                Cells(i, "e") = Replace(y1(2), Chr(10), "")
            On Error GoTo 0
            
            Cells(i, "g") = "No customer was found!!!"
            On Error Resume Next
                y2 = Split(Trim(Mid(x, InStr(x, "Ladelisten"), 1000)), Chr(10))
                Cells(i, "g") = Replace(y2(2), Chr(10), "")
            On Error GoTo 0
           
            
            Z = "NO invoice number was found!!!"
            On Error Resume Next
                Z = Trim(Mid(x, InStr(x, "R:"), 50))
                If Z <> "NO invoice number was found!!!" Then
                    zz = False
                    zz = Split(Z, Chr(10))
                    
                    For j = 0 To UBound(zz)
                       Cells(i, j + 10) = Replace(zz(j), Chr(10), "")
                    
                    Next j
                                 
                End If
                
            On Error GoTo 0
           
            
    Next i
    
    Application.StatusBar = False
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
    
    a = MsgBox("Auf wiedersehen!")
    
End Sub
Private Function GetAllFiles(ByVal strPath As String, _
ByVal intRow As Integer, ByRef objFSO As Object) As Integer
Dim objFolder As Object
Dim objFile As Object
Dim i As Integer
i = intRow - ROW_FIRST + 1
Set objFolder = objFSO.GetFolder(strPath)
For Each objFile In objFolder.Files
        If Mid(objFile.Name, InStrRev(objFile.Name, ".") + 1, Len(objFile.Name)) = "pdf" Then
            Cells(i + ROW_FIRST - 1, 2) = objFile.Name
'            Cells(i + ROW_FIRST - 1, 2) = Mid(objFile.Name, InStrRev(objFile.Name, ".") + 1, Len(objFile.Name))
            Cells(i + ROW_FIRST - 1, 1) = objFile.Path
            i = i + 1
            Application.StatusBar = "Found file no. " & i
        End If
Next objFile
GetAllFiles = i + ROW_FIRST - 1
End Function

So literally to convert the pdf into Excel file, open the files, count the order (the longest mentioned has 208 orders, had it converted manually into xlsx and counted it.) Take over file name with either file name plus 207 empty spaces or 208 times file name plus the order number beside and copy them all together into one sheet. Not sure if that one below could be used:
VBA Code:
Private Function ReadAcrobatDocument(strFileName As String) As String
'Note: A Reference to the Adobe Library must be set in Tools|References!
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, i, j
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
' The following While-Wend loop shouldn't be necessary but timing issues may occur.
While AcroAVDoc Is Nothing
  Set AcroAVDoc = AcroApp.GetActiveDoc
Wend
Set AcroPDDoc = AcroAVDoc.GetPDDoc
For i = 0 To AcroPDDoc.GetNumPages - 1
  Set PageNumber = AcroPDDoc.AcquirePage(i)
  Set PageContent = CreateObject("AcroExch.HiliteList")
  If PageContent.Add(0, 9000) <> True Then Exit Function
  Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
  ' The next line is needed to avoid errors with protected PDFs that can't be read
  On Error Resume Next
  For j = 0 To AcroTextSelect.GetNumText - 1
    Content = Content & AcroTextSelect.GetText(j)
  Next j
Next i
ReadAcrobatDocument = Content
AcroAVDoc.Close True
AcroApp.Exit
Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
I would have one to get all folders as well, but am not sure how to get that all together and a pdf which I could empty the sensitive data from but could not find a button to upload it. Apologies if there are foreign signs in. My keyboard suddenly switched to German layout and I do not know how to switch it back.
Kind Regards
Matthias
 

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
I've cross checked and it would only be the order number to be read out of, but in mentioned case above 208 times in one document.
 
Upvote 0

Forum statistics

Threads
1,225,617
Messages
6,186,017
Members
453,334
Latest member
Prakash Jha

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top