HeinrichPaul
New Member
- Joined
- Jul 8, 2020
- Messages
- 14
- Office Version
- 365
- 2016
- 2010
- Platform
- Windows
Hello All,
I have about 500 pdf's with as to what I found so far up to 91 pages, maybe more. I have a macro for reading out pdf's a colleague created for me years ago to list the names of the pdf's in a folder into Excel and read out specified data beside the file name. Unfortunately for the new task and that he left the company I am trying to find a solution.
As the new pdf's aren't as 'clean' as the old ones, the tweaking that I tried doesn't work. Also the new files have several different listings as they are delivery notes as there can be one delivery date with one or more customers after and also one customer can have one or more dispatches the same day. I would only need the order number read out beside the file name, which means it would need to count how orders are in one pdf, list them and put the order numbers beside. Ideally the customer should be beside as well, not that necessary for the beginning.
So I have literally three macros, the old one which could open the pdf's get their name with the path and read out and found two ones to convert pdf into Excel. The one seems old as it converts via word, but I couldn't get it working so far even though I could change with my limited knowledge some things. The other converter doesn't work either so far, not under Office 2010 Excel nor under Office 365
So literally to convert the pdf into Excel file, open the files, count the order (the longest mentioned has 208 orders, had it converted manually into xlsx and counted it.) Take over file name with either file name plus 207 empty spaces or 208 times file name plus the order number beside and copy them all together into one sheet. Not sure if that one below could be used:
I would have one to get all folders as well, but am not sure how to get that all together and a pdf which I could empty the sensitive data from but could not find a button to upload it. Apologies if there are foreign signs in. My keyboard suddenly switched to German layout and I do not know how to switch it back.
Kind Regards
Matthias
I have about 500 pdf's with as to what I found so far up to 91 pages, maybe more. I have a macro for reading out pdf's a colleague created for me years ago to list the names of the pdf's in a folder into Excel and read out specified data beside the file name. Unfortunately for the new task and that he left the company I am trying to find a solution.
As the new pdf's aren't as 'clean' as the old ones, the tweaking that I tried doesn't work. Also the new files have several different listings as they are delivery notes as there can be one delivery date with one or more customers after and also one customer can have one or more dispatches the same day. I would only need the order number read out beside the file name, which means it would need to count how orders are in one pdf, list them and put the order numbers beside. Ideally the customer should be beside as well, not that necessary for the beginning.
So I have literally three macros, the old one which could open the pdf's get their name with the path and read out and found two ones to convert pdf into Excel. The one seems old as it converts via word, but I couldn't get it working so far even though I could change with my limited knowledge some things. The other converter doesn't work either so far, not under Office 2010 Excel nor under Office 365
VBA Code:
Sub Pdf_to_Excel()
Dim setting_sh As Worksheet
Set setting_sh = ThisWorkbook.Sheets("Setting")
Dim pdf_path As String
Dim excel_path As String
pdf_path = setting_sh.Range("E11").Value
excel_path = setting_sh.Range("E12").Value
Dim fso As New FileSystemObject
Dim fo As Folder
Dim f As File
Set fo = fso.GetFolder(pdf_path)
Dim wa As Object
Dim doc As Object
Dim wr As Object
Set wa = CreateObject("word.application")
'Dim wa As New Word.Application
wa.Visible = True
'Dim doc As Word.Document
Dim nwb As Workbook
Dim nsh As Worksheet
'Dim wr As Word.Range
For Each f In fo.Files
Set doc = wa.documents.Open(f.Path, False, Format:="PDF Files")
Set wr = doc.Paragraphs(1).Range
wr.WholeStory
Set nwb = Workbooks.Add
Set nsh = nwb.Sheets(1)
wr.Copy
nsh.Paste
nwb.SaveAs (excel_path & "\" & Replace(f.Name, ".pdf", ".xlsx"))
doc.Close False
nwb.Close False
Next
wa.Quit
MsgBox "Done"
End Sub
VBA Code:
Sub PDF_Excel_to_Adobe()
Dim myWorksheet As Worksheet
Dim adobeReaderPath As String
Dim pathAndFileName As String
Dim shellPathName As String
Set myWorksheet = ActiveWorkbook.Worksheets("Adobe Reader")
adobeReaderPath = "C:\Program Files\Adobe\Acrobat DC\Acrobat\Acrobat.exe"
pathAndFileName = "C:\Users\xxxxxxx.xxxxx\OneDrive - XX\Desktop\Job\pdf Read-Out"
shellPathName = adobeReaderPath & " """ & pathAndFileName & """"
Call Shell( _
pathname:=shellPathName, _
windowstyle:=vbNormalFocus)
Application.Wait Now + TimeValue("0:00:03")
SendKeys "%vpc"
SendKeys "^a"
SendKeys "^c"
Application.Wait Now + TimeValue("0:00:30")
With myWorksheet
.Range("B4").Select
.PasteSpecial Format:="Text"
End With
Call Shell("TaskKill /F /IM Acrobat.exe", vbHide)
End Sub
VBA Code:
Sub openingpdfs()
If Cells(ROW_FIRST, 1) = "" Then End
Range(Cells(ROW_FIRST, "c"), Cells(Rows.Count, "z")).Clear
'Application.ScreenUpdating = False
lastrow = ActiveSheet.Cells.Find(What:="*", After:=Range("A1"), SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row
Application.DisplayAlerts = False
Dim adobefile As String
Dim zz, y1, y2 As Variant
For i = ROW_FIRST To lastrow
Application.StatusBar = "Reading pdf files. Please be patient! " & i - ROW_FIRST + 1 & "/" & lastrow - ROW_FIRST + 1
adobefile = Cells(i, 1)
x = ReadAcrobatDocument(adobefile)
y = "NO MRN number was found!!!"
On Error Resume Next
y = Trim(Mid(x, InStr(x, "MRN"), 22))
On Error GoTo 0
Cells(i, "c") = y
Cells(i, "e") = "NO tax number was found!!!"
On Error Resume Next
y1 = Split(Trim(Mid(x, InStr(x, "MRN"), 100)), Chr(10))
Cells(i, "e") = Replace(y1(2), Chr(10), "")
On Error GoTo 0
Cells(i, "g") = "No customer was found!!!"
On Error Resume Next
y2 = Split(Trim(Mid(x, InStr(x, "Ladelisten"), 1000)), Chr(10))
Cells(i, "g") = Replace(y2(2), Chr(10), "")
On Error GoTo 0
Z = "NO invoice number was found!!!"
On Error Resume Next
Z = Trim(Mid(x, InStr(x, "R:"), 50))
If Z <> "NO invoice number was found!!!" Then
zz = False
zz = Split(Z, Chr(10))
For j = 0 To UBound(zz)
Cells(i, j + 10) = Replace(zz(j), Chr(10), "")
Next j
End If
On Error GoTo 0
Next i
Application.StatusBar = False
Application.ScreenUpdating = True
Application.DisplayAlerts = True
a = MsgBox("Auf wiedersehen!")
End Sub
Private Function GetAllFiles(ByVal strPath As String, _
ByVal intRow As Integer, ByRef objFSO As Object) As Integer
Dim objFolder As Object
Dim objFile As Object
Dim i As Integer
i = intRow - ROW_FIRST + 1
Set objFolder = objFSO.GetFolder(strPath)
For Each objFile In objFolder.Files
If Mid(objFile.Name, InStrRev(objFile.Name, ".") + 1, Len(objFile.Name)) = "pdf" Then
Cells(i + ROW_FIRST - 1, 2) = objFile.Name
' Cells(i + ROW_FIRST - 1, 2) = Mid(objFile.Name, InStrRev(objFile.Name, ".") + 1, Len(objFile.Name))
Cells(i + ROW_FIRST - 1, 1) = objFile.Path
i = i + 1
Application.StatusBar = "Found file no. " & i
End If
Next objFile
GetAllFiles = i + ROW_FIRST - 1
End Function
So literally to convert the pdf into Excel file, open the files, count the order (the longest mentioned has 208 orders, had it converted manually into xlsx and counted it.) Take over file name with either file name plus 207 empty spaces or 208 times file name plus the order number beside and copy them all together into one sheet. Not sure if that one below could be used:
VBA Code:
Private Function ReadAcrobatDocument(strFileName As String) As String
'Note: A Reference to the Adobe Library must be set in Tools|References!
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, i, j
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
' The following While-Wend loop shouldn't be necessary but timing issues may occur.
While AcroAVDoc Is Nothing
Set AcroAVDoc = AcroApp.GetActiveDoc
Wend
Set AcroPDDoc = AcroAVDoc.GetPDDoc
For i = 0 To AcroPDDoc.GetNumPages - 1
Set PageNumber = AcroPDDoc.AcquirePage(i)
Set PageContent = CreateObject("AcroExch.HiliteList")
If PageContent.Add(0, 9000) <> True Then Exit Function
Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
' The next line is needed to avoid errors with protected PDFs that can't be read
On Error Resume Next
For j = 0 To AcroTextSelect.GetNumText - 1
Content = Content & AcroTextSelect.GetText(j)
Next j
Next i
ReadAcrobatDocument = Content
AcroAVDoc.Close True
AcroApp.Exit
Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
Kind Regards
Matthias