VBA Page Count

krishhi

Active Member
Joined
Sep 8, 2008
Messages
328
Hello Everyone,

I have nearly 1500 Pdf files, Each have its own name. So, I have to Open the pdf and List the page numbers in Excel. So, I need a macro code to open the pdf and count the page numbers and post it into excel.

Can i Get Any Help from you Guys,
Krish.
 
Re: VBA - working with PDFs

Hi Seji, Is there a way to do this on a per file basis, ie Col A2:A100 has the file path names and return page number in B2:B100 ?
 
Upvote 0

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Re: VBA - working with PDFs

Maybe you cannot get page number by the way as fomura =GetPageNum(A2) ... =GetPageNum(A100)

Try to write function to count those PDFs and to write down the result onto B2:B100 like as below. Please adjust it according to your circumstances.

Code:
Sub CountAndWriteResult()
    Const RW_DHEAD As Long = 2
    Const RW_LIMIT As Long = 1000
    Const CL_DATA As String = "A"
    Const CL_RESULT As String = "B"
    Dim rw As Long
    Dim rw_dtail As Long

    With Worksheets("Sheet1")
        .Activate
        rw_dtail = .Cells(RW_LIMIT, CL_DATA).End(xlUp).Row
        If rw_dtail < RW_DHEAD Then
            MsgBox "There are no data paths", vbCritical
            Exit Sub
        End If
        For rw = RW_DHEAD To rw_dtail
            .Cells(rw, CL_RESULT).Value = GetPageNum(.Cells(rw, CL_DATA).Value)
        Next
    End With
    MsgBox "Done!"
End Sub
 
Upvote 0
Re: VBA - working with PDFs

Hi, jrwrita

another way is add "Application.Volatile" into function GetPageNum, just like below.
So you can use this function as =GetPageNum(A2) ... =GetPageNum(A100) on B2:B100.

Code:
Function GetPageNum(ByVal PDF_File As String) As Long
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Dim getpage0 As Long, getpage1 As Long, getpage2 As Long, _
        getpage3 As Long, getpage4 As Long

    Application.Volatile  ' <--here

    FileNum = FreeFile
    Open PDF_File For Binary As [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=FileNum]#FileNum[/URL] 
        strRetVal = Space(LOF(FileNum))
        Get [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=FileNum]#FileNum[/URL] , , strRetVal
    Close [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=FileNum]#FileNum[/URL] 

(below, omitted)
<below, omitted="">
</below,>
 
Upvote 0
I rewrote the code that I linked to before and I while I don't have as large of a pdf sample as you do I tested it on few dozen and the result were good.
Code:
Sub PDFandNumPages()
  
   Dim Folder As Object
   Dim file As Object
   Dim fso As Object
   Dim iExtLen As Integer, iRow As Integer
   Dim sFolder As String, sExt As String
   Dim sPDFName As String

   sExt = "pdf"
   iExtLen = Len(sExt)
   iRow = 1
   ' Must have a '\' at the end of path
   sFolder = "C:\pdf_Directory\"
  
   Set fso = CreateObject("Scripting.FileSystemObject")
  
   If sFolder <> "" Then
      Set Folder = fso.GetFolder(sFolder)
      For Each file In Folder.Files
         If Right(file, iExtLen) = sExt Then
            Cells(iRow, 1).Value = file.Name
            Cells(iRow, 2).Value = pageCount(sFolder & file.Name)
            iRow = iRow + 1
         End If
      Next file
   End If

End Sub
The below code is what gets the number of pages.
Code:
Function pageCount(sFilePathName As String) As Integer

Dim nFileNum As Integer
Dim sInput As String
Dim sNumPages As String
Dim iPosN1 As Integer, iPosN2 As Integer
Dim iPosCount1 As Integer, iPosCount2 As Integer
Dim iEndsearch As Integer

' Get an available file number from the system
nFileNum = FreeFile

'OPEN the PDF file in Binary mode
Open sFilePathName For Binary Lock Read Write As #nFileNum
 
  ' Get the data from the file
  Do Until EOF(nFileNum)
      Input #1, sInput
      sInput = UCase(sInput)
      iPosN1 = InStr(1, sInput, "/N ") + 3
      iPosN2 = InStr(iPosN1, sInput, "/")
      iPosCount1 = InStr(1, sInput, "/COUNT ") + 7
      iPosCount2 = InStr(iPosCount1, sInput, "/")
     
   If iPosN1 > 3 Then
      sNumPages = Mid(sInput, iPosN1, iPosN2 - iPosN1)
      Exit Do
   ElseIf iPosCount1 > 7 Then
      sNumPages = Mid(sInput, iPosCount1, iPosCount2 - iPosCount1)
      Exit Do
   ' Prevent overflow and assigns 0 to number of pages if strings are not in binary
   ElseIf iEndsearch > 1001 Then
      sNumPages = "0"
      Exit Do
   End If
      iEndsearch = iEndsearch + 1
   Loop
  
  ' Close pdf file
  Close #nFileNum
  pageCount = CInt(sNumPages)
 
End Function
The first code search a directory for pdf's but that sub is just one way to do it. If the pdfs are in many different subdirectory I would suggest Chip Pearson's Directory Tree add-in it can generate files with path for all file types or just the extensions you choose.

Best of luck.

Sir I want to search for number of pages in pdf given path and file name in excel sheet Sir I tried his code it works fine for single file, Sir if the single pdf file have less pages but when I did this where pdf have more pages pagecount returns zero please help I am searching for past 4 days but not able to get answers
 
Upvote 0
I rewrote the code that I linked to before and I while I don't have as large of a pdf sample as you do I tested it on few dozen and the result were good.
Code:
Sub PDFandNumPages()
  
   Dim Folder As Object
   Dim file As Object
   Dim fso As Object
   Dim iExtLen As Integer, iRow As Integer
   Dim sFolder As String, sExt As String
   Dim sPDFName As String

   sExt = "pdf"
   iExtLen = Len(sExt)
   iRow = 1
   ' Must have a '\' at the end of path
   sFolder = "C:\pdf_Directory\"
  
   Set fso = CreateObject("Scripting.FileSystemObject")
  
   If sFolder <> "" Then
      Set Folder = fso.GetFolder(sFolder)
      For Each file In Folder.Files
         If Right(file, iExtLen) = sExt Then
            Cells(iRow, 1).Value = file.Name
            Cells(iRow, 2).Value = pageCount(sFolder & file.Name)
            iRow = iRow + 1
         End If
      Next file
   End If

End Sub
The below code is what gets the number of pages.
Code:
Function pageCount(sFilePathName As String) As Integer

Dim nFileNum As Integer
Dim sInput As String
Dim sNumPages As String
Dim iPosN1 As Integer, iPosN2 As Integer
Dim iPosCount1 As Integer, iPosCount2 As Integer
Dim iEndsearch As Integer

' Get an available file number from the system
nFileNum = FreeFile

'OPEN the PDF file in Binary mode
Open sFilePathName For Binary Lock Read Write As #nFileNum
 
  ' Get the data from the file
  Do Until EOF(nFileNum)
      Input #1, sInput
      sInput = UCase(sInput)
      iPosN1 = InStr(1, sInput, "/N ") + 3
      iPosN2 = InStr(iPosN1, sInput, "/")
      iPosCount1 = InStr(1, sInput, "/COUNT ") + 7
      iPosCount2 = InStr(iPosCount1, sInput, "/")
     
   If iPosN1 > 3 Then
      sNumPages = Mid(sInput, iPosN1, iPosN2 - iPosN1)
      Exit Do
   ElseIf iPosCount1 > 7 Then
      sNumPages = Mid(sInput, iPosCount1, iPosCount2 - iPosCount1)
      Exit Do
   ' Prevent overflow and assigns 0 to number of pages if strings are not in binary
   ElseIf iEndsearch > 1001 Then
      sNumPages = "0"
      Exit Do
   End If
      iEndsearch = iEndsearch + 1
   Loop
  
  ' Close pdf file
  Close #nFileNum
  pageCount = CInt(sNumPages)
 
End Function
The first code search a directory for pdf's but that sub is just one way to do it. If the pdfs are in many different subdirectory I would suggest Chip Pearson's Directory Tree add-in it can generate files with path for all file types or just the extensions you choose.

Best of luck.
Sir I want to count number of pages for the pdf file who's path and file name is given in column "E26". Sir, this function worked fine when pdf file pages where less but when I gave link of pdf who's page count were 49 pages in gave me value ZERO Please help me I am looking for this code for many days
 
Upvote 0
Hi, deepak30
Try code GetPageNum in #10, Apr 23, 2016 or #13, Aug 17, 2018 that I posted here.
Thanks Sir for the reply sir only one issued is there where I have pdf that has 50 pages so when I run that page count gave me ZERO
 
Upvote 0
Oh, you got zero for 50 pages pdf.

I know that there are cases where the function returns zero or the value grater than correct value for PDFs that require a password to view or for PDFs that are edit protected.

Is your 50 pages pdf protected or needs password to browse? If so, I currently have no idea how to resolve it. If not, and if I can get your the pdf, I'd like to find out.
 
Upvote 0
Oh, you got zero for 50 pages pdf.

I know that there are cases where the function returns zero or the value grater than correct value for PDFs that require a password to view or for PDFs that are edit protected.

Is your 50 pages pdf protected or needs password to browse? If so, I currently have no idea how to resolve it. If not, and if I can get your the pdf, I'd like to find out.


Thanks Sir, Sir I got something different in which i got the exact match of pages within a sec. I am sharing the code

Function GetPageNum(PDF_File As String)
Dim FileNum As Long
Dim strRetVal As String
Dim RegExp
Set RegExp = CreateObject("VBscript.RegExp")
RegExp.Global = True
RegExp.Pattern = "/Type\s*/Page[^s]"
FileNum = FreeFile
Open PDF_File For Binary As #FileNum
strRetVal = Space(LOF(FileNum))
Get #FileNum, , strRetVal
Close #FileNum
GetPageNum = RegExp.Execute(strRetVal).Count
Range("e27").Value = GetPageNum
End Function

VBA Code:
Function GetPageNum(PDF_File As String)
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True
    RegExp.Pattern = "/Type\s*/Page[^s]"
    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum
    GetPageNum = RegExp.Execute(strRetVal).Count
    Range("e27").Value = GetPageNum
End Function
 
Upvote 0
Hi, deepak30

The code you showed is something you can find everywhere in Web BBS, and Haluk has already presented in #4.

Searching "/Type\s*/Page[^s]" is basic.
Also this is included in my code.

But, there are cases where the function returns zero or the value grater than correct value for protected PDF.
Some protected PDF did not include the pattern "/Type\s*/Page[^s]". And/Or for some reason, it may include a number that is slightly different from the number of pages.
In that case, the return value will be 0 or a different number.

That's why I added a following pattern matches in an attempt to count on another keyword if basic pattern didn't work.

RegExp.Pattern = "/Resources"
RegExp.Pattern = "/ProcSet\s*\[/PDF"
RegExp.Pattern = "/Type/Catalog/Page\s*"
RegExp.Pattern = "/ExtGState"

Yesterday, I tried to another PDF that couldn't count with my code, I modified it. And I noticed that "Open PDF_File For Binary As #FileNum" creates a zero byte file for a non-existent file, so I added a countermeasure.

Of course, this is not perfect, and some PDFs may have slightly deviated numbers or ridiculous values.
This is a trial version.

VBA Code:
' trial ver
' Seiji Fujita  rev.2  June 02, 2020 (base code: 'Haluk 19/10/2008)
' Added a file size check to prevent a zero byte file from being created if the file does not exist
' When file does not exit or filesize equal 0, return value = NOTEXIST
' When If it is impossible to count, return value = UNSUPPORTNUM
Function GetPageNum(ByVal PDF_File As String) As Long
    Const NOTEXIST As Long = 0
    Const UNSUPPORTNUM As Long = -99
    Dim FileNum As Long
    Dim strRetVal As String
    Dim RegExp
    Dim nFileLen As Long
    Dim getpage0 As Long, getpage1 As Long, getpage2 As Long, _
        getpage3 As Long, getpage4 As Long, getpage5 As Long, _
        getpage6 As Long, getpage7 As Long

    Application.Volatile

    ' return NOTEXIST when filesize equal zero or file does not exist
    On Error Resume Next
    nFileLen = FileLen(PDF_File)
    On Error GoTo 0
    If nFileLen <= 0 Then
        GetPageNum = NOTEXIST
        Exit Function
    End If

    FileNum = FreeFile
    Open PDF_File For Binary As #FileNum
        strRetVal = Space(LOF(FileNum))
        Get #FileNum, , strRetVal
    Close #FileNum

    Set RegExp = CreateObject("VBscript.RegExp")
    RegExp.Global = True

    ' getpage0 is the basics, but depending on the PDF,
    ' the number of this pattern differs from the number of pages
    RegExp.Pattern = "/Type\s*/Page[^s]"
    getpage0 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Resources"
    getpage1 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ProcSet\s*\[/PDF"
    getpage2 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Type/Catalog/Page\s*"
    getpage3 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/ExtGState"
    getpage4 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Type/ObjStm"
    getpage5 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "/Subtype/"
    getpage6 = RegExp.Execute(strRetVal).Count

    RegExp.Pattern = "<rdf:" ' "JFIF"  ' "DeviceRGB"     ' "/Ordering\(Identity"   ' "Registry\(Adobe\)"     ' "/BM/Normal"
    getpage7 = RegExp.Execute(strRetVal).Count

    If getpage0 > 0 Then
        GetPageNum = getpage0
    Else
        If getpage5 > 0 Then
            If getpage7 <= 0 Then
                GetPageNum = getpage6 - getpage5 - getpage4 - 1
            Else
                GetPageNum = UNSUPPORTNUM   ' cannot get right number; getpage6 - getpage7 - 2 * getpage1 ' not logical, adhoc
            End If
        Else
            GetPageNum = MathMax(MathMax(getpage1, getpage2), getpage3)
            ' you can use below with MS Excel, insted above
            ' GetPageNum = Application.WorksheetFunction.Max(getpage1, getpage2, getpage3)
            If GetPageNum > getpage4 Then
                GetPageNum = GetPageNum - getpage4
            End If
        End If
    End If
End Function

Function MathMax(ByVal a As Long, ByVal b As Long) As Long
    If a >= b Then
        MathMax = a
    Else
        MathMax = b
    End If
End Function
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,324
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top