Splitting a PDF file by MB size

prati

Office Version: 2019
Platform: Windows
Hey friends,
I wonder if there is a way to split a PDF file through VBA into multiple PDF files so that each part is no larger than 8 MB.

I'm using PDFtk Server to split a PDF file by defining the exact range of pages. It splits the file into multiple files but doesn't take the file size into account.


If FileLen(File1) / 1000000 > 8 Then 'check whether the file is larger than 8 MB
    If totalPages < 100 Then 'check the total number of pages
        Wsh.Run "cmd /c PDFtk C:\Temp\oldfile.pdf cat 1-50 output C:\Temp\newfilePart1.pdf", 0, True
        Wsh.Run "cmd /c PDFtk C:\Temp\oldfile.pdf cat 51-end output C:\Temp\newfilePart2.pdf", 0, True
    ElseIf totalPages < 150 Then
        Wsh.Run "cmd /c PDFtk C:\Temp\oldfile.pdf cat 1-50 output C:\Temp\newfilePart1.pdf", 0, True
        Wsh.Run "cmd /c PDFtk C:\Temp\oldfile.pdf cat 51-100 output C:\Temp\newfilePart2.pdf", 0, True
        Wsh.Run "cmd /c PDFtk C:\Temp\oldfile.pdf cat 101-end output C:\Temp\newfilePart3.pdf", 0, True
    ElseIf totalPages < 200 Then
        ....
    ElseIf totalPages < 250 Then
        ...
and so on.

The code above doesn't help me because it splits the file by number of pages and not by file size, as I want it to.

I found a program called UnityPdf. It is a free program that can do the job easily, but I don't know if there is a way to write a VBA command that uses UnityPdf to split the PDF file.

 
You should get at least 4 debug lines with the latest code - 2 DEL lines, the burst command and 1 DEL line.

Have you installed PDFtk Server?

Try stepping through the code with the F8 key in the VBA editor.
 
I'm glad it works for you - thanks for your kind words.

Here's an improved version which deletes the _Part_nnn.pdf and _Page_nnn.pdf files at the start and deletes all the _Page_nnn.pdf files at the end, rather than the set of _Page_nnn.pdf files for each catenated Part. It also sets the input folder (using the DOS CD /D "C:\path\to\" command) once per Part, rather than including it for every _Page_nnn.pdf file, which results in much shorter command lines.
VBA Code:
Option Explicit

Const Q As String = """"

Public Sub PDFtk_Split_PDF_By_Size()

    Dim Wsh As Object 'WshShell
    Dim command As String
    Dim PDFinputFile As String, PDFfolder As String
    Dim maxFileSizeKB As Long
    Dim pageFile As String
    Dim page As Long
    Dim totalFileSizeKB As Single, thisFileSizeKB As Single
    Dim pageFiles As String
    Dim part As Long
   
    'PDF file to be split into multiple parts
   
    PDFinputFile = "C:\path\to\file.pdf"
   
    'Maximum size of each part in kilobytes
   
    maxFileSizeKB = 2048
   
    Set Wsh = CreateObject("WScript.Shell")  'New WshShell
   
    If Dir(PDFinputFile) <> vbNullString Then
   
        PDFfolder = Left(PDFinputFile, InStrRev(PDFinputFile, "\"))
       
        'Delete existing _Page_nnn.pdf and _Part_nnn.pdf files for the input file
       
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Page_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Part_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True

        'Run PDFtk burst command to create multiple _Page_nnn.pdf files, one for each page in the input PDF

        command = "cmd /c PDFtk " & Q & PDFinputFile & Q & " burst output " & Q & Replace(PDFinputFile, ".pdf", "_Page_%03d.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
       
        'Loop through the _Page_nnn.pdf files in order and create _Part_nnn.pdf files whose size is less than the maximum file size.
       
        totalFileSizeKB = 0
        pageFiles = ""
        page = 0
        part = 0
        Do
            page = page + 1
            'Get the next _Page_nnn.pdf file
            pageFile = Dir(Replace(PDFinputFile, ".pdf", "_Page_" & Format(page, "000") & ".pdf"))
            If pageFile <> vbNullString Then
                thisFileSizeKB = FileLen(PDFfolder & pageFile) / 1024
                'Is this PDF page file size plus the current total file size less than the maximum file size?
                If totalFileSizeKB + thisFileSizeKB <= maxFileSizeKB Then
                    'Yes, so add this PDF page file to the string of files and increment the current total file size
                    pageFiles = pageFiles & Q & pageFile & Q & " "
                    totalFileSizeKB = totalFileSizeKB + thisFileSizeKB
                Else
                    'No, so run PDFtk cat command to catenate the current PDF page files to the next PDF file named _Part_nnn.pdf
                    '(skipped when no page files are pending, e.g. when the very first page already exceeds the maximum)
                    If pageFiles <> "" Then
                        part = part + 1
                        command = "cmd /c CD /D " & Q & PDFfolder & Q & " & PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
                        Debug.Print Time; command
                        Wsh.Run command, 0, True
                    End If
                    'Initialise the PDF page files with this PDF file and the total file size
                    pageFiles = Q & pageFile & Q & " "
                    totalFileSizeKB = thisFileSizeKB
                End If
            End If
        Loop Until pageFile = vbNullString
       
        'If the current PDF page files isn't empty then run PDFtk cat command to catenate them to the next PDF file named _Part_nnn.pdf
       
        If pageFiles <> "" Then
            part = part + 1
            command = "cmd /c CD /D " & Q & PDFfolder & Q & " & PDFtk " & pageFiles & "cat output " & Q & Replace(PDFinputFile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
            Debug.Print Time; command
            Wsh.Run command, 0, True
        End If
       
        'Delete all _Page_nnn.pdf files for the input file
       
        command = "cmd /c DEL " & Q & Replace(PDFinputFile, ".pdf", "_Page_*.pdf") & Q
        Debug.Print Time; command
        Wsh.Run command, 0, True
       
        'Delete doc_data.txt file created by PDFtk burst command
       
        If Dir(PDFfolder & "doc_data.txt") <> vbNullString Then Kill PDFfolder & "doc_data.txt"
       
        MsgBox "Done"
                   
    Else
   
        MsgBox "Error opening PDF file " & PDFinputFile
   
    End If
   
End Sub
Hey

The code works perfectly.
Do you have any idea why the original file is sometimes split into too many parts?

Most of the time it works very well indeed.
For example, when I split a 25 MB file into parts of 9 MB each,
the code splits the file into 3 parts:
First file 9 MB, second file 9 MB, third file 7 MB.
That's wonderful.

However, in other cases there are too many parts.
For example, I tried to split a 6 MB file into parts of 5 MB each.
The code split the file into 4 parts:
First file 5 MB, second file 5 MB, third file 5 MB, fourth file 3 MB.

On the one hand, the code does keep each part to a maximum of 5 MB as it should, but on the other hand, there are too many parts.

My guess is that bursting the file into separate pages (one page per file) produces files that are large compared with the original combined file,
so when the code merges them, it is merging large separate files, and the result is sometimes an unreasonable number of parts.
 
Have you applied the bug fix in this post?


If so, step through the code with F8 and ensure it is working correctly. If you don't find a problem, post a link to the file which causes the problem.
 
Hey,
Sorry for the late response.
I have applied the fix, but I still get the same problem with some files.

Here is an example: a 10 MB file named A.pdf.

If I split the file into parts of at most 7 MB each, I expect the result to be:

First part around 7 MB
Second part around 3 MB
Total of all parts around 10 MB

However, even after applying the fix, it produces 4 parts:

First part 5 MB
Second part 6 MB
Third part 4 MB
Fourth part 4 MB
Total of all parts 19.5 MB

What could be the problem?
Why is the result so different from what I expected?

So you think there should be 2 parts, however there are actually 4 parts. Is that the problem?

I think 4 parts is correct for the 12 pages and a maximum part size of 7 MB (7,168 KB):

Part 1: 19 + 5,232 + 19 = 5,270 KB
Part 2: 5,233 + 20 + 778 = 6,031 KB
Part 3: 4,319 + 20 + 19 + 12 = 4,370 KB
Part 4: 4,320 + 17 = 4,337 KB
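
The grouping above can be reproduced without touching any files. The following is a minimal sketch, not part of the macro: PackPagesIntoParts is a hypothetical helper that applies the same rule as the loop in PDFtk_Split_PDF_By_Size (keep adding pages to the current part until the next page would push it over the maximum) to the page sizes listed above.

```vba
Option Explicit

'Hypothetical helper (not in the original macro): applies the macro's grouping rule
'to an array of page sizes in KB and returns one total per part in a Collection.
Public Function PackPagesIntoParts(pageSizesKB As Variant, ByVal maxFileSizeKB As Double) As Collection

    Dim parts As New Collection
    Dim totalKB As Double
    Dim pagesInPart As Long
    Dim i As Long

    For i = LBound(pageSizesKB) To UBound(pageSizesKB)
        If pagesInPart > 0 And totalKB + pageSizesKB(i) > maxFileSizeKB Then
            'Adding this page would exceed the maximum, so close the current part
            parts.Add totalKB
            totalKB = 0
            pagesInPart = 0
        End If
        totalKB = totalKB + pageSizesKB(i)
        pagesInPart = pagesInPart + 1
    Next i

    'Close the last part
    If pagesInPart > 0 Then parts.Add totalKB

    Set PackPagesIntoParts = parts

End Function

Public Sub Demo_PackPagesIntoParts()

    Dim parts As Collection
    Dim p As Long

    'Page sizes in KB for the 12 pages of A.pdf, as listed above; maximum 7 MB = 7,168 KB
    Set parts = PackPagesIntoParts(Array(19, 5232, 19, 5233, 20, 778, 4319, 20, 19, 12, 4320, 17), 7168)

    For p = 1 To parts.Count
        Debug.Print "Part " & p & ": " & Format(parts(p), "#,##0") & " KB"
    Next p

End Sub
```

Running Demo_PackPagesIntoParts prints four lines to the Immediate window, one per part. Two parts are out of reach for these page files because pages 2 and 4 alone already add up to more than 7,168 KB.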
 
Yeah...
The question is why page 2 is 5 MB,
page 4 is 5 MB,
page 7 is 4 MB,
page 11 is 4 MB.

In other words, what I'm asking is why the 12 separate pages add up to 19.5 MB when the original file is only 10 MB.

Maybe now I have managed to clarify the question.

The code you wrote is perfect. There is nothing wrong with your code. The question is why PDFtk creates such big separate pages that the 12 pages total 19.5 MB when the original file is only 10 MB.
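
One possible explanation, which nothing in this thread confirms: when a PDF is burst into single pages, a resource that several pages share in the original (an embedded image or font, for example) may be copied into every single-page file that uses it, so the page files can add up to more than the original. The near-identical sizes of pages 2 and 4, and of pages 7 and 11, would be consistent with that. Below is a minimal sketch for measuring the effect, assuming the _Page_nnn.pdf files from the burst step still exist (the macro deletes them at the end, so that final DEL step would have to be commented out first). SumPageFileSizesKB and Compare_Burst_Size are hypothetical names.

```vba
Option Explicit

'Hypothetical helper: totals the sizes (in KB) of the _Page_nnn.pdf files
'that the burst step created next to the input PDF.
Public Function SumPageFileSizesKB(ByVal PDFinputFile As String) As Double

    Dim PDFfolder As String
    Dim pageFile As String
    Dim totalKB As Double

    PDFfolder = Left(PDFinputFile, InStrRev(PDFinputFile, "\"))

    'Same naming pattern as the macro: file.pdf -> file_Page_*.pdf
    pageFile = Dir(Replace(PDFinputFile, ".pdf", "_Page_*.pdf"))
    Do While pageFile <> vbNullString
        totalKB = totalKB + FileLen(PDFfolder & pageFile) / 1024
        pageFile = Dir
    Loop

    SumPageFileSizesKB = totalKB

End Function

Public Sub Compare_Burst_Size()

    Const PDFinputFile As String = "C:\path\to\file.pdf"   'placeholder path, as in the macro

    Debug.Print "Original file: " & Format(FileLen(PDFinputFile) / 1024, "#,##0") & " KB"
    Debug.Print "Sum of pages:  " & Format(SumPageFileSizesKB(PDFinputFile), "#,##0") & " KB"

End Sub
```

If the second figure is much larger than the first, the extra parts come from the size of the burst pages rather than from the grouping logic.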
 
Thanks,
You are a true MVP
 
Hey,
Can you please help me change the perfect code you wrote so that, instead of running PDFtk Server from the command line with cmd /c PDFtk,
the macro runs it from my personal folder:
C:\PdfPrograms\pdftk.exe
I have uninstalled PDFtk Server and put pdftk.exe in a specific location, and I want to use it directly from that location, C:\PdfPrograms\pdftk.exe.

I have successfully made this change in smaller macros.
For example, a macro that reverses the pages of a PDF file had the lines of code below:

Set wsh = CreateObject("WScript.Shell")
s = "cmd /c PDFtk " & "C:\TempArticles\food.pdf" & " cat end-1" & " output " & "C:\TempArticles\food_reverse_pages.pdf"
'Debug.Print 3, s
wsh.Run s, 0, True

I successfully changed the lines to the following, and it does work:

Set wsh = CreateObject("WScript.Shell")
s = "C:\PdfPrograms\pdftk.exe " & "C:\TempArticles\food.pdf" & " cat end-1" & " output " & "C:\TempArticles\food_reverse_pages.pdf"
'Debug.Print 3, s
wsh.Run s, 0, True

However, I don't know how to change your perfect code that splits a PDF file by MB size.
 
If you've properly installed PDFtk Server in a specific location (and not just copied pdftk.exe there) then as part of the installation steps it asks if you want to add the installation folder to the path environment variable. If you answered 'Yes' to that question then the code should work exactly as before without any changes.

As a test, I haven't installed PDFtk Server, but have changed the code to specify the full path to pdftk.exe as "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe". Therefore change "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe" to "C:\PdfPrograms\pdftk.exe" in the following code and see if it works for you.

VBA Code:
Public Sub PDFtk_Split_PDF_By_Size2()

    Dim Wsh As Object 'WshShell
    Dim command As String
    Dim PDFinputFullName As String, PDFfolder As String, PDFfile As String
    Dim maxFileSizeKB As Long
    Dim pageFile As String
    Dim page As Long
    Dim totalFileSizeKB As Single, thisFileSizeKB As Single
    Dim pageFiles As String
    Dim part As Long
   
    'PDF file to be split into multiple parts
   
    PDFinputFullName = "C:\path\to\file.pdf"
   
    'Maximum size of each part in kilobytes
   
    maxFileSizeKB = 2048
   
    Set Wsh = CreateObject("WScript.Shell")  'New WshShell
   
    If Dir(PDFinputFullName) <> vbNullString Then
   
        PDFfolder = Left(PDFinputFullName, InStrRev(PDFinputFullName, "\"))
        PDFfile = Mid(PDFinputFullName, InStrRev(PDFinputFullName, "\") + 1)
       
        'Delete existing _Page_nnn.pdf and _Part_nnn.pdf files for the input file
       
        command = "DEL " & Q & Replace(PDFinputFullName, ".pdf", "_Page_*.pdf") & Q
        Debug.Print command
        Wsh.Run "cmd /c " & command, 0, True
        command = "DEL " & Q & Replace(PDFinputFullName, ".pdf", "_Part_*.pdf") & Q
        Debug.Print command
        Wsh.Run "cmd /c " & command, 0, True
       
        'Run PDFtk burst command to create multiple _Page_nnn.pdf files, one for each page in the input PDF

        command = Q & "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe" & Q & " " & Q & PDFinputFullName & Q & " burst output " & Q & Replace(PDFinputFullName, ".pdf", "_Page_%03d.pdf") & Q
        Debug.Print command
        Wsh.Run command, 0, True
       
        'Loop through the _Page_nnn.pdf files in order and create _Part_nnn.pdf files whose size is less than the maximum file size.
       
        totalFileSizeKB = 0
        pageFiles = ""
        page = 0
        part = 0
        Do
            page = page + 1
            'Get the next _Page_nnn.pdf file
            pageFile = Dir(Replace(PDFinputFullName, ".pdf", "_Page_" & Format(page, "000") & ".pdf"))
            If pageFile <> vbNullString Then
                thisFileSizeKB = FileLen(PDFfolder & pageFile) / 1024
                Debug.Print pageFile; " size " & thisFileSizeKB, "total size " & totalFileSizeKB
                'Is this PDF page file size plus the current total file size less than the maximum file size?
                If totalFileSizeKB + thisFileSizeKB <= maxFileSizeKB Then
                    'Yes, so add this PDF page file to the string of files and increment the current total file size
                    pageFiles = pageFiles & Q & pageFile & Q & " "
                    totalFileSizeKB = totalFileSizeKB + thisFileSizeKB
                Else
                    'No, so run PDFtk cat command to catenate the current PDF page files to the next PDF file named _Part_nnn.pdf
                    If pageFiles <> "" Then
                        part = part + 1
                        command = "CD /D " & Q & PDFfolder & Q & " & " & Q & "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe" & Q & " " & pageFiles & "cat output " & Q & Replace(PDFfile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
                        Debug.Print command
                        Wsh.Run "cmd /c " & command, 0, True
                    End If
                    'Initialise the PDF page files with this PDF file and the total file size
                    pageFiles = Q & pageFile & Q & " "
                    totalFileSizeKB = thisFileSizeKB
                End If
            End If
        Loop Until pageFile = vbNullString
       
        'If the current PDF page files isn't empty then run PDFtk cat command to catenate them to the next PDF file named _Part_nnn.pdf
       
        If pageFiles <> "" Then
            part = part + 1
            command = "CD /D " & Q & PDFfolder & Q & " & " & Q & "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe" & Q & " " & pageFiles & "cat output " & Q & Replace(PDFfile, ".pdf", "_Part_" & Format(part, "000") & ".pdf") & Q
            Debug.Print command
            Wsh.Run "cmd /c " & command, 0, True
        End If
       
        'Delete all _Page_nnn.pdf files for the input file created by PDFtk burst command above
       
        command = "DEL " & Q & Replace(PDFinputFullName, ".pdf", "_Page_*.pdf") & Q
        Debug.Print command
        Wsh.Run "cmd /c " & command, 0, True
       
        'Delete doc_data.txt file created by PDFtk burst command above
       
        If Dir(PDFfolder & "doc_data.txt") <> vbNullString Then Kill PDFfolder & "doc_data.txt"
       
        MsgBox "Done"
                   
    Else
   
        MsgBox "Error opening PDF file " & PDFinputFullName
   
    End If
   
End Sub
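
The full path to pdftk.exe appears three times in the macro above, so every occurrence has to be edited when the location changes. A possible refinement, sketched here under the assumption that pdftk.exe sits at C:\PdfPrograms\pdftk.exe as described in the question, is to keep the path in one constant and build each command line through a small helper. PDFTK_EXE and PDFtkCommand are hypothetical names, not part of PDFtk or of the macro above. If this goes into the same module as the macro, leave out the Option Explicit and Const Q lines, which that module already has.

```vba
Option Explicit

Const Q As String = """"

'Assumed location of pdftk.exe, taken from the question; change it in this one place
Const PDFTK_EXE As String = "C:\PdfPrograms\pdftk.exe"

'Hypothetical helper: returns the quoted pdftk.exe path followed by the given arguments
Public Function PDFtkCommand(ByVal arguments As String) As String
    PDFtkCommand = Q & PDFTK_EXE & Q & " " & arguments
End Function

Public Sub Demo_PDFtkCommand()

    Dim PDFinputFullName As String
    Dim command As String

    PDFinputFullName = "C:\path\to\file.pdf"

    'Equivalent of the burst command line built in PDFtk_Split_PDF_By_Size2
    command = PDFtkCommand(Q & PDFinputFullName & Q & " burst output " & Q & Replace(PDFinputFullName, ".pdf", "_Page_%03d.pdf") & Q)
    Debug.Print command

End Sub
```

In the macro, each Q & "C:\Program Files (x86)\PDFtk Server\bin\PDFtk.exe" & Q & " " fragment could then be replaced by a call to PDFtkCommand with the rest of the command line as its argument.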
 