VBA Download PDF

TtheSkyscraper

New Member
Joined
Nov 11, 2015
Messages
17
Is it possible to download a PDF from the link on this page? When you click the page count (11) on this webpage:


Recorded Document Search Detail


it runs a sequence of events and produces a PDF. When you input just the hyperlink:


Maricopa County Recorder


After this link, it goes to:


http://156.42.40.50/UnOfficialDocs2/pdf/19930074944.pdf


And opens the PDF.


Is it possible to have VBA run this whole sequence and download the PDF? I have spent a lot of time researching, but can't find anything and I am not very knowledgeable with VBA. And help would be greatly appreciated. Thank you.
 

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
Try this. Put the code in a module, save the workbook and run Download_PDF. The PDF file is downloaded and saved in the same folder as the workbook.
Code:
Public Sub Download_PDF()

    Dim baseURL As String, searchResultsURL As String, pdfURL As String, PDFdownloadURL As String
    Dim httpReq As Object
    Dim HTMLdoc As Object
    Dim PDFlink As Object
    Dim cookie As String
    Dim downloadFolder As String, localFile As String
    Dim fileNum As Integer, fileBytes() As Byte

    'Folder in which the downloaded file will be saved
    
    downloadFolder = ThisWorkbook.Path
    If Right(downloadFolder, 1) <> "\" Then downloadFolder = downloadFolder & "\"
    
    baseURL = "http://recorder.maricopa.gov/recdocdata/"
    searchResultsURL = baseURL & "GetRecDataDetail.aspx?rec=19930074944&suf=&bdt=1/1/1947&edt=11/18/2016&nm=&doc1=&doc2=&doc3=&doc4=&doc5="
        
    Set httpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    
    With httpReq
        
        'Send GET to request search results page
        
        .Open "GET", searchResultsURL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
        .Send
        cookie = .getResponseHeader("Set-Cookie")
        
        'Put response in HTMLDocument for parsing
        Set HTMLdoc = CreateObject("HTMLfile")
        HTMLdoc.body.innerHTML = .responseText
    
        'Get PDF URL from pages link
        '< a id="ctl00_ContentPlaceHolder1_lnkPages" title="Click to view unofficial document"
        ' href="unofficialpdfdocs.aspx?rec=19930074944&pg=1&cls=RecorderDocuments&suf=" target="_blank">11< /a>
    
        Set PDFlink = HTMLdoc.getElementById("ctl00_ContentPlaceHolder1_lnkPages")
        pdfURL = Replace(PDFlink.href, "about:", baseURL)
        
        'Send GET request to the PDF URL with automatic http redirects disabled.  This returns a http 302 status (Found) with the Location header containing the URL of the PDF file
    
        .Open "GET", pdfURL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0"
        .setRequestHeader "Referer", searchResultsURL
        .setRequestHeader "Set-Cookie", cookie
        .Option(WinHttpRequestOption_EnableRedirects) = False
        .Send
        PDFdownloadURL = .getResponseHeader("Location")

        'Send GET to request the PDF file download
    
        .Open "GET", PDFdownloadURL, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/46.0"
        .setRequestHeader "Referer", pdfURL
        .Send
        
        'Save response bytes in the local file
        
        If .Status = 200 Then
            localFile = downloadFolder & Mid(PDFdownloadURL, InStrRev(PDFdownloadURL, "/") + 1)
            If Dir(localFile) <> "" Then Kill localFile
            fileBytes = .responseBody
            fileNum = FreeFile
            Open localFile For Binary Access Write As #fileNum
            Put #fileNum, 1, fileBytes
            Close #fileNum
            MsgBox "Downloaded " & localFile
        Else
            MsgBox "Download failed " & .statusText
        End If
        
    End With
    
End Sub<a id="ctl00_ContentPlaceHolder1_lnkPages" title="Click to view unofficial document"
 
Upvote 0
Hi John,

Thank you for the code. However, I'm getting an error:

Run-time error '2147012746 (80072f76)':

The requested header was not found

Debugging it refers to:

PDFdownloadURL = .getResponseHeader("Location")

I'm trying to figure out the problem now. If you have any idea, it'd be greatly appreciated. Thanks again!
 
Upvote 0
Code:
Sub DownloadPDF()


    Dim html$, pdf_link$, cookie$, user_agent$
    Dim base_url, request_URL$, pdf_url$, referer$, pdf_file_path$
    Dim strm As Object, req As Object, re As Object, mc As Object


    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")
    Set re = CreateObject("VBScript.RegExp")
    Set strm = CreateObject("ADODB.Stream")
    
    pdf_file_path = ThisWorkbook.Path & "\PDF_FILE.pdf"
    
    re.IgnoreCase = True
    re.Pattern = ""


    
    base_url = "http://recorder.maricopa.gov/recdocdata/"
    request_URL = base_url & "GetRecDataDetail.aspx?rec=19930074944&suf=&bdt=1/1/1947&edt=11/18/2016&nm=&doc1=&doc2=&doc3=&doc4=&doc5="
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2896.3 Safari/537.36 OPR/43.0.2412.0 (Edition developer)"
    
    
    req.Open "GET", request_URL, False
    req.SetRequestHeader "User-Agent", user_agent
    req.Send
    cookie = req.GetResponseHeader("Set-Cookie")
    html = req.ResponseText
    
    Set mc = re.Execute(html)


    pdf_link = mc(0).SubMatches(0)
    pdf_url = base_url & pdf_link
    req.Open "GET", pdf_url, False
    req.SetRequestHeader "Cookie", cookie
    req.SetRequestHeader "User-Agent", user_agent
    req.SetRequestHeader "Referer", request_URL
    req.Send


    On Error Resume Next
    Kill pdf_file_path
    On Error GoTo 0
    
    strm.Type = adTypeBinary
    strm.Open
    strm.Write req.ResponseBody
    strm.SaveToFile ThisWorkbook.Path & "\PDF_FILE.pdf"
    strm.Close


    MsgBox "Well done!", vbInformation


End Sub
 
Last edited:
Upvote 0
Hi Sektor,

Thanks for the help! I'm getting an error with "pdf_link = mc(0).SubMatches(0)" it Says "Run-time error '5': Invalid procedure call or argument"

Thanks again. I'll see what I can do.
 
Upvote 0
Hi John,

I changed the error: PDFdownloadURL = .getResponseHeader("Location")

to: PDFdownloadURL = .getAllResponseHeaders("Location")

And now it gets a run time error on the following line:

.Open "GET", PDFdownloadURL, False

Stating "The URL does not use a recognized protocol"

I'll keep digging. Thanks again!
 
Upvote 0
The forum doesn't want to show re.Pattern property correctly. Here's the .bas file with code.
 
Upvote 0
Thanks Sektor,

I'm now getting an error with:

strm.Type = adTypeBinary

It says it's a compile error and the variable is not defined.

I'm looking into it, no luck so far. Thanks again.
 
Upvote 0
Hi John,

Thank you for the code. However, I'm getting an error:

Run-time error '2147012746 (80072f76)':

The requested header was not found

Debugging it refers to:

PDFdownloadURL = .getResponseHeader("Location")

I'm trying to figure out the problem now. If you have any idea, it'd be greatly appreciated. Thanks again!
I posted the late binding version of my code which requires the following line immediately below the Dim statements:
Code:
    Const WinHttpRequestOption_EnableRedirects = 6
 
Upvote 0
Hi John,

I changed the error: PDFdownloadURL = .getResponseHeader("Location")

to: PDFdownloadURL = .getAllResponseHeaders("Location")

And now it gets a run time error on the following line:

.Open "GET", PDFdownloadURL, False

Stating "The URL does not use a recognized protocol"

I'll keep digging. Thanks again!
I'm not surprised you get that error because assigning all the response headers to the next URL definitely isn't correct.
 
Upvote 0

Forum statistics

Threads
1,223,903
Messages
6,175,289
Members
452,631
Latest member
a_potato

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top