Capture data from HTML Source and download

whose2know

Board Regular
Joined
May 1, 2002
Messages
59
I'm trying to automate a process for downloading PDF files from a website that uses unique URLs. I thought source code in each unique URL contained the data needed to find the PDF to download, but apparently it does not. So, I need to click the PDF icon and save the PDF image to my harddrive (and continue to the next one via a loop).

here's an initial site with the PDF icon: http://www.nalpdirectory.com/dledir_search_results.asp?fscid=F087901&yr=2009&orgtypeid=F

The source code that is related to the PDF icon is in row 112-115 (I can't paste it here because mrexcel does not recognize I am pasting as text and not HTML).

If you click on the PDF icon for the above URL, it will open up a new URL with the PDF, which is unique: http://www.nalpdirectory.com/forms/merged/F087901_2008_1260899211178.pdf

I know how to capture table data from websites, but not how to follow a javascript link via a button and then download the corresponding file. Any help is appreciated. Here are additional related starting URLs:
http://www.nalpdirectory.com/dledir_search_results.asp?fscid=F390801&yr=2009&orgtypeid=F

http://www.nalpdirectory.com/dledir_search_results.asp?fscid=F325603&yr=2009&orgtypeid=F

Thanks in advance for your assistance
 
Here's the portion of the source code that refers to what to do if the user clicks the PDF image:


HTML:
         <td>
            <!--<a href="merged_forms/F460401_2006_1129017176489.pdf" **********="MM_swapImgRestore();" ***********="MM_swapImage('icon_sl_text','','images/icon_print_pdf_text.gif',1);" >-->
            <a href="javascript:uf_OpenWindow('blank.asp','pdfform','status=yes,scrollbars=no,resizable=yes,width=600,height=400', 'dledir_search_results_pdfform.asp')" **********="MM_swapImgRestore();" ***********="MM_swapImage('icon_sl_text','','images/icon_print_pdf_text.gif',1);">
            <img name="icon_sl_print_pdf" src="images/icon_adobe.gif" width="24" height="27" border="0" alt="Print PDF Report"></a>
          </td>
 
Upvote 0
I'm sure there is a more concise way to do this but I was bored and gave it a shot with what I know...

Change this constant in the code to reflect the folder that will contain your downloaded pdf's.

Private Const LocalDownloadFolder = "C:\Users\TJS\Desktop\my pdfs\"

The code, without editing, depends on the URLs being listed consecutively in range a1 to a?. The code goes in the same worksheet that contains the URLs.

Here is a working example containing the three example URLs that you posted above. Click on the button.

http://cid-ea73b3a00e16f94f.skydrive.live.com/self.aspx/Mr Excel Example/2154536.zip

Code:
Option Explicit
 
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
 
Private Const LocalDownloadFolder = "C:\Users\TJS\Desktop\my pdfs\"
 
Private WithEvents IE As InternetExplorer
Private WithEvents Download As InternetExplorer
 
Private CurrentRange As Range
Private GetLink As Integer
 
Private Sub CommandButton1_Click()
    StartHere
End Sub
 
Sub StartHere()
    If CurrentRange Is Nothing Then Set CurrentRange = Range("A1")
    CallThis CurrentRange.Text
    GetLink = 0
End Sub
 
Private Sub CallThis(URL As String)
    If IE Is Nothing Then
        Set IE = New InternetExplorer
    End If
    IE.Navigate URL
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    IE.Navigate "javascript:uf_OpenWindow('blank.asp','pdfform','status=yes,scrollbars=no,resizable=yes,width=600,height=400', 'dledir_search_results_pdfform.asp')"
End Sub
 
Private Sub IE_NewWindow2(ppDisp As Object, Cancel As Boolean)
    Set Download = New InternetExplorer
    Set ppDisp = Download
    CurrentRange.Offset(, 1) = "Loading..."
End Sub
 
Private Sub Download_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    If GetLink > 1 Then
        CurrentRange.Offset(, 1) = URL
        CurrentRange.Offset(, 2) = "Downloading..."
        DownloadUrls URL
        Set CurrentRange = CurrentRange.Offset(1)
        Download.Quit
        Set Download = Nothing
        If CurrentRange <> "" Then
            StartHere
            Exit Sub
        Else
            IE.Quit
            Exit Sub
        End If
    End If
    GetLink = GetLink + 1
End Sub
 
Private Sub Worksheet_Change(ByVal Target As Range)
    If Not Intersect(Sheets("Sheet1").Range("A1:A3"), Target) Is Nothing Then
        Set CurrentRange = CurrentRange.Offset(1)
        If CurrentRange <> "" Then StartHere
    End If
End Sub
 
Private Sub DownloadUrls(URL)
    Dim LocalFilename As String
    
    LocalFilename = LocalDownloadFolder & Mid(URL, InStrRev(URL, "/") + 1)
    CurrentRange.Offset(, 2) = IIf(URLDownloadToFile(0, URL, LocalFilename, 0, 0) = 0, "OK", "FAILED")
End Sub
 
Upvote 0
RESOLVED! Capture data from HTML Source and download

AWESOME THOMAS!

Wow, with only 5 posts you're new to the site but were able to knock this out of the park! Thanks for your quick response and your help. I'm able to modify the code slightly to change filenames and things of that sort, but this was one of the most complete responses I've seen.

Cheers!
Scott
 
Upvote 0
Re: RESOLVED! Capture data from HTML Source and download

Glad it worked for you. I used to be a member here but it has been so long since I've posted anything that I forgot my username. Found it. It's my name and I forgot that! *smiles* Have a good one...

BTW, if you did not notice, the Worksheet_Change event code serves no purpose. I forgot to remove it before posting.
 
Upvote 0
Tom,
You had me fooled for a minute...I've never seen a newbie provide such a quick and thorough answer...its always the diehard Excel gurus. Although gurus do have to start out as newbies!

And yes, I did modify the Worksheet_Change code as it wanted to run the macro anytime I was "playing with the range".

Thanks again,
Scott
 
Last edited:
Upvote 0
Tom, I'm hoping you can still help out. I've never been successful either with your file/code or modified of running more than 3 rows. I've modified the code as follows (and show your previous values REMmed out:

Code:
Option Explicit

Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Const LocalDownloadFolder = "C:\Documents and Settings\U0212992\My Documents\NALP PDFs\"

Private WithEvents IE As InternetExplorer
Private WithEvents Download As InternetExplorer

Public FromRange As Integer
Public ToRange As Integer
Public FN As String
Public Yr As String
Public URLText As String
Public Counter As Integer
Public Pull As String
Public Response As String
Private CurrentRange As Range
Private GetLink As Integer


Private Sub CommandButton1_Click()
    StartHere
End Sub

Sub StartHere()
    Let FromRange = Range("B1")
    Let ToRange = Range("B2")
    For Counter = FromRange To ToRange
    
    Let Pull = Range("D" & Counter)
    Let URLText = Range("A" & Counter)
    Let FN = Range("E" & Counter)
    Let Yr = Range("F" & Counter)
    GetLink = 0
    If Pull = "Y" Then
    'If CurrentRange Is Nothing Then Set CurrentRange = Range("A1")
        CallThis URLText 'CurrentRange.Text
        GetLink = 0
    End If
    Next Counter
End Sub

Private Sub CallThis(URL As String)
    
    If IE Is Nothing Then
        Set IE = New InternetExplorer
    End If
    IE.Navigate URL
    Do Until IE.ReadyState = READYSTATE_COMPLETE: Loop
    IE.Navigate "javascript:uf_OpenWindow('blank.asp','pdfform','status=yes,scrollbars=no,resizable=yes,width=600,height=400', 'dledir_search_results_pdfform.asp')"
End Sub

Private Sub IE_NewWindow2(ppDisp As Object, Cancel As Boolean)
    Set Download = New InternetExplorer
    Set ppDisp = Download
    Range("B" & Counter) = "Loading..." 'CurrentRange.Offset(, 1)
End Sub

Private Sub Download_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    If GetLink > 1 Then
        Range("B" & Counter) = URL  'CurrentRange.Offset(, 1)
        Range("C" & Counter) = "Downloading..." 'CurrentRange.Offset(, 2)
        DownloadUrls URL
        'Set CurrentRange = CurrentRange.Offset(1)
        Download.Quit
        Set Download = Nothing
        'If CurrentRange <> "" Then
        '    StartHere
        '    Exit Sub
        'Else
            'IE.Quit
            ''Exit Sub
        'End If
    End If
    GetLink = GetLink + 1
End Sub


Private Sub DownloadUrls(URL)
    Dim LocalFilename As String
    LocalFilename = LocalDownloadFolder & FN & "-" & Yr & ".pdf"  'Mid(URL, InStrRev(URL, "/") + 1)
    Range("C" & Counter) = IIf(URLDownloadToFile(0, URL, LocalFilename, 0, 0) = 0, "OK", "FAILED") 'CurrentRange.Offset(, 2)
End Sub


Here's a link to the modified file that has 15 URLs listed in it and the modified code above:
http://www.speedyshare.com/files/19833315/Modified_Loop_2154536.xls

Any ideas on how what I'm doing wrong?
 
Upvote 0
The flow is hackish to begin with but you were forcing actions by way of a loop without allowing the events to fire in their proper sequence. This is still a hack but it's definitely better than what I posted initially. Replace all of your code with the following in your current file with the structure remaining as is or you can download the edited version that I tested successfully.

http://cid-ea73b3a00e16f94f.skydrive.live.com/self.aspx/Mr Excel Example/2154536-2.zip

Code:
Option Explicit
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Private Declare Function ShowWindow Lib "user32" (ByVal hwnd As Long, ByVal nCmdShow As Long) As Long
 
Private Const SW_HIDE As Long = 0
Private Const LocalDownloadFolder = "C:\Documents and Settings\U0212992\My Documents\NALP PDFs\"
 
Private WithEvents IE As InternetExplorer
Private WithEvents Download As InternetExplorer
 
Private CurrentRange As Range
 
'start here
Private Sub CommandButton1_Click()
    Set CurrentRange = Range("A5")
    GetNextPull CurrentRange
End Sub
 
Private Sub GetNextPull(BeginningTarget As Range)
    On Error Resume Next
    Download.Quit: IE.Quit
    Set Download = Nothing: Set IE = Nothing
    On Error GoTo 0
 
    Do Until BeginningTarget.Offset(, 3) = "Y" And BeginningTarget.Offset(, 2) <> "OK"
        Set BeginningTarget = BeginningTarget.Offset(1)
        If BeginningTarget = "" Then
            Set CurrentRange = Nothing
            Exit Sub
        End If
    Loop
    Set CurrentRange = BeginningTarget
    CallThis CurrentRange.Text
End Sub
 
Private Sub CallThis(URL As String)
    Set IE = New InternetExplorer
    CurrentRange.Offset(, 1) = "Loading 1 of 2..."
    IE.Navigate URL
    Do Until IE.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
    IE.Navigate "javascript:uf_OpenWindow('blank.asp','pdfform','status=yes,scrollbars=no,resizable=yes,width=600,height=400', 'dledir_search_results_pdfform.asp')"
End Sub
 
Private Sub Download_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    ShowWindow Download.hwnd, SW_HIDE
End Sub
 
Private Sub IE_NewWindow2(ppDisp As Object, Cancel As Boolean)
    On Error Resume Next
    Download.Quit
    Set Download = Nothing
    On Error GoTo 0
    Set Download = New InternetExplorer
    Set ppDisp = Download
    ppDisp.Visible = False
    CurrentRange.Offset(, 1) = "    Loading 2 of 2..."
End Sub
 
Private Sub Download_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    If InStr(URL, ".pdf") Then
        CurrentRange.Offset(, 1) = URL
        CurrentRange.Offset(, 2) = "Downloading..."
        DownloadUrls URL
        GetNextPull CurrentRange.Offset(1)
    End If
End Sub
 
Private Sub DownloadUrls(URL)
    Dim LocalFilename As String
 
    LocalFilename = LocalDownloadFolder & Mid(URL, InStrRev(URL, "/") + 1)
    CurrentRange.Offset(, 2) = IIf(URLDownloadToFile(0, URL, LocalFilename, 0, 0) = 0, "OK", "FAILED")
End Sub
 
Upvote 0
Great...thanks again Tom. This gets the job done. I was having trouble rewriting the code since I couldn't step through the functions. I usually know just enough to get the job done...but in this case, not even that.

Great support and timeliness.

Regards,
Scott
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top