IE automation: saving file from website

cfsoule

New Member
Joined
Jun 27, 2017
Messages
5
Hello,

Long time creeper, first time poster!

I've seen similar questions about IE automation on this board and need help adapting code I have found to my situation.

I have code that currently opens IE, navigates to the desired URL, interacts with the form by clicking checkboxes that fit my needs, and clicks a submit button. I then had code that would save the file locally. Overall this code was very dependent on Sleep and wait commands. I found code that looked more efficient on this thread:


https://www.mrexcel.com/forum/excel...n-internet-explorer-web-site.html#post3404965

I need help adapting the code, because although it saves a file, it does not save what I need (the csv file). It saves the HTML source code.

Note: the piece of the url that needs to be able to change is the n_grpclm_id depending on the query

Here's my current code:

Code:
Option Explicit
Public Declare Function DeleteUrlCacheEntry Lib "Wininet.dll" Alias "DeleteUrlCacheEntryA" _
    (ByVal lpszUrlName As String) As Long
    
Public Sub XMLhttp_Download_File()
    Dim URL As String
    Dim httpReq As xmlHTTP
    Dim POSTdata As String
    Dim HTMLdoc As HTMLDocument
    Dim ViewStateInput As Object, EventValidationInput As Object
    Dim ADODBstream As Object
    Dim localFile As String
    
    'The file name which the downloaded file will be saved as
    
    localFile = "local file" 'local file here
    
    URL = "[URL]http://URL[/URL] link.aspx" 'http url ending in .aspx
    DeleteUrlCacheEntry URL
 
    Set httpReq = New xmlHTTP
    
    With httpReq
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322; .NET4.0C; .NET4.0E)"
        .send
         
        'Put response in a HTMLDocument
        
        Set HTMLdoc = New HTMLDocument
        HTMLdoc.body.innerHTML = .responseText
    End With
    
    'Construct the POST data string with the form parameters
    
    POSTdata = "?processind=y&extractind=y&excellinkformat=y&getreserveind=A&getfininfo=y&n_clmgrp_id=6CFEC0A1C45BF6A2&n_clmgrp_id=79A1C19077AF9EFC&sourcesystem=all&submit=get+claims"
    
    'Send POST data to request file
    With httpReq
        .Open "POST", URL, False
        .setRequestHeader "Content-Type", "text/html; charset=utf-8"
        .setRequestHeader "Content-Length", Len(POSTdata)
        .setRequestHeader "Cache-Control", "private"
        .send POSTdata
         
        'Save response in local file
        
        If .Status = 200 Then
            Set ADODBstream = CreateObject("ADODB.Stream")
            With ADODBstream
                .Type = 1 'adTypeBinary
                .Open
                .write httpReq.responseBody
                .SaveToFile localFile, 2
                .Close
            End With
       End If
    End With
    
End Sub
[CODE]
 

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
That code includes hidden input elements such as __VIEWSTATE and __EVENTVALIDATION in the form data string sent in the POST request. With your page also being a .aspx page, I suspect you will also need to include these parameters and maybe other hidden input elements. They can be obtained by parsing the HTMLDocument from the initial GET request.

Note: the piece of the url that needs to be able to change is the n_grpclm_id depending on the query

Code:
POSTdata = "?processind=y&extractind=y&excellinkformat=y&getreserveind=A&getfininfo=y&n_clmgrp_id=6CFEC0A1C45BF6A2&n_clmgrp_id=79A1C19077AF9EFC&sourcesystem=all&submit=get+claims"
n_grpclm_id or n_clmgrp_id?

And you have two n_clmgrp_id parameters in the above string. The long hex number (6CFEC0A1C45BF6A2 or 79A1C19077AF9EFC) looks like a session id and possibly comes from somewhere in the HTML returned by the initial GET request. With the browser developer tool open, do a manual browse and file download and examine the POST request and search the HTML source to see if this hex number exists somewhere.

Another thread which constructs a form data string and sends it in a POST request to download a file and which you might find useful is http://www.mrexcel.com/forum/excel-questions/749860-msxmlhttp-post-paramaters.html.
 
Upvote 0
Woops, should be n_clmgrp_id. And the number of n_clmgrp_id seems to vary based on how many checkboxes are checked. So if I check 2 checkboxes then we get two n_clmgrp_id params.

Here's what I get when I run fiddler at the point of hitting "submit" on the website:
GET http://url.aspx HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: url.com


Here's my old code that uses UIAutomation. Supplying it to give more background. I don't like it because it's too dependent on Sleep. You can see how I handle the checkboxes and so on:
Code:
Public Sub Import_CAT_Extract()
'create a query for CAT Review Extract
'filter by date
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim User_Start_Date As Date
Dim CAT_start_date As Date
Dim Source_file As String
Dim i As Double
Dim h As Long

Clear_Downloads

h = IE.Hwnd
User_Start_Date = ThisWorkbook.Worksheets("Control").Cells(12, 6)
IE.Visible = True
IE.navigate "[URL="http://url.aspx/"]http://url.aspx[/URL]"

Do
    DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document

'get all the CATs needed for download filtered by date
Set CAT_labels = Doc.getElementsByTagName("label")
Set CheckBoxes = Doc.getElementsByName("n_clmgrp_id")
i = 0

For Each CAT_label In CAT_labels
    'parse the data in the CAT_label for a date
    CAT_start_date = CDate(Trim(Mid(CAT_label.innerText, InStr(CAT_label.innerText, "(") + 1, 10)))
    If CAT_start_date > User_Start_Date Then
    'check the box
    CheckBoxes.Item(i).Checked = True
    'Doc.getElementById("n_clmgrp_id").Click
    End If
    i = i + 1
Next CAT_label

'click the 'submit' button
Doc.getElementById("submit").Click
Do
    DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE

'Sleep 600000
Sleep 8000

'save csv file
Dim o As IUIAutomation
Dim e As IUIAutomationElement
Set o = New CUIAutomation
h = FindWindowEx(h, 0, "Frame Notification Bar", vbNullString)
If h = 0 Then Exit Sub
Set e = o.ElementFromHandle(ByVal h)
Dim iCnd As IUIAutomationCondition
Set iCnd = o.CreatePropertyCondition(UIA_NamePropertyId, "Save")
Dim Button As IUIAutomationElement
Set Button = e.FindFirst(TreeScope_Subtree, iCnd)
Dim InvokePattern As IUIAutomationInvokePattern
Set InvokePattern = Button.GetCurrentPattern(UIA_InvokePatternId)
InvokePattern.Invoke
Sleep 5000

'close browser
'IE.Visible = True
IE.Quit
Set Doc = Nothing
Set IE = Nothing

'Copy the saved file to the appropriate folder as an .xlsx file
Source_file = Dir(temp_fpath & "*.csv")
DestinationFile_CAT_Extract = "\\fakefilepath.xlsx"
Set Source = Workbooks.Open(temp_fpath & Source_file, False, True)
ActiveWorkbook.SaveAs DestinationFile_CAT_Extract, 51
ActiveWorkbook.Close False
  
End Sub
 
Upvote 0
Here's what I get when I run fiddler at the point of hitting "submit" on the website:
GET http://url.aspx HTTP/1.1]
Is there a POST request?

It will be very difficult to help you without the URL. Basically, the VBA code must emulate exactly what a browser does when it requests the initial web page and requests the file download. The fact that your OP code saves the HTML source instead of the csv file means you have specified the wrong parameter name(s) or value(s) somewhere, or not sent a necessary request.

In Fiddler use the Inspectors tab and the Raw sub-tab to examine the request headers. For a POST request, click the WebForms sub-tab to examine the form parameters, or on the Raw sub-tab click View in Notepad and the form data parameters are highlighted in yellow and I usually edit the Notepad file to put each parameter on a separate line (deleting the long __VIEWSTATE etc. values to just the first 10 characters) to make them easier to check.
 
Upvote 0
There doesn't seem to be a POST request. Or at least I can't find it.

Here's the end of the URL:

.aspx?processind=y&extractind=y&excellinkformat=y&getreserveind=A&getfininfo=y&n_clmgrp_id=79A1C19077AF9EFC&sourcesystem=all&submit=get+claims

Here's the form data parameters and values for a query where one check box has been checked (i.e. 1 n_clmgrp_id value):Name Value
processind y
extractind y
excellinkformat y
getreserveind A
getfininfo y
n_clmgrp_id 79A1C19077AF9EFC
sourcesystem all
submit get claims

All the parameters will take the same value for every query except for n_clmgrp_id, which is dependent on user's choice of checkboxes. If there are multiple checkboxes, we get n_clmgrp_id="string1" & n_clmgrp_id="string2"

I can see the response body that I want using fiddler, so like you said it's just my code that's not quite right.

 
Upvote 0
If there isn't a POST request, have you tried a GET request and saving the response in the local file? Like this:
Code:
#If VBA7 Then
    Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "Wininet.dll" Alias "DeleteUrlCacheEntryA" (ByVal lpszUrlName As String) As Long
#Else
    Private Declare Function DeleteUrlCacheEntry Lib "Wininet.dll" Alias "DeleteUrlCacheEntryA" (ByVal lpszUrlName As String) As Long
#End If


Public Sub XMLhttp_Download_File()
    
    #If VBA7 Then
        Dim httpReq As XMLHTTP60
        Set httpReq = New XMLHTTP60
    #Else
        Dim httpReq As XMLHTTP
        Set httpReq = New XMLHTTP
    #End If
    Dim URL As String
    Dim ADODBstream As Object
    Dim localFile As String
    
    'The folder and file name which the downloaded file will be saved as
    
    localFile = "C:\path\to\file.csv"
    
    URL = "http://yoursite/page.aspx?processind=y&extractind=y&excellinkformat=y&getreserveind=A&getfininfo=y&n_clmgrp_id=79A1C19077AF9EFC&sourcesystem=all&submit=get+claims"
    
    DeleteUrlCacheEntry URL
    
    With httpReq
        .Open "GET", URL, False
        .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322; .NET4.0C; .NET4.0E)"
        .setRequestHeader "??", "??"  'maybe other headers are needed
        .send
        
        'Save response in local file
        
        If .Status = 200 Then
            Set ADODBstream = CreateObject("ADODB.Stream")
            With ADODBstream
                .Type = 1 'adTypeBinary
                .Open
                .write httpReq.responseBody
                .SaveToFile localFile, 2
                .Close
            End With
        End If
        
    End With
    
End Sub
As noted in the code, you may need to set other request headers.

If the above doesn't download the file then using Fiddler, compare very carefully the browser's request(s) and response(s) with those generated by the VBA code and look for differences.
 
Upvote 0
Ok we're getting closer! This gets me the output I want! Now I just need to add some code that will be able to add in extra n_clmgrp_id params depending on my query (the number of checkboxes I would select if I were going to the website). I could probably use some of my old code here like:

Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim User_Start_Date As Date
Dim CAT_start_date As Date
Dim i As Double

User_Start_Date = ThisWorkbook.Worksheets("Control").Cells(12, 6)

IE.Visible = False
IE.navigate "http://url.aspx"

Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE

Set Doc = IE.document

'get all the CATs needed for download filtered by date
Set CAT_labels = Doc.getElementsByTagName("label")
Set CheckBoxes = Doc.getElementsByName("n_clmgrp_id")
i = 0

For Each CAT_label In CAT_labels
'parse the data in the CAT_label for a date
CAT_start_date = CDate(Trim(Mid(CAT_label.innerText, InStr(CAT_label.innerText, "(") + 1, 10)))
If CAT_start_date > User_Start_Date Then
'get the check box value and concatenate to string with the others
n_clmgrp_id_value = n_clmgrp_id_value & "&n_clmgrp_id="& CheckBoxes.Item(i).Value
End If
i = i + 1
Next CAT_label

'then I can use this string in my URL in my GET request

I'll see if this works!
 
Upvote 0

Forum statistics

Threads
1,224,820
Messages
6,181,159
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top