Webpage data scraping help needed...

CrashDDL

Hi,

Could someone tell me why the 1st code works but the 2nd doesn't? Both should print out "2" as the result yet only the first one does.

Thanks for looking

Code:
Sub get_commonShips()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument


    IE.Visible = True
    IE.navigate "https://robertsspaceindustries.com/pledge/ship-upgrades"
    
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents                                                                    ' keep Excel responsive while waiting
    Loop
    
    Set HTMLDoc = IE.document
    '===============================================================================
    Dim Buttons As MSHTML.IHTMLElementCollection                                    ' find & click "Choose a ship"
    '---------------------------------------------------
    Set Buttons = HTMLDoc.getElementsByClassName("choose-ship js-choose-ship")
    Debug.Print Buttons.Length
    '===============================================================================
    
End Sub

Code:
Sub get_commonShips_XML()


    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    XMLPage.Open "GET", "https://robertsspaceindustries.com/pledge/ship-upgrades", False   ' {False: replaces wait loop}
    XMLPage.send
    
    HTMLDoc.body.innerHTML = XMLPage.responseText
    '===============================================================================
    Dim Buttons As MSHTML.IHTMLElementCollection                                    ' find & click "Choose a ship"
    '---------------------------------------------------
    Set Buttons = HTMLDoc.getElementsByClassName("choose-ship js-choose-ship")
    Debug.Print Buttons.Length
    '===============================================================================
    
End Sub
 

Because the initial page (ship-upgrades) includes JavaScript code, which a browser (your IE.navigate) automatically loads and runs, and that script then requests the other parts of the page. Your XMLHttp request only downloads the initial HTML and never runs any script, so the buttons are never added to the document.

You can see the extra requests by looking at the Network tab in IE's Developer Tools (press the F12 key).
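
As a rough sketch, the same thing can be checked from VBA by looking for the class name in the raw response text. The Sub name check_rawResponse and the variable pos are made up for illustration, and the same MSXML reference as the code above is assumed.

Code:
Sub check_rawResponse()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim pos As Long

    XMLPage.Open "GET", "https://robertsspaceindustries.com/pledge/ship-upgrades", False
    XMLPage.send

    Debug.Print "HTTP status: " & XMLPage.Status
    Debug.Print "Response length: " & Len(XMLPage.responseText)

    ' 0 means the class name is nowhere in the HTML the server returned
    pos = InStr(1, XMLPage.responseText, "js-choose-ship", vbTextCompare)
    Debug.Print "Position of 'js-choose-ship': " & pos

End Sub

If this prints 0, the buttons are not in the initial HTML at all. A non-zero position is not proof that the elements exist, since the text could also sit inside a script or template.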
 

Is there a way to make it work through XML, or is the first option the only one?
 
In theory it is possible to send multiple XML requests to emulate the browser, but this requires a lot of investigation and isn't really worth the effort. Although slower, it is far easier to use the first method and automate IE.
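
If you stay with IE automation, be aware that readyState can report complete before the scripts have finished adding the buttons. A minimal sketch of a polling wait with a timeout is below. The Sub name get_commonShips_wait, the 15 second limit and the variable startTime are assumptions for illustration, and the same references as your first code (Microsoft Internet Controls, Microsoft HTML Object Library) are assumed.

Code:
Sub get_commonShips_wait()

    Const TIMEOUT_SECONDS As Double = 15                 ' assumed limit, adjust as needed

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim Buttons As MSHTML.IHTMLElementCollection
    Dim startTime As Single

    IE.Visible = True
    IE.navigate "https://robertsspaceindustries.com/pledge/ship-upgrades"

    ' wait for the initial document; DoEvents keeps Excel responsive
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set HTMLDoc = IE.document

    ' poll until the script-generated buttons exist or the timeout passes
    ' (Timer resets at midnight, which this sketch does not handle)
    startTime = Timer
    Do
        Set Buttons = HTMLDoc.getElementsByClassName("choose-ship js-choose-ship")
        If Buttons.Length > 0 Then Exit Do
        DoEvents
    Loop While Timer - startTime < TIMEOUT_SECONDS

    Debug.Print Buttons.Length

End Sub

If the buttons never appear, this prints 0 once the timeout has passed instead of looping forever.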
 
That's what I thought. Thanks for the reply :)
 