I need to scrape some sites. Most of the time I can use MSXMLHTTP, but sometimes I need to access the fully loaded page (in Internet Explorer) to get some of the data.
There are two ways I would like to control this. The first is before launching the scrape, via the userform, when I know in advance that I will need to load the page.
The other case is when I get an error while trying to read a piece of info.
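The two triggers described here (a check box known before the run, or a failure during the run) can be sketched by passing the mode as a parameter instead of toggling public flags on and off. This is a minimal sketch: `ScrapeWithFallback`, `TryScrape`, and `attemptLog` are hypothetical names, and `TryScrape` is a stub that only simulates a page MSXML2 cannot read but IE can, so the control flow can be checked without a network.

```vba
Option Explicit

' Records which modes were tried, purely so the control flow can be checked.
Public attemptLog As String

' forceIE: True when the userform check box says IE will be needed up front.
' Otherwise try the fast MSXML2 request first and retry the same URL in IE
' mode only if that first attempt fails.
Public Function ScrapeWithFallback(ByVal url As String, ByVal forceIE As Boolean) As Boolean
    If Not forceIE Then
        If TryScrape(url, False) Then
            ScrapeWithFallback = True
            Exit Function
        End If
    End If
    ' Reached when IE was requested on the form or the MSXML2 attempt failed
    ScrapeWithFallback = TryScrape(url, True)
End Function

' Hypothetical stand-in for the real Scrape function: it only logs the mode
' and simulates a page that MSXML2 cannot read but IE can.
Private Function TryScrape(ByVal url As String, ByVal useIE As Boolean) As Boolean
    attemptLog = attemptLog & IIf(useIE, "IE;", "HTTP;")
    TryScrape = useIE
End Function
```

Returning the result of the retry, rather than calling `Scrape url` and discarding its value as the Control sub does, lets the loop report whether the IE attempt also failed.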
My understanding is that once I get the document into the doc variable, the processing of the data should be the same.
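If both loaders hand back the same kind of document object, the parsing functions really can stay identical. A minimal sketch, assuming late binding (so no library references are needed) and assuming IE automation still works on the machine; `GetDocument` is a hypothetical name. Each source gets its own variable (`ie`, `http`), and the IE page's rendered body markup is copied into an `htmlfile` document so IE can be closed straight away.

```vba
Option Explicit

' Hypothetical helper: returns an HTML document for url, loaded either with a
' plain MSXML2 request (raw server HTML) or through Internet Explorer (page
' after scripts have run). Either way the caller gets an "htmlfile" document,
' so the parsing functions do not need to know where it came from.
Public Function GetDocument(ByVal url As String, ByVal useIE As Boolean) As Object
    Dim ie As Object     ' InternetExplorer.Application
    Dim http As Object   ' MSXML2.XMLHTTP 6.0
    Dim doc As Object    ' htmlfile (MSHTML document)

    Set doc = CreateObject("htmlfile")

    If useIE Then
        Set ie = CreateObject("InternetExplorer.Application")
        ie.Visible = False
        ie.navigate url
        ' Wait until IE is idle and the page is fully loaded (4 = READYSTATE_COMPLETE)
        Do While ie.Busy Or ie.readyState <> 4
            DoEvents
        Loop
        ' Copy the rendered body markup; assigning the document object itself
        ' to innerHTML would not copy the page content
        doc.body.innerHTML = ie.document.body.innerHTML
        ie.Quit
    Else
        Set http = CreateObject("MSXML2.XMLHTTP.6.0")
        http.Open "GET", url, False
        http.send
        doc.body.innerHTML = http.responseText
    End If

    Set GetDocument = doc
End Function
```

`Scrape` could then load the page with `Set HTMLdoc = GetDocument(url, Need_IeError Or need_ieForm)` and keep calls such as `Get_VolPrice(HTMLdoc)` unchanged. Copying only `body.innerHTML` drops the `<head>` (title, meta tags); if a parsing function reads those, returning `ie.document` itself and quitting IE after parsing would be the alternative.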
So, here is what I see as some pseudocode.
VBA Code:
Public Need_IeError As Boolean
Public need_ieForm As Boolean
' Both Need_Ie flags start out False.
' The form has a check box: if a certain piece of info is requested, it sets need_ieForm to True.
' Then the control module runs a loop that builds each URL in a For...Next loop and calls the Scrape function.

Sub Control()
    ' Launch the form, get the options, then start the loop
    For i = 1 To ...
        url = ... ' read from the worksheet
        If Scrape(url) Then
            ' progress update to the form
        Else
            ' Failed! Set Need_IeError to True, then retry the scrape in IE mode
            Need_IeError = True
            Scrape url
            ' turn off the error flag
            Need_IeError = False
        End If
    Next i
End Sub

Function Scrape(url) As Boolean
    ' If either Need_Ie flag is True, launch IE mode.
    ' Here I am inserting the actual code I am trying. The MSXMLHTTP portion already works with all fields, so that is what I need the IE version to mimic.
    ieMode = Need_IeError Or need_ieForm
    If ieMode Then
        Dim oHttp As New InternetExplorer
        oHttp.Visible = True
        oHttp.navigate url
        Do
            DoEvents
        Loop Until oHttp.readyState = READYSTATE_COMPLETE
        'Set oHttp = oHttp.document
        HTMLdoc.body.innerHTML = oHttp.document
    Else
        On Error Resume Next
        Set oHttp = New MSXML2.XMLHTTP60
        If Err.Number <> 0 Then
            Set oHttp = CreateObject("MSXML.XMLHTTPRequest")
            MsgBox "Error 0 has occurred while creating a MSXML.XMLHTTPRequest object"
        End If
        On Error GoTo 0
        If oHttp Is Nothing Then
            MsgBox "For some reason I wasn't able to make a MSXML2.XMLHTTP object"
            Exit Function
        End If
        ' Open the URL in the browser object
        oHttp.Open "GET", url, False
        oHttp.send
        HTMLdoc.body.innerHTML = oHttp.responseText
    End If
    ' From here I process the page by calling functions that return strings, like this:
    s_Get_Instructions = Get_Instructions(HTMLdoc.body)
    ' or
    sGet_VolPrice = Get_VolPrice(HTMLdoc)
    ' Each function would return the string "ERROR" if it fails; then I set Need_IeError to True
    ' and exit Scrape with False, using an If block like this:
    If sGet_VolPrice = "ERROR" Then
        Need_IeError = True
        Scrape = False
        Exit Function
    End If
    ' Process all of the data as needed....
    Scrape = True ' report success to Control
End Function
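The "ERROR" return value can be produced safely by checking how many elements matched before indexing, so a missing element gives "ERROR" instead of a run-time error. A sketch with a hypothetical helper `GetFirstTagText`, late bound so it should accept either an `htmlfile` document or IE's `document`:

```vba
Option Explicit

' Hypothetical parsing helper following the "return ERROR on failure" idea:
' returns the text of the first <tagName> element in doc, or "ERROR" if the
' page has no such element (for example, because script has not added it).
Public Function GetFirstTagText(ByVal doc As Object, ByVal tagName As String) As String
    Dim matches As Object
    Set matches = doc.getElementsByTagName(tagName)
    ' Check the count first: indexing an empty collection and reading
    ' .innerText would raise a run-time error instead of returning "ERROR"
    If matches.Length = 0 Then
        GetFirstTagText = "ERROR"
    Else
        GetFirstTagText = matches.Item(0).innerText
    End If
End Function
```

In `Scrape`, `s_header = GetFirstTagText(HTMLdoc, "h1")` followed by the same `If ... = "ERROR" Then` block would then flag the page for an IE retry.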
The issues I am running into are that oHttp is being declared twice, and in IE mode, getting an element like this fails:
s_header = HTMLdoc.getElementsByTagName("h1")(0).innerText
But the same line works with MSXMLHTTP.
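Two things in the IE branch likely explain this. VBA has no block scope, so a `Dim` inside the `If` applies to the whole procedure: `oHttp` cannot be declared a second time, and a variable declared `As InternetExplorer` cannot hold an MSXML2 object either, so separate names such as `ie` and `http` avoid the clash. Also, `HTMLdoc.body.innerHTML = oHttp.document` passes the document object rather than its markup, so the `h1` never reaches `HTMLdoc`; `Set HTMLdoc = oHttp.document` (or copying `oHttp.document.body.innerHTML`) should use the loaded page instead. If the element is still missing after that, it may be inserted by script after IE reports READYSTATE_COMPLETE. A hedged sketch of polling for it with a timeout, where `WaitForTagText` is a hypothetical helper and IE automation is assumed to be available:

```vba
Option Explicit

' Hypothetical helper: polls a live Internet Explorer page until at least one
' <tagName> element exists or timeoutSeconds have passed. Returns the first
' element's innerText, or "ERROR" on timeout (the sentinel value used above).
Public Function WaitForTagText(ByVal ie As Object, ByVal tagName As String, _
                               ByVal timeoutSeconds As Long) As String
    Dim deadline As Date
    Dim matches As Object

    ' Now has one-second resolution, which is fine for a timeout
    deadline = Now + timeoutSeconds / 86400#
    Do
        ' Only query the DOM once navigation has finished (4 = READYSTATE_COMPLETE)
        If Not ie.Busy And ie.readyState = 4 Then
            Set matches = ie.document.getElementsByTagName(tagName)
            If matches.Length > 0 Then
                WaitForTagText = matches.Item(0).innerText
                Exit Function
            End If
        End If
        DoEvents
    Loop While Now < deadline

    WaitForTagText = "ERROR"
End Function
```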
Any help/thoughts?
Thanks
Bruce