I'm trying to extract data from some webpages. I'm pulling in the html with an XMLHTTP object and extracting the desired information with Regex.
The following code works fine for vanilla ".html" pages; in that case, I get the full html code into htmlStr, whence I can extract the desired info:
Code:
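Dim htmlStr As String    ' receives the page source
Dim errorStr As String   ' receives the status text on failure
Dim oXMLHTTP As Object
Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
:
:
' Synchronous GET: the third argument (False) makes send wait for the response
oXMLHTTP.Open "GET", urlString, False
oXMLHTTP.send ""
If oXMLHTTP.Status <> 200 Then
    errorStr = oXMLHTTP.statusText
    GoTo HandleError
Else
    htmlStr = oXMLHTTP.responseText
End If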
BUT when I try it on other sorts of pages, such as .asp and .php, all I get in oXMLHTTP.responseText is a "html" tag.
I'm trying to parse simple pages - ones that take no variables or parameters. E.g.: http://www.quoteland.com/random.asp . I just want to get back the html code that shows up as the page source in my browser when I surf to such sites.
After several hours of searching the web, all I've been able to come up with is that maybe I need to do something with setRequestHeader. Or maybe I need some sort of "POST" rather than "GET". Or something.
Nowhere have I been able to find a simple explanation of the XMLHTTP functionality, nor instructions on how I need to configure the object in order to get the HTML code generated by .asp, .php, and similarly flavored URLs.
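For concreteness, the setRequestHeader idea might look something like the sketch below. It is only a guess: the Sub name TryWithHeader and the User-Agent value are made up for illustration, and I don't know that the server even looks at this header. The one thing I'm sure of is that setRequestHeader has to be called after Open and before send.

```vba
' Sketch only: the same synchronous GET, plus a browser-like User-Agent header.
' The header value, and the assumption that the server cares about it, are guesses.
Sub TryWithHeader()
    Dim oXMLHTTP As Object
    Dim htmlStr As String

    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    oXMLHTTP.Open "GET", "http://www.quoteland.com/random.asp", False
    ' Must come after Open and before send
    oXMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    oXMLHTTP.send

    ' Print the status and response headers to the Immediate window for comparison
    Debug.Print oXMLHTTP.Status, oXMLHTTP.statusText
    Debug.Print oXMLHTTP.getAllResponseHeaders

    htmlStr = oXMLHTTP.responseText
    Debug.Print Len(htmlStr)
End Sub
```

Comparing the printed status and headers for a ".html" URL against the ".asp" one would at least show whether the two responses differ in anything besides the body.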
Um.... HELP?!!!
Thanks,
LP