Read HTML Source Code with VBA

KevinJ

Using VBA, I am trying to retrieve the contents of the Source of a web page (the same as would appear if you right-clicked on the page and chose "View Source") into a variable so I can work on it in VBA (using InStr, etc.).

The problem is I can use code such as
strHTMLText = ie.Document.body.innerText
or
strHTMLText = ie.Document.body.outerText
to retrieve the code, but in either case only part, not all, of the source code is captured. I need ALL the code. Is there some kind of code such as ie.Document.body.allText or similar that would perform this function?

Much obliged!
 

What code aren't you getting with that?

It should return all the HTML of the element you specify.

How are you getting it anyway?

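You mention a String variable. In VBA the String data type does have a limit on the number of characters it can hold, although for a variable-length string that limit is roughly 2 billion characters, so it is unlikely to be what is cutting off your text.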
 
Norie,

Thanks for the quick response.

When I use ie.Document.body.innerText, all I apparently get is the text within the Body element. What I'm trying to get to is this portion of HTML within the Body element of the Source Page (none of it shows up with innerHTML):

<A class=notes href=FootNotes.asp?FNtsID=7947 onmouseover="return overlib('<B class=notes>8</B><SUP class=notes>7</SUP> Peter here warned the believers who were suffering...', CAPTION, '1 Peter 5', FGCOLOR, '#fffff0', BGCOLOR, '#8b0000');" onmouseout="return nd();">7</A>

Can you shed any light on how I can access it?

Thanks again.
 
I don't suppose you can post the URL?

Without that it's hard to tell what might be happening.
 
Is that all you want, not the full code?
 
There are often several snippets of code within a page in the form "href=FootNotes.asp?FNtsID=2353", where only the trailing digits differ. The number can be anywhere from one to four digits long (four here), i.e., FNtsID can be from 1 to 9999.

If I were able to fetch the first FootNotes reference on a page, that would be sufficient. If it involves getting the HTML text on either side of the reference, I can work with that using InStr.

Thanks!
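
For illustration, here is a minimal sketch of the InStr approach described above, assuming the page source is already in a String and that the href is unquoted exactly as in the snippet posted earlier. FirstFootnoteId is a hypothetical helper name, not something from this thread.

```vba
Option Explicit

' Returns the number that follows the first "href=FootNotes.asp?FNtsID="
' in html, or 0 when no such reference is found.
' FirstFootnoteId is a hypothetical helper name used for illustration.
Function FirstFootnoteId(ByVal html As String) As Long
    Const MASK As String = "href=FootNotes.asp?FNtsID="
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, html, MASK, vbTextCompare)
    If startPos = 0 Then Exit Function          ' no reference on the page

    ' Step past the mask, then advance while the characters are digits.
    ' The 9-digit cap keeps CLng from overflowing on unexpected input.
    startPos = startPos + Len(MASK)
    endPos = startPos
    Do While endPos <= Len(html) And endPos - startPos < 9
        If Not (Mid$(html, endPos, 1) Like "[0-9]") Then Exit Do
        endPos = endPos + 1
    Loop

    If endPos > startPos Then
        FirstFootnoteId = CLng(Mid$(html, startPos, endPos - startPos))
    End If
End Function
```

Calling FirstFootnoteId(strHTMLText) would then give the first ID on the page, provided strHTMLText holds markup rather than innerText.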
 
You can get the full html using document.outerHTML.

I'm afraid I can't help much further though - this page seems copyrighted in some way.

Not sure exactly how, but that doesn't quite matter.

I'll give you a couple of hints though - everything you seem to be looking for is in the links on the page.

You can identify the ones you want by looking at the value of their class attribute: it's 'notes' for the numbers and 'ref' for the letters.
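
The class-attribute hint could be sketched roughly as follows. This assumes Windows with the late-bound "htmlfile" (MSHTML) object available, and it parses an HTML string instead of driving a browser. LinkTextsByClass is a hypothetical name. The same loop should also work against ie.Document if you already have a loaded Internet Explorer object.

```vba
Option Explicit

' Returns the visible text of every <a> element whose class matches cls.
' LinkTextsByClass is a hypothetical helper name used for illustration.
Function LinkTextsByClass(ByVal html As String, ByVal cls As String) As Collection
    Dim doc As Object, link As Object
    Dim found As New Collection

    ' "htmlfile" creates an in-memory MSHTML document (Windows only).
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html                   ' parse the markup into a DOM

    For Each link In doc.getElementsByTagName("a")
        If StrComp(link.className, cls, vbTextCompare) = 0 Then
            found.Add link.innerText
        End If
    Next link

    Set LinkTextsByClass = found
End Function
```

For example, LinkTextsByClass(txt, "notes") would collect the footnote numbers and LinkTextsByClass(txt, "ref") the reference letters. If you also want the link target, link.href is available inside the loop, though it may come back resolved to an absolute URL.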
 
Try this code and see the result in the Immediate window:

Sub Test()
  Const URL$ = "http://online.recoveryversion.org/bibleverses.asp?fvid=2901&lvid=2901"
  Const MASK$ = "href=FootNotes.asp?FNtsID="
  Dim txt As String, i As Long
  ' Download the raw page source without opening a browser
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", URL, False   ' False = synchronous request
    .Send
    txt = .ResponseText
  End With
  ' Find each occurrence of MASK and print the number that follows it
  Do
    i = InStr(i + 1, txt, MASK)
    If i = 0 Then Exit Do
    Debug.Print Val(Mid$(txt, i + Len(MASK), 15))
  Loop
End Sub
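
If you want to reuse this from other code, one possible variation is to split it into two functions: one that downloads the page and checks the HTTP status, and one that collects the IDs into a Collection. This is only a sketch. FetchHtml and FootnoteIds are hypothetical names, and reading four characters after the mask assumes the IDs never exceed 9999, as stated earlier in the thread.

```vba
Option Explicit

' Downloads a page and returns its source, or "" when the server does not
' answer with HTTP 200. Network failures still raise a run-time error.
Function FetchHtml(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False                 ' False = synchronous request
        .Send
        If .Status = 200 Then FetchHtml = .ResponseText
    End With
End Function

' Collects the number after every "href=FootNotes.asp?FNtsID=" in html.
Function FootnoteIds(ByVal html As String) As Collection
    Const MASK As String = "href=FootNotes.asp?FNtsID="
    Dim ids As New Collection, pos As Long

    pos = InStr(1, html, MASK)
    Do While pos > 0
        ' Val stops at the first character that is not part of a number.
        ids.Add Val(Mid$(html, pos + Len(MASK), 4))
        pos = InStr(pos + 1, html, MASK)
    Loop

    Set FootnoteIds = ids
End Function
```

A caller could then loop with For Each id In FootnoteIds(FetchHtml(URL)) and take the first item if only one reference is needed.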
 
Norie,

When I tried the document.outerHTML I got a Run-time error 438: Object doesn't support this property or method. I think your hints about looking up the value of the class attribute, etc. would have worked if I had gone on to try them out.

ZVI (was that Vladimir?),

Your suggestion worked fantastic! I'll see how I can incorporate it in my code.

Both - Thanks for taking time and effort to reply. Best to you!
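
On the Run-time error 438: the document object itself has no outerHTML property, but its root element does, so ie.Document.documentElement.outerHTML is the likely fix. Note also that innerText only ever returns visible text, so an href attribute can never appear in it. Below is a minimal sketch, assuming ie is an already created InternetExplorer.Application object that has been told to navigate to the page. PageSource is a hypothetical name.

```vba
Option Explicit

' Returns the markup of the whole page, <html> tag included, from a
' late-bound InternetExplorer.Application object passed in as ie.
' PageSource is a hypothetical helper name used for illustration.
Function PageSource(ByVal ie As Object) As String
    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    PageSource = ie.Document.documentElement.outerHTML
End Function
```

Bear in mind that this returns the page as the browser has parsed and re-serialized it, which is why tags can show up in upper case and attributes can lose their quotes. The XMLHTTP route returns the text as the server sent it, which is closer to what "View Source" shows.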
 
