Save HTML source code from VBA?

jflurrie

New Member
Joined
Sep 22, 2008
Messages
43
Hi,

I would need to get HTML source code from any given page. I know how to open a HTML page from Excel and I can do it with VBA, but how to get for example this page's source code?

I would start with making a sub that takes a string (the address) as an input parameter and finish with saving the source code of that address as an text file like c:\code.txt

So something like

Dim webaddress as string

Sub GetSourceCode(webaddress)
'then some code to save the source code
End Sub
 
Re: How to save HTML source code from VBA?

Eh, sorry but there are no parameters like that.:)
 
Upvote 0

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Re: How to save HTML source code from VBA?

Ok, now I have a working code. No, it's not what is "good programming habits", in fact, this is probably the worst bubblegum code I've ever seen, but this works.

So how to make this without sendkeys? And yes, the solution can use Internet Explorer as well, doesn't matter.

Code:
Sub getsource()
   
    FirefoxID = Shell("C:\Program Files\Mozilla Firefox\firefox.exe", 1)
    Application.Wait (Now + TimeValue("0:00:03"))
    AppActivate FirefoxID
    Application.Wait (Now + TimeValue("0:00:05"))
    Application.SendKeys "^T", True
    Application.Wait (Now + TimeValue("0:00:05"))
    Application.SendKeys "{F6}", True
    Application.Wait (Now + TimeValue("0:00:03"))
    Application.SendKeys "www.google.com"
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "~"
    Application.Wait (Now + TimeValue("0:00:03"))
    Application.SendKeys "%V", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "O", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "%E", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "A", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "%E", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "c", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "^C", True
    NotepadID = Shell("C:\WINDOWS\NOTEPAD.EXE", 1)
    Application.Wait (Now + TimeValue("0:00:03"))
    AppActivate NotepadID
    Application.SendKeys "^V", True
    Application.SendKeys "%FAsource%S", True
    Application.Wait (Now + TimeValue("0:00:01"))
    Application.SendKeys "%{F4}", True
End Sub
 
Upvote 0
Re: How to save HTML source code from VBA?

This will send your page to a MsgBox as Text or Html. You could also send it to a TextBox with a little extra coding, as the MsgBox only displays part of a page!


Sub GetTextOrHtmlFromIe()
'Standard module code, like: Module1.
Dim bln As Boolean
Dim ie As Object, objDoc As Object

Const strMsg As String = "To get a text version of your page, click [Yes]," & vbLf & _
"To get the Html version, click [No]"

Const strURI As String = "http://www.mrexcel.com"

Set ie = CreateObject("internetexplorer.application")

ie.Navigate strURI

'Wait for page to load!
Do
If ie.ReadyState = 4 Then
ie.Visible = False
Exit Do
Else

DoEvents
End If
Loop

Set objDoc = ie.Document

If MsgBox(strMsg, vbInformation + vbYesNo, "Display page!") = vbYes Then bln = True

If bln Then
'Text
MsgBox objDoc.body.innerText, vbInformation + vbOKOnly, "Text Page!"
Else

'Html
MsgBox objDoc.body.innerHTML, vbInformation + vbOKOnly, "Html Page!"
End If

Set objDoc = Nothing
Set ie = Nothing
End Sub
 
Last edited:
Upvote 0
Re: How to save HTML source code from VBA?

Now we are getting really close. Your code is otherwise excellent, but the MsgBox, like you wrote, doesn't seem to be able to handle that much HTML code. Neither does a single Excel cell. Nor Word's document. All three cuts the string (a huge string!) from the middle, so my guess is that there's some problem with the reading/writing of that string. So now I have a variable (variable = objDoc.body.innerHTML) full of data, but how to put it all out to a text file?

But thank you for helping me, this helps me a lot!
 
Last edited:
Upvote 0
Re: How to save HTML source code from VBA?

Hi,

To save into TXT file add this part to Joe code:
Rich (BB code):
<font face=Courier New>
    FileName = "C:\Html_Dump.txt"
    FileNo = FreeFile
    Open FileName For Binary Access Write As #FileNo
    Put #FileNo, , objDoc.body.innerHTML
    Close #FileNo</FONT>
Take into account that objDoc.body.innerHTML has a binary header.

Vladimir
 
Upvote 0
Re: How to save HTML source code from VBA?

Sub GetTextOrHtmlFromIe()
'Standard module code, like: Module1.
Dim bln As Boolean
Dim NPOFile$, myFile$
Dim ie As Object, objDoc As Object, f As Object, fs As Object

Const strMsg As String = "To get a text version of your page, click [Yes]," & vbLf & _
"To get the Html version, click [No]"

Const strURI As String = "http://www.mrexcel.com"

Set ie = CreateObject("internetexplorer.application")

ie.Navigate strURI

'Wait for page to load!
Do
If ie.ReadyState = 4 Then
ie.Visible = False
Exit Do
Else

DoEvents
End If
Loop

Set objDoc = ie.Document

If MsgBox(strMsg, vbInformation + vbYesNo, "Display page!") = vbYes Then bln = True

If bln Then
'Text
strMyPage = objDoc.body.innerText
Else

'Html
strMyPage = objDoc.body.innerHTML
End If

'Use this Text file!
fileToOpen = Application _
.GetOpenFilename("Text Files (*.txt), *.txt")
If fileToOpen <> False Then
End If

myFile = fileToOpen

NPOFile = "NotePad.exe " & myFile

'Work with file [Append Text to File].
Set fs = CreateObject("Scripting.FileSystemObject")
Set f = fs.OpenTextFile(myFile, 8, TristateUseDefault)

f.Write Chr(9) & strMyPage
f.Close

'Open NotePad with a data file!
ActiveSheet.Select

Call Shell(NPOFile, 1)

Set objDoc = Nothing
Set ie = Nothing
End Sub
 
Last edited:
Upvote 0
Re: How to save HTML source code from VBA?

Note: Set f = fs.OpenTextFile(myFile, 8, TristateUseDefault)
The "8" means to append to the file for a history.
To only write new data and erase the old data in the file each time change the "8" to "2"
Or just name a new file each time.
 
Upvote 0
Re: How to save HTML source code from VBA?

Hello All, I am new to this forum and a self taught VBA programmer for the purpose of hockey pools.
I have stumbled across Joe Was's code and it works great for me, the one thing i am not able to do is change the Web site as i run ( i need the code to loop to read a number of websites)
the way i have the code is this
Const strURI As String = "http://hockey.fantasysports.yahoo.c...P&cut_type=33&stat1=S_S_2008&sort=AR&count=25"


the link won't work for you, you need to log in, but what i am askign is how i can change the count=25 at the end to my next webpage of count=50, then count=75...

i have the code looping but can't find a way to change that to keep it running.
I always used to just change the 25 with " & NumVariable & " but in this case that doesn't work for me


Thanks in advance
Chinnick
 
Upvote 0
Re: How to save HTML source code from VBA?

Hello everybody, I used your code to retrieve data from DHL site and below code works good.
Since I should use the code for a high number of DHL numbers I'm wondering how to delete MsgBox. If i delete the line the macro returns me Runtime error '5'.
Code:
Option Explicit
Sub testit()






' open web page
Dim ie As Object, doc As Object
Set ie = CreateObject("internetexplorer.application")
With ie
    ' .AddressBar = False
    ' .StatusBar = False
    ' .MenuBar = False
    ' .Toolbar = 0
    ' .Visible = True
    .Navigate "http://www.dhl.it/it/express/ricerca.html?AWB=" & Cells(10, 10) & "&brand=DHL"
End With
Do: DoEvents: Loop Until ie.busy
Do: DoEvents: Loop Until ie.readystate = 4


ie.Visible = False


Set doc = ie.Document
MsgBox "doc.body.innerhtml = " & Len(doc.body.innerhtml)






'==================ECCOLO IL MIO TESTO================
Dim Testo1 As String
Dim Testo2 As String
Dim Testo3 As String
Dim Testo4 As String




Testo1 = Mid(doc.body.innerhtml, InStr(1, doc.body.innerhtml, "<TD>5</TD>"), (InStr(1, doc.body.innerhtml, "<TD>4</TD>") - InStr(1, doc.body.innerhtml, "<TD>5</TD>")))
Testo2 = Mid(doc.body.innerhtml, InStr(1, doc.body.innerhtml, "><STRONG>Waybill:"), 438)
Testo3 = Mid(Testo2, InStrRev(Testo2, "<SPAN>") + 6, 1500)
Testo4 = Mid(Testo3, 1, InStr(Testo3, "<") - 1)
'_____TESTO1__________
Cells(11, 10) = Mid(Mid(Testo1, InStr(1, Testo1, "<TD>5</TD>") + 16, 100), 1, InStr(1, Mid(Testo1, InStr(1, Testo1, "<TD>5</TD>") + 16, 100), "</TD>") - 1)


'_____TESTO2__________
Cells(12, 10) = Mid(Mid(Testo2, InStr(1, Testo2, "<SPAN>") + 6, 100), 1, InStr(1, Mid(Testo2, InStr(1, Testo2, "<SPAN>"), 100), "</SPAN>") - 7)
Dim var As String
var = Mid(Mid(Testo2, InStr(1, Testo2, "<SPAN>") + 6, 100), 1, InStr(1, Mid(Testo2, InStr(1, Testo2, "<SPAN>"), 100), "</SPAN>") - 7)


'_____TESTO3__________


Cells(13, 10) = IIf(var = Testo4 And Testo4 <> "", "POD non presente", Testo4)






End Sub


Thanks in advance for support.

Federico
 
Upvote 0

Forum statistics

Threads
1,224,552
Messages
6,179,486
Members
452,917
Latest member
MrsMSalt

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top