Importing Web Page HTML elements text using VBA

mattu1990

Board Regular
Joined
Mar 9, 2013
Messages
52
Hi,

I'm new to VBA (and to this forum) and was wondering if someone could help me with a problem getting some web page data into Excel 2010 using VBA.

Basically the scenario I am trying to create is very similar to this YouTube tutorial. Excel VBA Pull Data From A Website - YouTube

My scenario however is set up with the following titles in cell A1, B1, C1, D1 and E1 : POST CODE, OUTLET, ADDRESS, TELEPHONE, EMAIL

The result I want to achieve is I enter a post code into cell A2 for example, Excel then uses IE to navigate to the relevant web page as defined in the VBA code. I then want the following to happen:


  • The InnerText of the web page's h1 tag is then inserted into the OUTLET cell (B2)
  • The first instance of the p tag is then inserted into the ADDRESS cell (C2)
  • The second instance of the p tag is then inserted into the TELEPHONE cell (D2)
  • The third instance of the p tag is then inserted into the EMAIL cell (E2)

All instances of the p tag are contained in a div element called div class="adBox_content" . There are also 5 other DIVs above that DIV in the hierarchy (whether that helps or not I don't know?)

Using the YouTube tutorial link, the method has worked for me using the getElementsByTagName("h1").innerText However, when I try adding a second getElementsByTagName("p")(01).innerText the whole thing fails.

So I'm left with two problems; I can't make the VBA get more than one element at a time from the page, I can only either have the h1 or the first instance of the p tag. I've tried all the getElementBy methods and none of them seem to work in getting the second and third
instances to show.


I also need the code to make the data be put on the same row ONLY as where the post code was entered. In this scenario for example of entering a post code into A2, the OUTLET needs to land in cell B2 only, ADDRESS C3 only etc.

By following the youtube tutorial above by giving the cells names to refer to in the code, the data ends up being inputted in all further rows with identical cell names. I need it to not do that.

Any help would be hugely appreciated. The code is needed for around 300 rows of post codes that will be entered and refreshed every week or so.

Thanks,
Matt
 
Last edited:
Apologies again. I just got it Mike.

I figured out how to attach the script to a button and it worked straight away. No errors in the code at all.

Can't see any problems on my tests so far. :)

You're a star!!
 
Upvote 0

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
Hi again Mike.

I just have a quick query regarding the code.

I've just been running it yesterday on all our codes and one problem that occurs is after around 60 codes have been run continuously, the macro locks up with the following error.

Run-time error '-2147319783 (80028019)':

Automation error
Old format or invalid type library.



A quick look on Microsoft website says it's something to do with Multilingual pack, but I doubt that to be the case. Any ideas?


There is also another error which shows up some times but not as often as the one above. The second error is:

Run-time error: '-2147023170 (800706be)':

Automation error
The remote procedure failed.



The problem is that when any of these errors occur, I have to close the file and start over again. If I simply run the script again, it starts from the top, which means it goes out to get a lot of data that I've already pulled in.

I'm wondering do you think it's because it's trying to get the data too fast? Maybe a line of code could be added at the end of the for loop to make te application wait a few seconds before it moves onto the next cell.

What do you think/Or know any code to make it wait?

Every now and then IE also loads up some of the URLs in the browser. I'm not that bothered about this though.
 
Upvote 0
Hi Matt

Could you please post the section of code that accesses the "Find us" web page.

Many thanks.
 
Upvote 0
Hi Mike,

<code>
Sub GTWebAccess(ByVal Tgtrw As Long)


Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim NodeList
Dim Elem
Dim X
IE.Visible = False
IE.navigate "http://www.greenthumb.co.uk/find-us-page?postcode=" & Range("A" & Tgtrw).Value

Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE

Set Doc = IE.document

Application.EnableEvents = False
' Retrieve the Outlet from the h4 tag
Range("B" & Tgtrw).Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)
X = 1
' Retrieve the Paragraph tags
Set NodeList = Doc.getElementsByTagName("p")
For Each Elem In NodeList
'
' The Paragraph info is located in paragraphs 1 through 6
' 1 and 2 contain the Address
' 3 and 4 contain the Telephone
' 5 and 6 contain the E-mail
Select Case X
Case Is = 2
If Elem.innerText = "Just fill in your details" Then
Range("B" & Tgtrw).Value = "NO COVERAGE"
Application.EnableEvents = True
IE.Quit
Set IE = Nothing
Exit Sub
Else
Range("C" & Tgtrw).Value = Elem.innerText
End If
Case Is = 4
Range("D" & Tgtrw).Value = Elem.innerText
Case Is = 6
Range("E" & Tgtrw).Value = Elem.innerText
End Select
X = X + 1
Next Elem
Application.EnableEvents = True
IE.Quit
Set IE = Nothing
' Set columns B through E column width to 32
Columns("B:E").ColumnWidth = 32
End Sub
</code>

I have tried adding the following code to Button_Click macro before the line that says Next Tgtrw

<code>
Application.Wait Now + TimeValue("00:00:05")
</code>

That seems to slow it down a bit but it still gets the same error now and then. Also seems to make the pages open in the browser more often as well.
 
Upvote 0
Hi Matt

Replace the code :-
Code:
IE.Visible = False
IE.navigate "http://www.greenthumb.co.uk/find-us-page?postcode=" & Range("A" & Tgtrw).Value

Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE

Set Doc = IE.document

with
Code:
Set Doc = CreateObject("htmlFile")
    With CreateObject("MSXML2.XMLHTTP")
         .Open "GET", "" & "http://www.greenthumb.co.uk/find-us-page?postcode=" & Range("A" & Tgtrw).Value

    Doc.body.innerHTML = .responseText
    End With

and also remove the
IE.Quit
Set IE = Nothing
statements.

I have found that accessing web pages this way is somewhat quicker than through IE.

hth
 
Upvote 0
Thanks Mike, I'm now getting the following error:

Run-time error '-2147483638 (8000000a)':

The data necessary to complete this operation is not yet available.

Bit of a weird one..
 
Upvote 0
Hi

Sorry my bad, this will cure it :-
Rich (BB code):
        .Open "GET", "" & "http://www.greenthumb.co.uk/find-us-page?postcode=" & Range("A" & Tgtrw).Value, False
 
Upvote 0
Thanks Mike, another error now though.

Run-time error '-2147467259 (80004005)':

Unspecified erro
r

The debugger highlights the following line of code:

<code>
Doc.body.innerHTML = .responseText
</code>
 
Upvote 0
Hi

Sorry I must have deleted this line :-
Code:
       .send
in error. It should precede the line quoted in post #29.
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,917
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top