Importing Web Page HTML elements text using VBA

mattu1990

Board Regular
Joined
Mar 9, 2013
Messages
52
Hi,

I'm new to VBA (and to this forum) and was wondering if someone could help me with a problem getting some web page data into Excel 2010 using VBA.

Basically the scenario I am trying to create is very similar to this YouTube tutorial. Excel VBA Pull Data From A Website - YouTube

My scenario however is set up with the following titles in cell A1, B1, C1, D1 and E1 : POST CODE, OUTLET, ADDRESS, TELEPHONE, EMAIL

The result I want to achieve is I enter a post code into cell A2 for example, Excel then uses IE to navigate to the relevant web page as defined in the VBA code. I then want the following to happen:


  • The InnerText of the web page's h1 tag is then inserted into the OUTLET cell (B2)
  • The first instance of the p tag is then inserted into the ADDRESS cell (C2)
  • The second instance of the p tag is then inserted into the TELEPHONE cell (D2)
  • The third instance of the p tag is then inserted into the EMAIL cell (E2)

All instances of the p tag are contained in a div element called div class="adBox_content" . There are also 5 other DIVs above that DIV in the hierarchy (whether that helps or not I don't know?)

Using the YouTube tutorial link, the method has worked for me using the getElementsByTagName("h1").innerText However, when I try adding a second getElementsByTagName("p")(01).innerText the whole thing fails.

So I'm left with two problems; I can't make the VBA get more than one element at a time from the page, I can only either have the h1 or the first instance of the p tag. I've tried all the getElementBy methods and none of them seem to work in getting the second and third
instances to show.


I also need the code to make the data be put on the same row ONLY as where the post code was entered. In this scenario for example of entering a post code into A2, the OUTLET needs to land in cell B2 only, ADDRESS C3 only etc.

By following the youtube tutorial above by giving the cells names to refer to in the code, the data ends up being inputted in all further rows with identical cell names. I need it to not do that.

Any help would be hugely appreciated. The code is needed for around 300 rows of post codes that will be entered and refreshed every week or so.

Thanks,
Matt
 
Last edited:

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
Hello Matt,

I wrote some code to pull text from HTML tables. If you post the URL, I can look at the site's webpage format and adapt the code to your needs.
 
Upvote 0
Hi Leith,

Here's an example of one of our pages that we want to get the information from.

So in that example, we would need the h1 text "Bucks" putting into cell A2. Then the address p tag in cell B2. Then telephone p tag in A3 and Email p tag in A4.

The reason we want to do this is it will keep our companies' online post codes in sync with our offline copies, making changes easier as well.

Thanks,
Matt
 
Last edited:
Upvote 0
Hi

Perhaps you can adapt this code :-
Code:
Set NodeList = HTMLdocMain.getElementsByTagName("p")
For Each elem In NodeList
    MsgBox elem.innerText
Next elem

hth
 
Upvote 0
Sorry I'm still a bit confused. Could any of your write a script that would get the innerText of the information we require from scratch and dump it into the relevant cells? I have no VBA experience.
 
Upvote 0
Hi Matt

Can you show the code you have created for the tutorial example?

Please enclose between the code tags which is CODE and /CODE surrounded by square-brackets.

Thanks.
 
Last edited:
Upvote 0
Here's the code. It takes the the value entered in the "zip code" field and adds it to the URL. In this case Cell A2. It then places the result in the cell below, based on its name in this case. The code gets the H1 fine and the first P tag instance, but I just can't get the second, third or fourth and so on.

Apart from that, I just want to get the results to show one after another on the right instead. Also if we could somehow make the IF statement loop by adding each post code listed in column A to the URL on earh loop, that would be great too if its possible?

The code is basically an exact copy of the youtube tutorial by the way, just using our URL instead of course.

<code>
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("zipcode").Row And _
Target.Column = Range("zipcode").Column Then


Dim IE As New InternetExplorer
'IE.Visible = True
IE.navigate "http://google.com=" & Range("zipcode").Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document


Dim sDD As String
sDD = Trim(Doc.getElementsByTagName("h4")(0).innerText)


Dim sDE As String


sDE = Trim(Doc.getElementsByTagName("p")(0).innerText)


Dim aDD As Variant
aDD = Split(sDD, ",")
Range("county").Value = aDD(0)


Dim aDE As Variant
aDE = Split(sDE, ",")
Range("city").Value = aDE(0)


End If
End Sub
</code>

Thanks very much in advance for any help.
 
Last edited:
Upvote 0
Hi Matt

Here is the code :-
Code:
Private Sub Worksheet_Change(ByVal Target As Range)
'
' Module created by ukmikeb(MrExcel Forum) 13th March 2013
' Purpose to access URL - [URL="http://www.greenthumb.co.uk/"]www.greenthumb.co.uk[/URL] by Post Code entered in Cell A2
'
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim NodeList
Dim Elem
Dim X

' If Target is not A2 then ignore
If Target <> Range("zipcode") Then Exit Sub

' If nothing has been entered in A2 then ignore
If IsEmpty(Target) Or IsNull(Target) Then Exit Sub

'IE.Visible = True
IE.navigate "[url=http://www.greenthumb.co.uk/find-us-page?postcode]Find Us[/url]=" & Range("zipcode").Value
 
 Do
 DoEvents
 Loop Until IE.readyState = READYSTATE_COMPLETE
 
 Set Doc = IE.document
 
' Retrieve the Outlet from the h4 tag
Range("B2").Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)

X = 1

' Retrieve the Paragraph tags
Set NodeList = Doc.getElementsByTagName("p")
For Each Elem In NodeList
'
' The Paragraph info is located in paragraphs 1 through 6
' 1 and 2 contain the Address
' 3 and 4 contain the Telephone
' 5 and 6 contain the E-mail
    Select Case X
        Case Is = 2
            Range("C2").Value = Elem.innerText
        Case Is = 4
            Range("D2").Value = Elem.innerText
        Case Is = 6
            Range("E2").Value = Elem.innerText
    End Select
' Insert new row after input
Row(3).Insert
' Copy input to inserted row
Range("A2:E2").Copy Destination:=Range("A3")
' Clear input area
Range("A2:E2").ClearContents
' Autofit columns b through e
Columns("B:E").AutoFit
X = X + 1
Next Elem
End Sub

Code not tested.

I have made the following assumptions that :-
1, the input will always be in A2.
2, as the locations are entered then they will propogate from A3 down.
3, there is no check for an Error postcode entry

You should consider the following :-
1, The order of the outlets
2, How duplicate outlets should be handled

hth
 
Upvote 0
Hi Matt

Here is the code :-
Code:
Private Sub Worksheet_Change(ByVal Target As Range)
'
' Module created by ukmikeb(MrExcel Forum) 13th March 2013
' Purpose to access URL - [URL="http://www.greenthumb.co.uk/"]www.greenthumb.co.uk[/URL] by Post Code entered in Cell A2
'
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim NodeList
Dim Elem
Dim X

' If Target is not A2 then ignore
If Target <> Range("zipcode") Then Exit Sub

' If nothing has been entered in A2 then ignore
If IsEmpty(Target) Or IsNull(Target) Then Exit Sub

'IE.Visible = True
IE.navigate "[URL="http://www.greenthumb.co.uk/find-us-page?postcode"]Find Us[/URL]=" & Range("zipcode").Value
 
 Do
 DoEvents
 Loop Until IE.readyState = READYSTATE_COMPLETE
 
 Set Doc = IE.document
 
' Retrieve the Outlet from the h4 tag
Range("B2").Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)

X = 1

' Retrieve the Paragraph tags
Set NodeList = Doc.getElementsByTagName("p")
For Each Elem In NodeList
'
' The Paragraph info is located in paragraphs 1 through 6
' 1 and 2 contain the Address
' 3 and 4 contain the Telephone
' 5 and 6 contain the E-mail
    Select Case X
        Case Is = 2
            Range("C2").Value = Elem.innerText
        Case Is = 4
            Range("D2").Value = Elem.innerText
        Case Is = 6
            Range("E2").Value = Elem.innerText
    End Select
' Insert new row after input
Row(3).Insert
' Copy input to inserted row
Range("A2:E2").Copy Destination:=Range("A3")
' Clear input area
Range("A2:E2").ClearContents
' Autofit columns b through e
Columns("B:E").AutoFit
X = X + 1
Next Elem
End Sub

Code not tested.

I have made the following assumptions that :-
1, the input will always be in A2.
2, as the locations are entered then they will propogate from A3 down.
3, there is no check for an Error postcode entry

You should consider the following :-
1, The order of the outlets
2, How duplicate outlets should be handled

hth
 
Upvote 0
Thanks very much for your code. I really appreciate it.

It won't seem to run though. When I enter a post code into A2 it throws an error in saying Compile error User-defined type not defined. It then highlights the Dim IE As New InternetExplorer in blue.

Any ideas?

By the way, theres no need to worry about duplicates in this case and order is irrelevant too as well.

Is it possible to make it so that a code entered in any cell of the A column will trigger the code? Say if I entered ten post codes in cell A2:A11 it would run the code for each and put the results to the right of the same row (I'm guessing some sort of loop would be required here)

From spending 10 minutes looking at the code with your comments, I can kind of work out what's going on. But not quite there yet unfortunately.
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top