Importing Web Page HTML elements text using VBA

mattu1990

Board Regular
Joined
Mar 9, 2013
Messages
52
Hi,

I'm new to VBA (and to this forum) and was wondering if someone could help me with a problem getting some web page data into Excel 2010 using VBA.

Basically the scenario I am trying to create is very similar to this YouTube tutorial. Excel VBA Pull Data From A Website - YouTube

My scenario however is set up with the following titles in cell A1, B1, C1, D1 and E1 : POST CODE, OUTLET, ADDRESS, TELEPHONE, EMAIL

The result I want to achieve is I enter a post code into cell A2 for example, Excel then uses IE to navigate to the relevant web page as defined in the VBA code. I then want the following to happen:


  • The InnerText of the web page's h1 tag is then inserted into the OUTLET cell (B2)
  • The first instance of the p tag is then inserted into the ADDRESS cell (C2)
  • The second instance of the p tag is then inserted into the TELEPHONE cell (D2)
  • The third instance of the p tag is then inserted into the EMAIL cell (E2)

All instances of the p tag are contained in a div element called div class="adBox_content" . There are also 5 other DIVs above that DIV in the hierarchy (whether that helps or not I don't know?)

Using the YouTube tutorial link, the method has worked for me using the getElementsByTagName("h1").innerText However, when I try adding a second getElementsByTagName("p")(01).innerText the whole thing fails.

So I'm left with two problems; I can't make the VBA get more than one element at a time from the page, I can only either have the h1 or the first instance of the p tag. I've tried all the getElementBy methods and none of them seem to work in getting the second and third
instances to show.


I also need the code to make the data be put on the same row ONLY as where the post code was entered. In this scenario for example of entering a post code into A2, the OUTLET needs to land in cell B2 only, ADDRESS C3 only etc.

By following the youtube tutorial above by giving the cells names to refer to in the code, the data ends up being inputted in all further rows with identical cell names. I need it to not do that.

Any help would be hugely appreciated. The code is needed for around 300 rows of post codes that will be entered and refreshed every week or so.

Thanks,
Matt
 
Last edited:
My mistake regarding the code not working. It was because I dumped it into a new project without inserting the Microsoft Internet Controls and Microsoft HTML Object Library.

Its now showing a different error instead though. "Sub or Function not defined" It then highlights in blue the Row(3).Insert line of code.
 
Upvote 0

Excel Facts

Excel Wisdom
Using a mouse in Excel is the work equivalent of wearing a lanyard when you first get to college
Hi Matt

Revised and tested code:-
Code:
Private Sub Worksheet_Change(ByVal Target As Range)
'
' Module created by ukmikeb(MrExcel Forum) 13th March 2013
' Purpose to access URL - [URL="http://www.greenthumb.co.uk"]www.greenthumb.co.uk[/URL] by Post Code entered in any cell column A
'
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim NodeList
Dim Elem
Dim X
Dim Tgtrw As Long
If Target.Column <> 1 Or Target.Row < 2 Then
   Exit Sub
End If
Tgtrw = Target.Row
' If nothing has been entered in A2 then ignore
If IsEmpty(Target.Value) Or IsNull(Target.Value) Then Exit Sub
'IE.Visible = True
IE.navigate "[url=http://www.greenthumb.co.uk/find-us-page?postcode]Find Us[/url]=" & Range("A" & Tgtrw).Value
 
 Do
 DoEvents
 Loop Until IE.readyState = READYSTATE_COMPLETE
 
 Set Doc = IE.document
 
Application.EnableEvents = False
' Retrieve the Outlet from the h4 tag
Range("B" & Tgtrw).Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)
X = 1
' Retrieve the Paragraph tags
Set NodeList = Doc.getElementsByTagName("p")
For Each Elem In NodeList
'
' The Paragraph info is located in paragraphs 1 through 6
' 1 and 2 contain the Address
' 3 and 4 contain the Telephone
' 5 and 6 contain the E-mail
    Select Case X
        Case Is = 2
            If Elem.innerText = "Just fill in your details" Then
                Range("B" & Tgtrw).Value = "Please enter a valid Post Code"
                Application.EnableEvents = False
                Exit Sub
            Else
                Range("C" & Tgtrw).Value = Elem.innerText
            End If
        Case Is = 4
            Range("D" & Tgtrw).Value = Elem.innerText
        Case Is = 6
            Range("E" & Tgtrw).Value = Elem.innerText
    End Select
X = X + 1
Next Elem
Application.EnableEvents = True
IE.Quit
Set IE = Nothing
' Set columns b through e column width to 32
Columns("B:E").ColumnWidth =32
End Sub

I don't know why the "Dim IE etc" statement us erroring.

The code will accept entry in any cell in column A other than A1.

As it is a Worksheet_Change event it will automatically run on any change to that column.

hth
 
Upvote 0
Thank you so much Mike, the script is amazing and works as exactly as I wanted. I really can't thank you enough as you have probably increased our productivity by about a million percent! lol.

I was going to say one small problem I noticed is that when I copy and paste say 10 codes into column A, it won't load the data. I'm guessing this is a limitation of Excel only being able to identify one changed cell at a time.

A quick workaround though I realised. I simply dumped my 10 codes as an example into column F and then in column A, the formula =F2 and so on. Seems to cure that problem. Then simply copied and pasted the formula downwards.

I'll let you know if any other problems occur, but this is great so thanks again.

Kind Regards,
Matt
 
Upvote 0
Hi Matt.

Thanks for the feedback.

There is something I forgot to change and include, which could cause a problem.
It is in the section relating to an invalid post code and change should read as follows:-
Code:
           If Elem.innerText = "Just fill in your details" Then
              Range("B" & Tgtrw).Value = "Please enter a valid Post Code"
              Application.EnableEvents = [COLOR=#ff0000]True
[/COLOR]               IE.Quit
              Set IE = Nothing
              Exit Sub

replacing the code
Code:
          If Elem.innerText = "Just fill in your details" Then
                Range("B" & Tgtrw).Value = "Please enter a valid Post Code"
                Application.EnableEvents = False
                Exit Sub


I am pleased to have helped you solve your problem and that has also helped me solve a problem too.

Good luck with your project.
 
Last edited:
Upvote 0
Hi Mike,

I was wondering if you (or anyone else on here) can help us to enhance this script a bit to make it a little more automated.

It's been great for us to use the last few months but now we have gone from having to manage about 300 post codes to almost 600.

At the moment, when we want to do a refresh of all the codes, we have to manually delete out all the codes in column A and then re-enter them to trigger new requests to the website.

We don't mind having to delete them, it's putting them all back in again that's a pain because as I mentioned above in March, you have to do them one at a time, otherwise Excel can't capture them quick enough and the application fails.

I was wondering if you know of a way to make Excel look at our new list of codes and go through them one by one by itself, getting the relevant data. I assume some sort of statement would have to be written to tell Excel not to move onto the next code until it has finished loading the data from the current one.

My aim would then be to attach that script to a big button on the worksheet as a macro so we could automatically update them as and when needed.

Any help would be very much appreciated (again!)

kind regards,
Matt
 
Upvote 0
Hi Matt

Probably the best thing to do is :-
1, to remove the Internet Access code from the Worksheet_Change module and create a standalone module.
2, Call the standalone module from the Worksheet_Change module and
3, Create a module for the Button_Click which also Calls the standalone module and processes all the postcodes. Be sure to create a backup sheet in case the process fails.

hth
 
Upvote 0
Hi Mike,

Thanks, but I'm a bit confused. Are saying ditch the current script and make a new one that doesn't wait for a worksheet_change?

Also how would I go about calling the module from the worksheet_change sheet.

I'm guessing then a third separate module for the button_click?

Can you help with any bits of code at all?.

Thanks,
Matt
 
Upvote 0
Hi Matt

Apologies for the delay getting back to you, I have been away for a few days.

I haven't tested these but here are the revised codes :-
1, Worksheet_Change
Code:
Private Sub Worksheet_Change(ByVal Target As Range)
'
' Module created by ukmikeb(MrExcel Forum) 13th March 2013, revised 21st August 2013
' Purpose to access URL - www.greenthumb.co.uk by Post Code entered in any cell column A calling GTWebAccess module
'
Dim Tgtrw As Long
If Target.Column <> 1 Or Target.Row < 2 Then
   Exit Sub
End If
' If nothing has been entered in A2 then ignore
If IsEmpty(Target.Value) Or IsNull(Target.Value) Then Exit Sub

Tgtrw = Target.Row

GTWebAccess Tgtrw

End Sub

2, WebAccess module
Code:
Sub GTWebAccess(ByVal Tgtrw as Long)
'
' Module created by ukmikeb(MrExcel Forum) 21st August 2013
' Purpose to access URL - www.greenthumb.co.uk by Post Code entered in any cell column A
' Passed info - Tgtrw
'
Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim NodeList
Dim Elem
Dim X
'IE.Visible = True
IE.navigate "Find Us=" & Range("A" & Tgtrw).Value
 
 Do
 DoEvents
 Loop Until IE.readyState = READYSTATE_COMPLETE
 
 Set Doc = IE.document
 
Application.EnableEvents = False
' Retrieve the Outlet from the h4 tag
Range("B" & Tgtrw).Value = Trim(Doc.getElementsByTagName("h4")(0).innerText)
X = 1
' Retrieve the Paragraph tags
Set NodeList = Doc.getElementsByTagName("p")
For Each Elem In NodeList
'
' The Paragraph info is located in paragraphs 1 through 6
' 1 and 2 contain the Address
' 3 and 4 contain the Telephone
' 5 and 6 contain the E-mail
    Select Case X
        Case Is = 2
           If Elem.innerText = "Just fill in your details" Then
               Range("B" & Tgtrw).Value = "Please enter a valid Post Code"
               Application.EnableEvents = True
               IE.Quit
               Set IE = Nothing
               Exit Sub
            Else
                Range("C" & Tgtrw).Value = Elem.innerText
            End If
        Case Is = 4
            Range("D" & Tgtrw).Value = Elem.innerText
        Case Is = 6
            Range("E" & Tgtrw).Value = Elem.innerText
    End Select
X = X + 1
Next Elem
Application.EnableEvents = True
IE.Quit
Set IE = Nothing
' Set columns B through E column width to 32
Columns("B:E").ColumnWidth =32
End Sub

3. Button module
Code:
Private Sub Button_Click()
'
' Module created by ukmikeb(MrExcel Forum) 21st August 2013
' Purpose to access URL - www.greenthumb.co.uk by Post Code entered in any cell column A via module GTWebAccess
'
Dim LR As Long
Dim Tgtrw As Long

' Insert Sheet Copy and return to Active Sheet code here

LR =Range("A" & Rows.Count).end(xlUp).Row
For Tgtrw = 3 to LR
GTWebAccess Tgtrw 
Next Tgtrw

' Insert Code to delete copy sheet and save workbook here
End Sub

I am not familiar with setting up Buttons and their associated modules.

hth
 
Upvote 0
Hi Mike,

No need to apologise, I appreciate the time you have spent helping me already.

I've tried the code but can't seem to get it working. It's a bit overwhelming. I'm not sure if I am placing any of it in the wrong place. Do you have it loaded in an Excel file you could send me?

By the way, do I still need the Microsoft HTML Object Library and Internet Controls references inserted into the file like with the last method?

Thanks,
Matt
 
Upvote 0
Apolgies I am stupid. I forgot to add the URL in the IE.navigate. I'm now at the stage of where we were with the original script.

Just struggling to get the Button_Click script working. So far I have put it in its own Macro. For some reason when I go to macros it doesn;t show there. So instead I go into Visual Editor and press the play button to run it.

It comes up with error "Compile error" "Sub or Function not defined" highlighting "GTWebAccess Tgtrw" in the code.
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,327
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top