VBA Image from HTML

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
840
Using the following URL in cell S2: https://www.tesco.com/groceries/en-GB/shop/fresh-food/fresh-fruit/bananas

I am trying to pull all the image Src URL's on the page.

Any ideas why this isn't quite correct please?

Thanks.

Code:
Sub Images()


Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim LastRow As Long
Dim Images As MSHTML.HTMLImgCollection
Dim Image As MSHTML.HTMLImg
Dim arrItems_1 As Variant
Dim StrURL As String
Dim rngURL As Range


'Create URL and sent request
For Each rngURL In Worksheets("Sheet1").Range("S2", Worksheets("Sheet1").Range("S" & Rows.Count).End(xlUp))
    XMLPage.Open "GET", rngURL, False
    XMLPage.send
    DoEvents
    
    'Get the source (code) of the webpage
    HTMLDoc.body.innerHTML = XMLPage.responseText
    
    LastRow = Sheets("Sheet1").Cells(Rows.Count, "A").End(xlUp).Row + 1
    
    Set Images = HTMLDoc.getElementsByClassName("product-list-container")(0).getElementsByTagName("img")
    
    For Each Image In Images
    If Image.className = "product-image" Then
    StrURL = StrURL & "|" & "https://img.tesco.com/" & Image.src
        End If
    
Next


Next
    'Store all results in an Array


    arrItems_1 = Split(Mid(StrURL, 2), "|")
    
    'Insert the results directly into Sheet1


    Sheets("Sheet1").Cells(LastRow, 1).Resize(UBound(arrItems_1) + 1) = Application.Transpose(arrItems_1)
    
End Sub
 

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
Because they aren't in HTML when the page is loaded, they're referenced in JavaScript, but since your method doesn't execute the JavaScript, the images are never inserted.
 
Upvote 0
First, change these lines in the declaration section:
Code:
Dim Images As MSHTML.HTMLImgCollection
Dim Image As MSHTML.HTMLImg

To:
Code:
Dim Images As MSHTML.IHTMLElementCollection
Dim Image As MSHTML.IHTMLElement

Second, insert an if statement to check if rngUrl is empty:
Rich (BB code):
If rngURL.Text <> "" Then
        XMLPage.Open "GET", rngURL, False
        XMLPage.send
        DoEvents
       
        'Get the source (code) of the webpage
        HTMLDoc.body.innerHTML = XMLPage.responseText
       
        lastrow = Sheets("Sheet1").Cells(Rows.Count, "A").End(xlUp).Row + 1
       
        Set Images = HTMLDoc.getElementsByClassName("product-list-container")(0).getElementsByTagName("img")
       
        For Each Image In Images
            If Image.className = "product-image" Then
                StrURL = StrURL & "|" & "https://img.tesco.com/" & Image.src
            End If
        Next
    End If
 
Last edited by a moderator:
Upvote 0
That produces the following URLs:

[TABLE="width: 64"]
<colgroup><col width="64" style="width:48pt"> </colgroup><tbody>[TR]
[TD="width: 64"]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==[/TD]
[/TR]
[TR]
[TD]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==[/TD]
[/TR]
[TR]
[TD]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==[/TD]
[/TR]
[TR]
[TD]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==[/TD]
[/TR]
[TR]
[TD]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==[/TD]
[/TR]
[TR]
[TD]https://img.tesco.com/data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==

Which aren't actually correct. I managed to get this before.

What I am looking for is, for example:

https://img.tesco.com/Groceries/pi/093/0000010111093/IDShot_225x225.jpg[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
I noticed that also. But my first attempt was to get rid of the error messages I got.

I will have a look at it later. I'm working at the moment.
 
Upvote 0
I looked into it and agree with Kyle123.

Because they aren't in HTML when the page is loaded, they're referenced in JavaScript, but since your method doesn't execute the JavaScript, the images are never inserted.

So you get something like: "data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==

This is a technique I'm not familiar with. It converts images to base64 data-URIs for embedding as image URLs in the HTML-code. Sorry I cannot help you any further.
 
Upvote 0

Forum statistics

Threads
1,224,821
Messages
6,181,163
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top