Can't find META tags in HTML when using WinHTTP Request

Julesdude

Board Regular
Joined
Jan 24, 2010
Messages
197
Hi all,

I am trying to locate Meta tags from the following URL:

https://england.shelter.org.uk/prof...ary_folder/2020_group_-_fiscal_stimulus_paper

However, what is passed from the result string to HTMLDoc.body.all is only a section of the actual HTML and the meta tags at the beginning of the script are absent and so cannot be found. How do I overcome this?

my code is as follows:

Code:
Dim Http2 As New WinHttpRequest
Dim i as long
dim result as string
dim elements
dim element

Set HTMLDoc = New MSHTML.HTMLDocument



    Http2.Open "GET", url, False
    ' send request
    Http2.send
    result = Http2.responseText



    'pass text of HTML document returned
    HTMLDoc.body.all = result


    Set Elements = HTMLDoc.all.tags("META")


        For Each singleElement In Elements 


        ActiveSheet.Cells(i, "A") = url
        ActiveSheet.Cells(i, "B") = "META " & singleElement.Name
            
        ActiveSheet.Cells(i, "D").NumberFormat = "@" ' text format for date
        ActiveSheet.Cells(i, "D") = singleElement.Content
        
        
        i = i + 1
        
        Next 
    
    
    Set Elements = Nothing
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Meta tags are in at HTML documents head, not it's body. Not in a position to check but I think
Code:
HTMLDoc.all
will include Meta tags

Edit: Sorry I only read your introduction and not the code. You should have what you need but obviously don't! I'll try and look later when at an appropriate machine instead of posting unhelpful suggestions!
 
Last edited:
Upvote 0
Bit grotty, but gives you an idea:
Rich (BB code):
    Http2.Open "GET", Url, False
    ' send request
    Http2.send
    result = Http2.responseText

    'pass text of HTML document returned
    f = Split(result, "****** ")
    For x = LBound(f) + 1 To UBound(f) + 1
        On Error Resume Next
        Debug.Print Split(Split(f(x), "name=""")(1), """")(0), Split(Split(f(x), "content=""")(1), """")(0)
    Next x
 
Upvote 0
Meta tags are in at HTML documents head, not it's body. Not in a position to check but I think
Code:
HTMLDoc.all
will include Meta tags

Edit: Sorry I only read your introduction and not the code. You should have what you need but obviously don't! I'll try and look later when at an appropriate machine instead of posting unhelpful suggestions!

Hi Scott, thanks for helping out.

I actually did try that thinking it would work and I get a type mismatch error message when I try to pass the result string to it.

I'm declaring HTMLDoc as:

Code:
Dim HTMLDoc As MSHTML.HTMLDocument
Set HTMLDoc = New MSHTML.HTMLDocument


i also changed
Code:
HTMLDoc.body.all

'to

HTMLDoc.body.innerHTML

This parses ok but cuts out the meta tags.

Not sure if that interrupts the passing? But anyway, it's still not working :(
 
Last edited:
Upvote 0
Thanks Kyle,
When I try this I get an error - the f is highlighted 'can't assign to array'
How are you declaring the 'f'?
 
Upvote 0
A trick I discovered is that, if instead of early binding the HTMLDocument object and loading it with the usual method like this:

Code:
    Dim HTMLdoc As HTMLDocument
    Set HTMLdoc = New HTMLDocument
    HTMLdoc.body.innerHTML = WinHttpRequest.responseText

you late bind HTMLDocument and load the response with its Write method, then the HEAD tag, which includes META tags, is fully populated:

Code:
    Dim HTMLdoc As Object
    Set HTMLdoc = CreateObject("HTMLfile")
    HTMLdoc.Open
    HTMLdoc.Write WinHttpRequest.responseText     
    HTMLdoc.Close
In your case the above Write line should be HTMLdoc.Write Http2.responseText
 
Upvote 0
The only issue with that though John is that you lose the good stuff like getting elements by class name etc and the more modern stuff. Though in this case it's very unlikely to matter
 
Upvote 0
John, thank you. This works perfectly.
Will this capture all meta tags though. For example, ones like:
Code:
meta name = "...." content= "....">
meta property = "...." content = "....">


I would want to capture both elements/tags within.

Here's my code in full with your additions:

Code:
    Dim HTMLdoc As Object
   
 Set HTMLdoc = CreateObject("HTMLfile")
    HTMLdoc.Open
    HTMLdoc.Write Http2.responseText
  
    Set Elements = HTMLdoc.all.tags("META")
    
    
    ' add a new worksheet for the webpage and provide headers
    ActiveWorkbook.Sheets.Add
    Cells(1, 1) = "URL"
    Cells(1, 2) = "TAG"
    Cells(1, 3) = "LINK"
    Cells(1, 4) = "TEXT"
    
    i = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row + 1 ' find the last row


        For Each singleElement In Elements ' new
        ActiveSheet.Cells(i, "A") = url
        ActiveSheet.Cells(i, "B") = "META " & singleElement.Name
            
        ActiveSheet.Cells(i, "D").NumberFormat = "@" ' text format for date
        ActiveSheet.Cells(i, "D") = singleElement.Content
        
        
        i = i + 1
        
        Next ' new
 HTMLdoc.Close
 
Last edited:
Upvote 0
Correct Kyle, so declare two HTMLDocuments, one late bound to get the META elements and the other early bound for getElementsByClassName, etc!
 
Upvote 0

Forum statistics

Threads
1,223,702
Messages
6,173,932
Members
452,539
Latest member
delvey

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top