URL checking that Hyperlink.SubAddress loads (VBA)

Deutz

Board Regular
Joined
Nov 30, 2009
Messages
196
Office Version
  1. 365
Platform
  1. Windows
Hi All,

I have spent some time looking for code that will check a list of URLs and return 'OK' if they load successfully and 'FAILED' if not. The code below does that but there does not seem to be a way to test the Hyperlink.SubAddress (the position to go to on the web page) which some URLs have, as in the example below: #Usethecalc The code simply ignores the SubAddress, no matter what it is (all characters to the right of a hash symbol) and just checks the Hyperlink.Address itself.

VBA Code:
Sub TestURLs()
    Dim strURL As String
    Dim oURL As Object
    Dim blnTest As Boolean
  
    strURL = "https://www.ato.gov.au/calculators-and-tools/simple-tax-calculator/?=top_10_calculators#Usethecalc"
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        .Open "HEAD", strURL, False
        .Send
        blnTest = .Status = 200
        If blnTest = False Then
            Debug.Print strURL & " FAILED"
        Else
            Debug.Print strURL & " OK"
        End If
        Set oURL = Nothing
    End With
End Sub


Thanks kindly
Deutz
 

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
You could load the web page into the HTML DOM and see if the anchor (the text after #) exists. This needs a reference to MS HTML Object Library - it won't work with late binding.

VBA Code:
Sub TestURLs()
    Dim strURL As String
    Dim oURL As Object
    Dim blnTest As Boolean
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
    
    strURL = "https://www.ato.gov.au/calculators-and-tools/simple-tax-calculator/?=top_10_calculators#Usethecalc"
    
    p = InStrRev(strURL, "/")
    p = InStr(p, strURL, "#")
    If p > 0 Then
        anchor = Mid(strURL, p + 1)
    Else
        anchor = ""
    End If
      
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", strURL, False
            .send
            blnTest = .Status = 200
        Else
            .Open "GET", strURL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            blnTest = (anchorHTML <> "")
        End If
        
        If blnTest = False Then
            Debug.Print strURL & " FAILED"
        Else
            Debug.Print strURL & " OK"
        End If
        Set oURL = Nothing
    End With
End Sub
 
Upvote 1
You could load the web page into the HTML DOM and see if the anchor (the text after #) exists. This needs a reference to MS HTML Object Library - it won't work with late binding.

VBA Code:
Sub TestURLs()
    Dim strURL As String
    Dim oURL As Object
    Dim blnTest As Boolean
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
   
    strURL = "https://www.ato.gov.au/calculators-and-tools/simple-tax-calculator/?=top_10_calculators#Usethecalc"
   
    p = InStrRev(strURL, "/")
    p = InStr(p, strURL, "#")
    If p > 0 Then
        anchor = Mid(strURL, p + 1)
    Else
        anchor = ""
    End If
     
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", strURL, False
            .send
            blnTest = .Status = 200
        Else
            .Open "GET", strURL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            blnTest = (anchorHTML <> "")
        End If
       
        If blnTest = False Then
            Debug.Print strURL & " FAILED"
        Else
            Debug.Print strURL & " OK"
        End If
        Set oURL = Nothing
    End With
End Sub
Thanks so much John ... this is exactly what I was looking for. Works as required. Much appreciated :)
 
Upvote 0
You could load the web page into the HTML DOM and see if the anchor (the text after #) exists. This needs a reference to MS HTML Object Library - it won't work with late binding.

VBA Code:
Sub TestURLs()
    Dim strURL As String
    Dim oURL As Object
    Dim blnTest As Boolean
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
   
    strURL = "https://www.ato.gov.au/calculators-and-tools/simple-tax-calculator/?=top_10_calculators#Usethecalc"
   
    p = InStrRev(strURL, "/")
    p = InStr(p, strURL, "#")
    If p > 0 Then
        anchor = Mid(strURL, p + 1)
    Else
        anchor = ""
    End If
     
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", strURL, False
            .send
            blnTest = .Status = 200
        Else
            .Open "GET", strURL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            blnTest = (anchorHTML <> "")
        End If
       
        If blnTest = False Then
            Debug.Print strURL & " FAILED"
        Else
            Debug.Print strURL & " OK"
        End If
        Set oURL = Nothing
    End With
End Sub

Hi John, I have implimented your code and it works in some instances but have come across a few URLs with fragments that are flagged as Failed when the page loads ok in the browser and moves to the location/anchor as expected. If I could please get you to take a look and advise.

Here are some that Failed unexpectedly:

Found this in the source: <h5 class="notoc"><a id="Noncommercialrental" class="anchor"></a>Non-commercial rental</h5>

Found this in the source: <h3><a id="Coownershipofrentalproperty" class="anchor"></a>Co-ownership of rental property</h3>

Found this in the source: <h4 class="notoc">Expenses prior to property being genuinely available for rent</h4>

Here are some that passed as expected:

Found this in the source: <h2><a id="Capitalallowances" class="anchor"></a>Capital allowances</h2>

Found this in the source: <h2><a id="Prepaidexpenses" class="anchor"></a>Pre-paid expenses</h2>

Thanks for you help
Michael D
 
Upvote 0
I've found a solution. If you request the URL without the anchor part then the response contains the whole HTML, including the anchor tag.

For example, request
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4
instead of
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental

I've changed the code to a function which removes the anchor part if the URL contains an anchor and returns True or False. This now returns the expected results.

VBA Code:
Public Sub Test_URLs()

    Dim URL As String
    
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental"
    Debug.Print URL & " " & Check_URL(URL)
    
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Typesofrentalexpenses1"
    Debug.Print URL & " " & Check_URL(URL)
    
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=3#Co_ownership_of_rental_property"
    Debug.Print URL & " " & Check_URL(URL)
    
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Expenses_prior_to_property_being_genuinely_available_for_rent"
    Debug.Print URL & " " & Check_URL(URL)
    
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-claim-over-several-years/#Capitalallowances"
    Debug.Print URL & " " & Check_URL(URL)
    
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#Prepaidexpenses"
    Debug.Print URL & " " & Check_URL(URL)

    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#AnchorDoesntExist"
    Debug.Print URL & " " & Check_URL(URL)

End Sub


Public Function Check_URL(ByVal URL As String) As Boolean

    Dim oURL As Object
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
    
    p = InStrRev(URL, "/")
    p = InStr(p, URL, "#")
    If p > 0 Then
        anchor = Mid(URL, p + 1)
        URL = Left(URL, p - 1)
    Else
        anchor = ""
    End If
      
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", URL, False
            .send
            Check_URL = (.Status = 200)
        Else
            .Open "GET", URL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            Check_URL = (anchorHTML <> "")
        End If
    End With
    
    Set oURL = Nothing
    
End Function
 
Upvote 1
I've found a solution. If you request the URL without the anchor part then the response contains the whole HTML, including the anchor tag.

For example, request
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4
instead of
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental

I've changed the code to a function which removes the anchor part if the URL contains an anchor and returns True or False. This now returns the expected results.

VBA Code:
Public Sub Test_URLs()

    Dim URL As String
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Typesofrentalexpenses1"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=3#Co_ownership_of_rental_property"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Expenses_prior_to_property_being_genuinely_available_for_rent"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-claim-over-several-years/#Capitalallowances"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#Prepaidexpenses"
    Debug.Print URL & " " & Check_URL(URL)

    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#AnchorDoesntExist"
    Debug.Print URL & " " & Check_URL(URL)

End Sub


Public Function Check_URL(ByVal URL As String) As Boolean

    Dim oURL As Object
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
   
    p = InStrRev(URL, "/")
    p = InStr(p, URL, "#")
    If p > 0 Then
        anchor = Mid(URL, p + 1)
        URL = Left(URL, p - 1)
    Else
        anchor = ""
    End If
     
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", URL, False
            .send
            Check_URL = (.Status = 200)
        Else
            .Open "GET", URL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            Check_URL = (anchorHTML <> "")
        End If
    End With
   
    Set oURL = Nothing
   
End Function
Hi John,

Thanks for your prompt reply. This works perfectly.

Very much appreciated

Cheers
Michael D
 
Upvote 0
I've found a solution. If you request the URL without the anchor part then the response contains the whole HTML, including the anchor tag.

For example, request
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4
instead of
HTML:
https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental

I've changed the code to a function which removes the anchor part if the URL contains an anchor and returns True or False. This now returns the expected results.

VBA Code:
Public Sub Test_URLs()

    Dim URL As String
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Noncommercialrental"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Typesofrentalexpenses1"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=3#Co_ownership_of_rental_property"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/forms/Rental-properties-2023/?page=4#Expenses_prior_to_property_being_genuinely_available_for_rent"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-claim-over-several-years/#Capitalallowances"
    Debug.Print URL & " " & Check_URL(URL)
   
    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#Prepaidexpenses"
    Debug.Print URL & " " & Check_URL(URL)

    URL = "https://www.ato.gov.au/Individuals/Investments-and-assets/Residential-rental-properties/Rental-expenses-to-claim/Rental-expenses-you-can-claim-now/#AnchorDoesntExist"
    Debug.Print URL & " " & Check_URL(URL)

End Sub


Public Function Check_URL(ByVal URL As String) As Boolean

    Dim oURL As Object
    Dim HTMLdoc As HTMLDocument
    Dim p As Long, anchor As String, anchorHTML As String
   
    p = InStrRev(URL, "/")
    p = InStr(p, URL, "#")
    If p > 0 Then
        anchor = Mid(URL, p + 1)
        URL = Left(URL, p - 1)
    Else
        anchor = ""
    End If
     
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", URL, False
            .send
            Check_URL = (.Status = 200)
        Else
            .Open "GET", URL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            Check_URL = (anchorHTML <> "")
        End If
    End With
   
    Set oURL = Nothing
   
End Function

Hi John,

Thanks again for all your help. I have run this for many links which check out exactly as expected. Sorry to bother you again with this but I have just two links remaining that are flagged as failing and yet I can see the anchor on the web page via view source.



<h2><a id="Worksheet" class="anchor"></a>Worksheet</h2>



<h3><a id="Project_pools" class="anchor"></a>Project pools</h3>


John, are you able to recommend any resources that would help me to become more familiar with the workings of HTML DOC so I don't have to bother others with this?

Thanks once again
Michael D
 
Upvote 0
Sorry to bother you again with this but I have just two links remaining that are flagged as failing and yet I can see the anchor on the web page via view source.

I don't know why those two anchors aren't found in the anchors collection when the item is accessed directly by its name - anchorHTML = HTMLdoc.anchors(anchor).outerHTML.

If the anchor isn't found by the direct check, this code loops through the anchors collection and checks whether the item's name or id matches the required anchor. It works for your two new URLs.

VBA Code:
Public Function Check_URL(ByVal URL As String) As Boolean

    Dim oURL As Object
    Dim HTMLdoc As HTMLDocument
    Dim thisAnchor As HTMLAnchorElement
    Dim p As Long, anchor As String, anchorHTML As String
    
    p = InStrRev(URL, "/")
    p = InStr(p, URL, "#")
    If p > 0 Then
        anchor = Mid(URL, p + 1)
        URL = Left(URL, p - 1)
    Else
        anchor = ""
    End If
      
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", URL, False
            .send
            Check_URL = (.Status = 200)
        Else
            .Open "GET", URL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
            
            'See if anchor exists by reading anchors collection directly by item name
            
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
            
            'If not found, see if anchor exists by looping through anchors collection
            
            If anchorHTML = "" Then
                For Each thisAnchor In HTMLdoc.anchors
                    If StrComp(anchor, thisAnchor.Name, vbTextCompare) = 0 Or StrComp(anchor, thisAnchor.ID, vbTextCompare) = 0 Then
                        anchorHTML = thisAnchor.outerHTML
                    End If
                Next
            End If
            
            Check_URL = (anchorHTML <> "")
        End If
    End With
    
    Set oURL = Nothing
    
End Function

John, are you able to recommend any resources that would help me to become more familiar with the workings of HTML DOC so I don't have to bother others with this?

Here are some references:

 
Upvote 1
I don't know why those two anchors aren't found in the anchors collection when the item is accessed directly by its name - anchorHTML = HTMLdoc.anchors(anchor).outerHTML.

If the anchor isn't found by the direct check, this code loops through the anchors collection and checks whether the item's name or id matches the required anchor. It works for your two new URLs.

VBA Code:
Public Function Check_URL(ByVal URL As String) As Boolean

    Dim oURL As Object
    Dim HTMLdoc As HTMLDocument
    Dim thisAnchor As HTMLAnchorElement
    Dim p As Long, anchor As String, anchorHTML As String
   
    p = InStrRev(URL, "/")
    p = InStr(p, URL, "#")
    If p > 0 Then
        anchor = Mid(URL, p + 1)
        URL = Left(URL, p - 1)
    Else
        anchor = ""
    End If
     
    Set oURL = CreateObject("MSXML2.XMLHTTP")

    With oURL
        If anchor = "" Then
            .Open "HEAD", URL, False
            .send
            Check_URL = (.Status = 200)
        Else
            .Open "GET", URL, False
            .send
            Set HTMLdoc = New HTMLDocument
            HTMLdoc.body.innerHTML = .responseText
           
            'See if anchor exists by reading anchors collection directly by item name
           
            anchorHTML = ""
            On Error Resume Next
            anchorHTML = HTMLdoc.anchors(anchor).outerHTML
            On Error GoTo 0
           
            'If not found, see if anchor exists by looping through anchors collection
           
            If anchorHTML = "" Then
                For Each thisAnchor In HTMLdoc.anchors
                    If StrComp(anchor, thisAnchor.Name, vbTextCompare) = 0 Or StrComp(anchor, thisAnchor.ID, vbTextCompare) = 0 Then
                        anchorHTML = thisAnchor.outerHTML
                    End If
                Next
            End If
           
            Check_URL = (anchorHTML <> "")
        End If
    End With
   
    Set oURL = Nothing
   
End Function



Here are some references:


Thanks again John. I did think to loop through the collection with an INSTR search of the anchor but was not sure exactly how to do that with the HTML objects.

Your prompt quality responses are much appreciated.

Kind regards
Michael D
 
Upvote 0
Hi John,

Hope you are well.

I would like to request your help please. I have been using this URL checking code that you assisted me with and it has been operating really well up until recently with some links failing the test. A few anchors are not being found in the web pages by the code, don't know if that is due to the website being redeveloped recently. The fragments that fail, all begin with #ato-

Have been mucking about with the code for a while, but to no avail.

Here are a couple that fail despite the URL fragment existing as a HTML page anchor ...



Thanks in advance John
 
Upvote 0

Forum statistics

Threads
1,223,904
Messages
6,175,295
Members
452,631
Latest member
a_potato

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top