Capture href in scrape using getelementsbytag

diddi

Well-known Member
Joined
May 20, 2004
Messages
3,337
Office Version
  1. 2010
Platform
  1. Windows
Hi all. I have a scrape which works fine with the exception that i cannot work out how to capture the href data following "<a class="list__link" href=" in the html.
My sub:

VBA Code:
Sub ScrapeNow()
    Dim xmlhttp As XMLHTTP60
    Dim oDom As Object: Set oDom = CreateObject("htmlFile")
    Set xmlhttp = New MSXML2.ServerXMLHTTP60

    Dim objXML As MSXML2.DOMDocument60 'MSXML2.DOMDocument
    Dim ele As IHTMLElement

    Set objXML = New MSXML2.DOMDocument60
    prow = 2
    For row = 65 To 90
        For row2 = 1 To 100
            MyURL = "https://www.scenicrim.qld.gov.au/council-services/cemeteries/interment-database?itemsPerPage_356=1000&search_356_30=" & Chr(row) & "&search_356_3=&submit_356=FIND&page_356=" & row2
        
            Label1.Caption = "Waiting for " & Chr(row) & " Page " & row2
            DoEvents
            With xmlhttp
                .Open "GET", MyURL, False
                .send
                oDom.body.innerHTML = .responseText
                If InStr(.responseText, "no content available") > 0 Then Exit For
            End With
            
            With oDom.getElementsByTagName("table")(0)
                For Each oRow In .Rows
                    prow = prow + 1
                    col = 0
                    oDom.getElementsByClassName ("list__link")
                    
                    For Each oCell In oRow.Cells
                        col = col + 1
                        Cells(prow, col) = oCell.innerText
                    Next oCell
                Next oRow
            End With
        Next row2
    Next row
End Sub

sample tablerow html:

Rich (BB code):
<tr>
              <td scope="row">1</td>
                      <td scope="row">

 <a class="list__link" href="/directory-record/14156/aagaard-doris-susan">Aagaard</a> ' i want this link here

                                                                              </td>

<td scope="row">
Doris Susan
                                                      </td>
                                                  <td scope="row">
Mount Tamborine
                                                      </td>
                                                  <td scope="row">
31 July 1980
                                                      </td>
                                                  <td scope="row">
29 July 1980
                                                      </td>
                                                                  </tr>
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Hi,

I have not been here for a while so I am feeling a bit rusty.
However, I could not quite make your code work for me so, like the song, "I did it my way!"

I used your search criteria but have changed the rest so that it looks more familiar to me. Apologies if that causes you any confusion.

The code carries out a series of searches and keeps the data from your chosen class as one object which it then writes to the worksheet.
It then goes back and performs the next search and appends that to the worksheet.

I hope that will get you started.

VBA Code:
'Option Explicit

' Requires References:
'   Microsoft HTML Object Library
'   Microsoft XML, V6.0

Sub ScrapeNow()
    Dim xmlhttp     As XMLHTTP60
    Dim oDom        As HTMLDocument
    Dim prow        As Long
    Dim Row         As Long
    Dim row2        As Long
    Dim MyURL       As String
    Dim i           As Long
    Dim links       As Object
    
    Set oDom = New HTMLDocument
    Set xmlhttp = New MSXML2.ServerXMLHTTP60
    
    prow = 2
    For Row = 65 To 90
        MyURL = "https://www.scenicrim.qld.gov.au/council-services/cemeteries/interment-database?itemsPerPage_356=1000&search_356_30=" & Chr(Row) & "&search_356_3=&submit_356=FIND&page_356=" & row2

        With xmlhttp
            .Open "GET", MyURL, False
            .send
            Do: DoEvents: Loop Until .readyState = 4
            oDom.body.innerHTML = .responseText
        End With
        

        Set links = oDom.getElementsByClassName("list__link")
        
        For i = 0 To links.Length - 1
            Range("A1").Offset(prow, 0) = links(i).href
            prow = prow + 1
        Next i
            
    Next Row
End Sub
 
Upvote 0
Solution
thx. i am good with alternative ideas... the bit that is important is the bit you have helped with. the rest just enables a complete site rip, cycling throung the various extra pages etc. i shall have a try now and see if it works
 
Upvote 0
runs exactly as i intended now. thx so much
If you have any references to working with this type of code i would be very keen to learn more in this area. :)

VBA Code:
Sub ScrapeNow2()
    Dim xmlhttp As XMLHTTP60
    Dim oDom  As HTMLDocument
    Dim i           As Long
    Dim links       As Object
    Dim objXML As MSXML2.DOMDocument60
    Dim ele As IHTMLElement

    Set xmlhttp = New MSXML2.ServerXMLHTTP60
    Set objXML = New MSXML2.DOMDocument60
    Set oDom = New HTMLDocument
    
    prow = 2
    For Row = 65 To 90
        For row2 = 1 To 100
            MyURL = "https://www.scenicrim.qld.gov.au/council-services/cemeteries/interment-database?itemsPerPage_356=1000&search_356_30=" & Chr(Row) & "&search_356_3=&submit_356=FIND&page_356=" & row2
        
            'Label1.Caption = "Waiting for " & Chr(Row) & " Page " & row2  ' commented out so code will run without a userform progress window
            DoEvents
            With xmlhttp
                .Open "GET", MyURL, False
                .send
                oDom.body.innerHTML = .responseText
                If InStr(.responseText, "no content available") > 0 Then Exit For
            End With
            
            startrow = prow + 1
            With oDom.getElementsByTagName("table")(0)
                For Each oRow In .Rows
                    prow = prow + 1
                    col = 0
                    
                    For Each oCell In oRow.Cells
                        col = col + 1
                        Cells(prow, col) = oCell.innerText
                    Next oCell
                Next oRow
            End With

            Set links = oDom.getElementsByClassName("list__link")
            
            For i = 3 To links.Length - 1
                startrow = startrow + 1
                Cells(startrow, 8) = links(i).href
            Next i

        Next row2
    Next Row
End Sub
 
Upvote 0
OK, my brain is more in gear now. This solution could be neater:

VBA Code:
Sub ScrapeNow2()
    Dim xmlhttp As XMLHTTP60
    Dim oDom    As HTMLDocument
    Dim i       As Long
    Dim links   As Object
    Dim ele     As IHTMLElement

    Set xmlhttp = New MSXML2.ServerXMLHTTP60
    Set oDom = New HTMLDocument
    
    prow = 2
    For Row = 65 To 90
        For row2 = 1 To 100
            MyURL = "https://www.scenicrim.qld.gov.au/council-services/cemeteries/interment-database?itemsPerPage_356=1000&search_356_30=" & Chr(Row) & "&search_356_3=&submit_356=FIND&page_356=" & row2
        
            'Label1.Caption = "Waiting for " & Chr(Row) & " Page " & row2  ' commented out so code will run without a userform progress window
            DoEvents
            With xmlhttp
                .Open "GET", MyURL, False
                .send
                oDom.body.innerHTML = .responseText
                If InStr(.responseText, "no content available") > 0 Then Exit For
            End With
            
            startrow = prow + 1
            With oDom.getElementsByTagName("table")(0)
                For Each oRow In .Rows
                    prow = prow + 1
                    col = 0
                    
                    For Each oCell In oRow.Cells
                        col = col + 1
                        Cells(prow, col) = oCell.innerText
                        If col = 2 Then Cells(prow, 8) = oCell.getElementsByTagName("a")(0).href
                    Next oCell
                Next oRow
            End With

        Next row2
    Next Row
End Sub

Sorry, I have never found and useful references. However, if you can use a browser's developer facilities you can see the shape of the html. After that, it is just a question of beating your favourite search engine into submission!
 
Upvote 0
i tried nesting and had a complete fail. thx for getting me in the right direction here. the idea that there are 2 getelementsbytagname() methods they return different things has done my head in :(
 
Upvote 0
yes, i have been reading w3. i guess i am at a disadvantage in that i never learnt HTML. i have always used visual web design tools. thx for your help. i will probably need to come back for some more help if thats ok. :)
 
Upvote 0
The W3 Schools part about the Document Object Model is useful.
Also JavaScript is not a million miles from VBA so that could be useful as well. They have a section on JavaScript DOM.
Come back by all means but if it is a different problem then please open a new thread. That way more people will get to look at it and you stand a better chance of getting a reply because I am not here all the time.

The DOM is the way they structure a web page to make is accessible to programs. Here is one of their diagrams:
img_htmltree[1].gif
 
Upvote 0

Forum statistics

Threads
1,224,828
Messages
6,181,215
Members
453,024
Latest member
Wingit77

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top