Web scraping works on every page of the same kind except one

Certified

My company uses a webpage built with ASP.NET to upload data from different sources (people). Employees from different regions enter data on the webpage, and I consolidate the data into a report.

I am writing VBA code to pull (scrape) the data from the page into an Excel sheet.

The first part of the code builds the URL and loads the page's HTML into a variable (htmlDoc).

Code:
Sub getdatafromsubmission(schedule As String)
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
    Dim xmlPage As New MSXML2.XMLHTTP60
    Dim htmlDoc As New MSHTML.HTMLDocument
    Dim urlLink As String

    ' Build the URL for the requested schedule page.
    urlLink = "http://w7gs-5n1h9y1/AMGSubmission/" & schedule & "list.cshtml"
    Debug.Print urlLink

    ' Synchronous GET request for the page.
    xmlPage.Open "GET", urlLink, False
    xmlPage.send

    ' Load the response text into a new HTML document.
    htmlDoc.body.innerHTML = xmlPage.responseText

    ' This sub processes the HTML.
    ProcessHTMLPage htmlDoc, schedule
End Sub
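
The request above never checks whether it succeeded, so a small helper like the one below might help when one page behaves differently from the others. This is only a sketch: the name FetchPageText is hypothetical (it is not part of my code), and it assumes the same "Microsoft XML, v6.0" reference as the sub above. It prints the HTTP status and the length of the response to the Immediate window and returns the response text only when the status is 200.

Code:
Function FetchPageText(urlLink As String) As String
    ' Hypothetical helper, not part of the original code.
    Dim xmlPage As New MSXML2.XMLHTTP60

    xmlPage.Open "GET", urlLink, False
    xmlPage.send

    ' The status and the size of the reply help tell an error page from the real one.
    Debug.Print urlLink, xmlPage.Status, xmlPage.statusText, Len(xmlPage.responseText)

    If xmlPage.Status = 200 Then
        FetchPageText = xmlPage.responseText
    Else
        FetchPageText = vbNullString
    End If
End Function

Comparing the printed status and length for the three working pages against the fourth would show whether the server answers that request differently.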


Next, I call another sub to scrape the data and write it to an Excel sheet.

Code:
Sub ProcessHTMLPage(htmlPage As MSHTML.HTMLDocument, schedule As String)
    ' Target table's class name = "table ewTable"

    Dim htmlTable As MSHTML.IHTMLElement
    Dim htmltables As MSHTML.IHTMLElementCollection
    Dim htmlRow As MSHTML.IHTMLElement
    Dim htmlcell As MSHTML.IHTMLElement
    Dim x As Long   ' output row
    Dim y As Long   ' output column

    ' Note: htmltables is a collection of elements, not a single element.
    Set htmltables = htmlPage.getElementsByTagName("table")

    For Each htmlTable In htmltables
        'Debug.Print htmlTable.className

        ' The row counter restarts for every table, so each table is written from row 1.
        x = 1
        For Each htmlRow In htmlTable.getElementsByTagName("tr")
            'Debug.Print vbTab & htmlRow.innerText

            y = 1
            ' Loop over the direct children of the "tr" element (its cells).
            For Each htmlcell In htmlRow.Children
                'Debug.Print vbTab & htmlcell.innerText

                ' Write to Sheet3 directly instead of activating it on every pass.
                Sheet3.Cells(x, y).Value = htmlcell.innerText

                y = y + 1
            Next htmlcell

            x = x + 1
        Next htmlRow
    Next htmlTable
End Sub
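
Since the table I am after has the class name "table ewTable" (see the comment at the top of the sub), the loop could be limited to that table. The following is a rough sketch only: IsDataTable is a hypothetical name, and it assumes the class name always contains "ewTable" on all four pages, which I have not verified.

Code:
Function IsDataTable(htmlTable As MSHTML.IHTMLElement) As Boolean
    ' Hypothetical helper, not part of the original code.
    ' True when the element's class attribute contains "ewTable" (case-insensitive).
    IsDataTable = (InStr(1, htmlTable.className, "ewTable", vbTextCompare) > 0)
End Function

Inside the For Each htmlTable loop, the row and cell loops would then be wrapped in If IsDataTable(htmlTable) Then ... End If, so that any other tables on the page are skipped.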

I use the same code for 4 different web pages, all created from the same template.

As you can see above, the code first looks for the "table" tag name before going any further.

This is where my issue starts:

The code works fine for the first 3 pages but errors on the 4th. It seems like the code can't find the "table" tag name.

I checked the HTML in Chrome, and the page does have a table element with an id.
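
To narrow down whether the table is missing from the response itself or is lost when the text is parsed, a quick check along these lines might help. It is a sketch under assumptions: DiagnoseTables is a hypothetical name, and it expects the raw response text and the parsed document that getdatafromsubmission already has. If the raw text contains no "<table", the server may not be sending the table in this response at all; if the raw text has it but the parsed count is 0, the parsing step is the place to look.

Code:
Sub DiagnoseTables(responseText As String, htmlDoc As MSHTML.HTMLDocument)
    ' Hypothetical diagnostic, not part of the original code.
    Dim posInText As Long

    ' Case-insensitive search of the raw HTML for an opening table tag.
    posInText = InStr(1, responseText, "<table", vbTextCompare)
    Debug.Print "Position of <table in raw response (0 = not found): "; posInText

    ' Number of table elements MSHTML found after parsing.
    Debug.Print "Tables in parsed document: "; htmlDoc.getElementsByTagName("table").Length
End Sub

It could be called just before ProcessHTMLPage, for example: DiagnoseTables xmlPage.responseText, htmlDoc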

Does anyone have any idea or suggestion on what I can do?
 
