Read nested table inside an html without class/id

jonsey

New Member
Joined
Jun 1, 2018
Messages
11
Hi guys,
i've got a problem with a scraping of a table from a website. Unfortunally i don't have an id to locate in a second the table in the web page, but this is located in another table that i can identify. I can post the html code here below :


Code:
< table id="addreg" class="display dataTable" cellspacing="0" style="width: 98%;" summary="Dettaglio procedimenti" role="grid" aria-describedby="addreg_info">
    < thead>
        < tr role="row">
            < th class="sorting_desc" tabindex="0" aria-controls="addreg" rowspan="1" colspan="1" aria-label=" Data pubblicazione : activate to sort column ascending" aria-sort="descending" style="width: 127px;"> Data pubblicazione < /th>
            < th class="details-control sorting_disabled" rowspan="1" colspan="1" aria-label=" Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)" style="width: 628px;"> Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)< /th>
        < /tr>
    < /thead>
    < tfoot>
        < tr>
            < th rowspan="1" colspan="1"> Data pubblicazione < /th>
            < th class="details-control" rowspan="1" colspan="1"> Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)< /th>
        < /tr>
    < /tfoot>


    < tbody>


        < tr role="row" class="odd shown">
            < td class="sorting_1">23-GEN-18< /td>
            < td class=" details-control">< /td>
        < /tr>
        < tr>
            < td colspan="2">
                < table cellpadding="5" cellspacing="0" border="0" width="98%">
                    < tbody>
                        < tr style="width:50px;">
                            < th style="text-align:right;">Aliquota< /th>
                            < th>Fascia di applicazione< /th>
                        < /tr>
                        < tr>
                            < td align="right">1.42
                                < hr>1.43
                                < hr>1.68
                                < hr>1.72
                                < hr>1.73< /td>
                            < td>fino a 15000.00 euro
                                < hr>oltre 15000.00 e fino a 28000.00 euro
                                < hr>oltre 28000.00 e fino a 55000.00 euro
                                < hr>oltre 55000.00 e fino a 75000.00 euro
                                < hr>oltre 75000.00 euro< /td>
                        < /tr>
                    < /tbody>
                < /table>
                < table cellpadding="5" cellspacing="0" border="0" width="98% style=" padding-left:100px; "=" ">< tbody>< tr>< th>Disposizioni particolari< /th>< /tr>< tr>< td> < /td>< /tr>< tr>< th>Norme di riferimento< /th>< /tr>< tr>< td>ART.4 L.R. 77/2012< /td>< /tr>< tr>< th>Note< /th>< /tr>< tr>< td> < /td>< /tr>< /tbody>< /table>< /td>< /tr>< /tbody>
        < /table>

What i need to read are the value in the nested table inside the main one, the table where are placed the value : 1.42,1.43,1.68,1.72,1.73.
I can read the table with id "addreg", but cant "navigate" in the nested table.
I've read the first table with the following code :


Code:
Sub test()
    
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLTables As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement
    
    IE.Visible = False
    IE.navigate "http://www1.finanze.gov.it/finanze2/dipartimentopolitichefiscali/fiscalitalocale/addregirpef/addregirpef.php?reg=17&anno=2018"
    
    'Do While IE.Busy = True Or IE.readyState <> 4: DoEvents: Loop
'    Do While IE.readyState <> READYSTATE_COMPLETE
'    Loop


    Application.Wait (Now + TimeValue("0:00:2"))
    
    Set HTMLdoc = IE.document
    Set HTMLTables = HTMLdoc.getElementsByTagName("table")
    
    'Debug.Print HTMLTables.Length
    
    For Each HTMLTable In HTMLTables
            'Debug.Print HTMLTable.className


            For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
                'Debug.Print vbTab & HTMLRow.innerText


                    For Each HTMLCell In HTMLRow.getElementsByTagName("td")
                        Debug.Print vbTab & HTMLCell.innerText


                    Next HTMLCell


            Next HTMLRow
    Next HTMLTable


   'Debug.Print HTMLTables(0).getElementsByTagName("tr").innerText
    
    IE.Quit
End Sub


Can someone help me with this problem?There is a procedure to parse a table from html also if i don't have any id/class to identify?
Is possible to navigate the chield of the table and read the table inside a child?
Thanks a lot in advance!!!
 
Hi, i've finally finished!
Thanks a lot John!I post here the code if can be usefull :)

Code:
Sub test()
    
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLTables As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement
    Dim FasciaArray() As String
    Dim AliquotaArray() As String
    
    
    IE.Visible = True
    IE.navigate "http://www1.finanze.gov.it/finanze2/dipartimentopolitichefiscali/fiscalitalocale/addregirpef/addregirpef.php?reg=17&anno=2018"
    
    'Do While IE.Busy = True Or IE.readyState <> 4: DoEvents: Loop
'    Do While IE.readyState <> READYSTATE_COMPLETE
'    Loop


    Application.Wait (Now + TimeValue("0:00:2"))
    
    
    Set HTMLdoc = IE.document


    Set HTMLTable = HTMLdoc.getElementById("addreg")
    Set HTMLTable = HTMLdoc.getElementsByTagName("TABLE")(0)
    For Each HTMLTableRow In HTMLTable.Rows
        For Each HTMLTableCell In HTMLTableRow.Cells
            HTMLTableCell.Click
        Next
    Next
    
    Set HTMLTable = HTMLdoc.getElementById("addreg")
    Set HTMLTable = HTMLdoc.getElementsByTagName("TABLE")(1)


    FasciaArray = Split(HTMLTable.Rows(1).Cells(1).innerText, vbLf)
    AliquotaArray = Split(HTMLTable.Rows(1).Cells(0).innerText, vbLf)
        
    Range("A1:A" & UBound(FasciaArray) + 1) = WorksheetFunction.Transpose(FasciaArray)
    Range("B1:B" & UBound(AliquotaArray) + 1) = WorksheetFunction.Transpose(AliquotaArray)
    


    
    IE.Quit
End Sub

If you have any sugegstions to make better...please let me know :)
 
Upvote 0

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
Very well done! I'm glad you got it working.

One small change is that vbLf should be vbCrLf, to extract each line value in the table cell's innerText, without an apparent trailing space.

Here's another way of extracting the data from the inner table. I noticed that the outer table initially contains a "Loading..." string:

HTML:
< tr class="odd">< td class="dataTables_empty" valign="top" colspan="2">Loading...< /td>< /tr>< /tbody>

which disappears when the table has completely loaded:

HTML:
< tr class="odd" role="row">< td class="sorting_1">23-GEN-18< /td>< td class=" details-control">< /td>< /tr>< /tbody>

Once the outer table has completely loaded the cell containing the green "+" can be clicked, generating the inner table.

Code:
Public Sub IE_Extract_Table_Data()

    Dim IE As InternetExplorer
    Dim URL As String
    Dim HTMLdoc As HTMLDocument
    Dim table As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell
    Dim FasciaArray As Variant, AliquotaArray As Variant
    
    URL = "http://www1.finanze.gov.it/finanze2/dipartimentopolitichefiscali/fiscalitalocale/addregirpef/addregirpef.php?reg=17&anno=2018"
    
    Set IE = New InternetExplorer
    With IE
        .navigate URL
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        .Visible = True
        Set HTMLdoc = .document
    End With

    Set table = HTMLdoc.getElementById("addreg")
    While InStr(table.innerText, "Loading...") > 0
        DoEvents
    Wend
    table.Rows(1).Cells(1).Click
    
    Set table = table.getElementsByTagName("TABLE")(0)
    
    FasciaArray = Split(table.Rows(1).Cells(1).innerText, vbCrLf)
    AliquotaArray = Split(table.Rows(1).Cells(0).innerText, vbCrLf)
        
    With ActiveSheet
        .Cells.Clear
        .Range("A1").Resize(UBound(FasciaArray) + 1).Value = Application.WorksheetFunction.Transpose(FasciaArray)
        .Range("B1").Resize(UBound(AliquotaArray) + 1).Value = Application.WorksheetFunction.Transpose(AliquotaArray)
    End With
        
End Sub
 
Upvote 0
Thanks a lot John!
I've added your modification and also a method to composite the link to navigate...and used a form :)


Code:
Public regionsIT As Variant


       


Private Sub UserForm_Activate()
    'Dim regionsIT(20) As String
    Dim region As String
    Dim anno As Integer
    
    regionsIT = Array("Abruzzo_01", "Basilicata_02", "Calabria_04", "Campania_05", "Emilia-Romagna_06", "Friuli-Venezia Giulia_07", "Lazio_08", "Liguria_09", "Lombardia_10", "Marche_11", "Molise_12", "Piemonte_13", "Puglia_14", "Sardegna_15", "Sicilia_16", "Toscana_17", "Trentino-Alto Adige_18", "Valle d'Aosta_20", "Veneto_21")
    
    anno = 2015
    
    
    For i = 0 To UBound(regionsIT, 1) - LBound(regionsIT, 1) - 1
            region = Left(regionsIT(i), Len(regionsIT(i)) - 3)
            UserForm1.regionCombo.AddItem region
    Next
    
    For i = 2015 To Year(Date)
            UserForm1.yearCombo.AddItem i
    Next
End Sub


Public Sub IE_Extract_Table_Data()


    Dim IE As InternetExplorer
    Dim URL As String
    Dim HTMLdoc As HTMLDocument
    Dim table As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell
    Dim FasciaArray As Variant, AliquotaArray As Variant
    Dim regionCode As String


    regionCode = Right(regionsIT(UserForm1.regionCombo.ListIndex), 2)
    URL = "http://www1.finanze.gov.it/finanze2/dipartimentopolitichefiscali/fiscalitalocale/addregirpef/addregirpef.php?reg=" & regionCode & "&anno=" & yearCombo.Value
    
    Set IE = New InternetExplorer
    With IE
        .navigate URL
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
        .Visible = True
        Set HTMLdoc = .document
    End With


    Set table = HTMLdoc.getElementById("addreg")
    While InStr(table.innerText, "Loading...") > 0
        DoEvents
    Wend
    table.Rows(1).Cells(1).Click
    
    Set table = table.getElementsByTagName("TABLE")(0)
    
    FasciaArray = Split(table.Rows(1).Cells(1).innerText, vbCrLf)
    AliquotaArray = Split(table.Rows(1).Cells(0).innerText, vbCrLf)
        
    With ActiveSheet
        .Cells.Clear
        .Range("A1").Resize(UBound(FasciaArray) + 1).Value = Application.WorksheetFunction.Transpose(FasciaArray)
        .Range("B1").Resize(UBound(AliquotaArray) + 1).Value = Application.WorksheetFunction.Transpose(AliquotaArray)
    End With
        
End Sub
 
Upvote 0
Adobe Acrobat API (only available in Acrobat Pro) can be called from VBA to extract text from a PDF file. It might be able to convert it directly to Excel - I don't know. The online tool might be able to be automated with VBA (upload PDF, convert & download, etc.), but this would be very complicated and time consuming to develop.
 
Upvote 0
Thanks John, and sorry for my late response.
Ok, i've read online that the solution is only availabe with Acrobat Pro...i will convert the pdf using online tools :(
Have a nice day :)
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,185
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top