Read nested table inside an HTML page without class/id

jonsey

New Member
Joined
Jun 1, 2018
Messages
11
Hi guys,
I've got a problem scraping a table from a website. Unfortunately, the table has no id that would let me locate it directly in the page, but it is nested inside another table that I can identify. Here is the HTML code:


Code:
<table id="addreg" class="display dataTable" cellspacing="0" style="width: 98%;" summary="Dettaglio procedimenti" role="grid" aria-describedby="addreg_info">
    <thead>
        <tr role="row">
            <th class="sorting_desc" tabindex="0" aria-controls="addreg" rowspan="1" colspan="1" aria-label=" Data pubblicazione : activate to sort column ascending" aria-sort="descending" style="width: 127px;"> Data pubblicazione </th>
            <th class="details-control sorting_disabled" rowspan="1" colspan="1" aria-label=" Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)" style="width: 628px;"> Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)</th>
        </tr>
    </thead>
    <tfoot>
        <tr>
            <th rowspan="1" colspan="1"> Data pubblicazione </th>
            <th class="details-control" rowspan="1" colspan="1"> Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)</th>
        </tr>
    </tfoot>
    <tbody>
        <tr role="row" class="odd shown">
            <td class="sorting_1">23-GEN-18</td>
            <td class=" details-control"></td>
        </tr>
        <tr>
            <td colspan="2">
                <table cellpadding="5" cellspacing="0" border="0" width="98%">
                    <tbody>
                        <tr style="width:50px;">
                            <th style="text-align:right;">Aliquota</th>
                            <th>Fascia di applicazione</th>
                        </tr>
                        <tr>
                            <td align="right">1.42
                                <hr>1.43
                                <hr>1.68
                                <hr>1.72
                                <hr>1.73</td>
                            <td>fino a 15000.00 euro
                                <hr>oltre 15000.00 e fino a 28000.00 euro
                                <hr>oltre 28000.00 e fino a 55000.00 euro
                                <hr>oltre 55000.00 e fino a 75000.00 euro
                                <hr>oltre 75000.00 euro</td>
                        </tr>
                    </tbody>
                </table>
                <table cellpadding="5" cellspacing="0" border="0" width="98%" style="padding-left:100px;">
                    <tbody>
                        <tr><th>Disposizioni particolari</th></tr>
                        <tr><td> </td></tr>
                        <tr><th>Norme di riferimento</th></tr>
                        <tr><td>ART.4 L.R. 77/2012</td></tr>
                        <tr><th>Note</th></tr>
                        <tr><td> </td></tr>
                    </tbody>
                </table>
            </td>
        </tr>
    </tbody>
</table>

What I need to read are the values in the table nested inside the main one, i.e. the table containing the values 1.42, 1.43, 1.68, 1.72, 1.73.
I can read the table with id "addreg", but I can't navigate into the nested table.
I read the first table with the following code:


Code:
Sub test()
    ' Needs references to "Microsoft Internet Controls" (SHDocVw)
    ' and "Microsoft HTML Object Library" (MSHTML).
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLTables As MSHTML.IHTMLElementCollection
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement

    IE.Visible = False
    IE.navigate "http://www1.finanze.gov.it/finanze2/dipartimentopolitichefiscali/fiscalitalocale/addregirpef/addregirpef.php?reg=17&anno=2018"

    ' Crude fixed wait for the page to load; the readyState loops are alternatives.
    'Do While IE.Busy = True Or IE.readyState <> 4: DoEvents: Loop
    'Do While IE.readyState <> READYSTATE_COMPLETE
    'Loop
    Application.Wait (Now + TimeValue("0:00:02"))

    Set HTMLdoc = IE.document
    Set HTMLTables = HTMLdoc.getElementsByTagName("table")
    'Debug.Print HTMLTables.Length

    ' Print the text of every cell of every table on the page
    For Each HTMLTable In HTMLTables
        'Debug.Print HTMLTable.className
        For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
            'Debug.Print vbTab & HTMLRow.innerText
            For Each HTMLCell In HTMLRow.getElementsByTagName("td")
                Debug.Print vbTab & HTMLCell.innerText
            Next HTMLCell
        Next HTMLRow
    Next HTMLTable

    'Debug.Print HTMLTables(0).getElementsByTagName("tr").innerText

    IE.Quit
End Sub


Can someone help me with this problem? Is there a way to parse a table from HTML even if I don't have an id/class to identify it?
Is it possible to navigate the children of the table and read a table inside a child?
Thanks a lot in advance!
 

Try this:
Code:
    Dim table As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell
    Set table = HTMLdoc.getElementById("addreg")
    Set table = table.getElementsByTagName("TABLE")(0)
    For Each tRow In table.Rows
        For Each tCell In tRow.Cells
            Range("A1").Offset(tRow.RowIndex, tCell.cellIndex).Value = tCell.innerText
        Next
    Next
It looks like the numbers 1.42,1.43, etc., are in the same cell, so depending on how you want to extract the data you might need to Split(tCell.innerText,vbLf) them into an array and loop through or access each array element to get each individual number. The same applies to the 'fino a 15000.00 euro', etc. text.
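The split described above could look something like this minimal sketch. SplitCellValues is a hypothetical helper name, not something from the thread, and it assumes innerText puts a line break (CRLF, CR or LF) where each <hr> sits in the cell; the exact line break may vary, so the sketch normalises all three before splitting.

```vba
Option Explicit

' Hypothetical helper: split a cell's innerText into its individual values.
' Assumes each value sits on its own line; blank lines are skipped.
Function SplitCellValues(ByVal cellText As String) As Variant
    Dim parts() As String, result() As String
    Dim i As Long, n As Long, piece As String

    ' Normalise CRLF and lone CR to LF so a single delimiter covers all cases
    cellText = Replace(cellText, vbCrLf, vbLf)
    cellText = Replace(cellText, vbCr, vbLf)
    parts = Split(cellText, vbLf)

    If UBound(parts) >= 0 Then
        ReDim result(0 To UBound(parts))
        For i = 0 To UBound(parts)
            piece = Trim$(parts(i))
            If Len(piece) > 0 Then
                result(n) = piece
                n = n + 1
            End If
        Next i
    End If

    If n = 0 Then
        SplitCellValues = Split(vbNullString)   ' empty array (UBound = -1)
    Else
        ReDim Preserve result(0 To n - 1)
        SplitCellValues = result
    End If
End Function

' Usage: print each value of a cell on its own line
Sub DemoSplit()
    Dim v As Variant
    For Each v In SplitCellValues("1.42" & vbCrLf & "1.43" & vbCrLf & "1.68")
        Debug.Print v
    Next v
End Sub
```

Inside the loop from the reply you would pass tCell.innerText instead of the literal string, and the same call works for the 'fino a 15000.00 euro' column.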
 
Hi John, thanks for your reply. I've tested your code, but it doesn't work. Why do you set "table" twice?

Set table = HTMLdoc.getElementById("addreg")
Set table = table.getElementsByTagName("TABLE")(0)

When I run the code, it copies the first two rows of the outer table :/
 
I just decided to re-use the same variable, since you're only interested in the inner table. The code gets the table with id="addreg" and then the first table within that. It could be a timing issue - set a breakpoint on the first Set table and then continue the code.
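If timing is the suspect, one option is to poll for the element instead of relying on a fixed pause. The following is only a sketch: WaitForElement and its timeoutSeconds parameter are hypothetical names, and it assumes doc is the IE.document object from the original code.

```vba
Option Explicit

' Hypothetical helper: poll doc until an element with the given id exists,
' or give up after timeoutSeconds. Returns Nothing on timeout.
Function WaitForElement(ByVal doc As Object, ByVal elementId As String, _
                        Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds per day
    Do
        Set found = doc.getElementById(elementId)
        If Not found Is Nothing Then Exit Do
        DoEvents                                ' keep Excel responsive while polling
    Loop While Now < deadline

    Set WaitForElement = found
End Function
```

In the Sub from the first post this could replace the Application.Wait line, for example with Set table = WaitForElement(IE.document, "addreg") followed by a check for Nothing. It only helps if the element eventually appears on its own; it does nothing for content that needs a click to be generated.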
 
Hi John, if I print what is inside the ... this is what I get:
Data pubblicazione
Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)




Data pubblicazione
Dati di dettaglio (Aliquote/Fasce applicazione/Disposizioni particolari/Norme di riferimento/Note)








23-GEN-18

Unfortunately I cannot see the inner table... :confused:
 
I replied to your PM, but your Inbox is full - new members have to enable the Inbox quota, I think.

I've no idea what you're doing wrong. Your OP shows 2 tables, one nested inside the other. There could be some interaction (moving the mouse, clicking something, maybe) which causes the nested table to be generated dynamically.

But please stick to this thread if you have more questions.
 
John, sorry, I'm new to this forum and maybe I don't have permission to receive more than 1 message in my inbox...
That table is "generated" once you click the (+) button on the right side of the table: http://www1.finanze.gov.it/finanze2.../addregirpef/addregirpef.php?reg=17&anno=2018
You are right: if I use Inspect in Chrome, there isn't any nested table in that page... it only becomes available once that button is pressed... fck :/
 
Man... I'm quite close to the solution...

Set HTMLdoc = IE.document

Dim table As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell
Set table = HTMLdoc.getElementById("addreg")
'Debug.Print HTMLdoc.getElementById("addreg").innerText
'Debug.Print HTMLdoc.getElementById("addreg").innerHTML
Set table = HTMLdoc.getElementsByTagName("TABLE")(0)
Debug.Print table.innerHTML
For Each tRow In table.Rows
    For Each tCell In tRow.Cells
        tCell.Click
    Next
Next

I've added tCell.Click to expand the rows and generate the table; now, running the same two nested For loops again, I can print all the cells.
Next I think I have to parse the output and take what I really need.
I'll let you know when I'm done :)
Thanks for now!
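
A possible refinement of the click approach, as a sketch only: click just the cells whose class includes details-control, skip rows already marked shown (as in the HTML from the first post), and then read the tables nested inside "addreg". HasClass and ExpandAndPrintDetails are hypothetical names, and the sketch assumes the detail rows are generated as soon as the click is handled, which may not hold for this site.

```vba
Option Explicit

' Hypothetical helper: True if the space-separated class list contains className.
Function HasClass(ByVal classList As String, ByVal className As String) As Boolean
    HasClass = InStr(1, " " & Trim$(classList) & " ", _
                     " " & className & " ", vbTextCompare) > 0
End Function

' Sketch: expand every collapsed detail row of table "addreg",
' then print the text of each nested table.
' doc is assumed to be the loaded IE.document.
Sub ExpandAndPrintDetails(ByVal doc As Object)
    Dim outer As Object, cell As Object, inner As Object
    Dim toClick As New Collection
    Dim i As Long

    Set outer = doc.getElementById("addreg")
    If outer Is Nothing Then Exit Sub

    ' Snapshot the toggle cells first: clicking adds rows, and the
    ' collection returned by getElementsByTagName is live.
    For Each cell In outer.getElementsByTagName("td")
        If HasClass(cell.className, "details-control") Then
            ' A second click on an expanded ("shown") row would collapse it
            If Not HasClass(cell.parentElement.className, "shown") Then
                toClick.Add cell
            End If
        End If
    Next cell

    For i = 1 To toClick.Count
        toClick(i).Click
    Next i

    ' Every table found inside "addreg" is a generated detail table
    For Each inner In outer.getElementsByTagName("table")
        Debug.Print inner.innerText
    Next inner
End Sub
```

Calling ExpandAndPrintDetails HTMLdoc after the page has loaded would stand in for the two nested loops above; clicking only the toggle cells avoids collapsing a row that an earlier click just expanded.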
 
This is the output of just the second (inner) table:

<tbody>
<tr style="width: 50px;">
<th style="text-align: right;">Aliquota</th>
<th>Fascia di applicazione</th>
</tr>
<tr>
<td align="right">1.42
<hr>1.43
<hr>1.68
<hr>1.72
<hr>1.73
</td>
<td>fino a 15000.00 euro
<hr>oltre 15000.00 e fino a 28000.00 euro
<hr>oltre 28000.00 e fino a 55000.00 euro
<hr>oltre 55000.00 e fino a 75000.00 euro
<hr>oltre 75000.00 euro
</td>
</tr>
</tbody>
 