How to extract table data from webpages with VBA which does not have column Headers

satish78

Board Regular
Joined
Aug 31, 2014
Messages
218
Hi Friends,

Trying to extract table data from multiple urls at once with Excel VBA.
Here are sample urls trying to extract from

ColumnA
[TABLE="width: 64"]
<tbody>[TR]
[TD="width: 64"]https://www.abbreviations.com/acronyms/NETWORKING/149[/TD]
[/TR]
[TR]
[TD]https://www.abbreviations.com/acronyms/NETWORKING/150[/TD]
[/TR]
[TR]
[TD]https://www.abbreviations.com/acronyms/NETWORKING/151[/TD]
[/TR]
[TR]
[TD]https://www.abbreviations.com/acronyms/NETWORKING/152[/TD]
[/TR]
[TR]
[TD]https://www.abbreviations.com/acronyms/NETWORKING/153

I want to extract abbreviation and full form into excel.
[/TD]
[/TR]
</tbody>[/TABLE]


Here is the code which is not working

<code style="margin: 0px; padding: 0px; border: 0px; font-style: inherit; font-variant: inherit; font-weight: inherit; font-stretch: inherit; line-height: inherit; font-family: Consolas, Menlo, Monaco, "Lucida Console", "Liberation Mono", "DejaVu Sans Mono", "Bitstream Vera Sans Mono", "Courier New", monospace, sans-serif; vertical-align: baseline; box-sizing: inherit; white-space: inherit;">Option Explicit
Public Sub GetTable()
Dim sResponse As String, html As HTMLDocument, clipboard As Object
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse
= StrConv(.responseBody, vbUnicode)
End With

html
.body.innerHTML = sResponse

With html
ThisWorkbook
.Worksheets("Sheet1").Cells(1, 1) = .querySelector(".company_name").innerText
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard
.SetText .querySelector(".InputTable2").outerHTML
clipboard
.PutInClipboard
End With

ThisWorkbook
.Worksheets("Sheet1").Cells(2, 1).PasteSpecial

End Sub

</code>
 
Last edited:

Excel Facts

When they said...
When they said you are going to "Excel at life", they meant you "will be doing Excel your whole life".
i tried replacing this http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609",False url with my url [TABLE="class: cms_table, width: 64"]
[TR]
[TD]https://www.abbreviations.com/acronyms/NETWORKING/152[/TD]
[/TR]
[TR]
[/TR]
[/TABLE]
But, data is not getting extracted.
 
Upvote 0
Not able to point out. Could you help me?

I prefer read the html code and parse its "tabular" contents using for example this subroutine:
Code:
Sub GetWebTab2Param(lIE As Object, lUrl As String, ByRef myDest As Range)
'1: IE Session to use; 2: URL to be read; Destination
'
Dim myURL As String, myStart As Single, myItm As Object
Dim KK As Long, I As Long, J As Long, myColl, tRtR, tDtD
'
myURL = lUrl
'
On Error GoTo ExitA
With lIE
    .navigate myURL
'    .Visible = True
    Do While .Busy: DoEvents: Loop              'Wait not busy
    Do While .readyState <> 4: DoEvents: Loop   'Wait document ready
End With
'
myStart = Timer                                 '1 more sec
Do
    DoEvents
    If Timer > myStart + 1 Or Timer < myStart Then Exit Do
Loop

Set myColl = lIE.Document.getElementsbyTagName("TABLE")
For Each myItm In myColl
    myDest.Cells(I + 1, 1).Value = "TABLE#_" & KK: KK = KK + 1: I = I + 1
    For Each tRtR In myItm.Rows
        For Each tDtD In tRtR.Cells
            myDest.Cells(I + 1, J + 1) = tDtD.innertext
            J = J + 1
        Next tDtD
        I = I + 1: J = 0
    Next tRtR
I = I + 2
Next myItm
'
    Cells.WrapText = False
    Range("A1").Select
'
'Stop     'Vedi testo
'
ExitA:
End Sub
This need to be recalled by a master routine, that passes Url to be read and set the position where the returned data will be written.
Going to your case, the caller routine could be like this:
Code:
Sub IeTabReader()
Dim IE As Object, myRan As Range, I As Long
Set IE = CreateObject("InternetExplorer.Application")
Sheets("Sheet1").Select        '<<< The worksheet used for imported data
Cells.ClearContents            'WARNING: the sheet will be cleared at start 
Set myRan = Cells(Rows.Count, 1).End(xlUp).Offset(1, 0)
For I = 150 To 153             'This will read pages 150 to 153 of your site
    Set myRan = Cells(Rows.Count, 1).End(xlUp).Offset(1, 0)
    myRan.Select
    Call GetWebTab2Param(IE, "https://www.abbreviations.com/acronyms/NETWORKING/" & I, myRan)
Next I
Cells.WrapText = False
IE.Quit
Set IE = Nothing
End Sub
You need to run the caller macro, that is IeTabReader. The above code will read pages 150 through 153 of your networking acronyms
Of course you may change the calling sequence to read any site.
Examples:
Replace the For I = 150 To 153 / Next I loop with:

Code:
Call GetWebTab2Param(IE, "https://www.abbreviations.com/acronyms/NETWORKING/99999", myRan)
This will read all the networking acronims from the site

Code:
Call GetWebTab2Param(IE, " https://www.bbc.com/news/business/market-data", myRan)
This will read several tables from the bbc.com site

Bye
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,244
Members
452,622
Latest member
Laura_PinksBTHFT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top