Web scrape

Steve_R

Active Member
Joined
Oct 28, 2015
Messages
350
Hi

I've done a lot with VBA over the years but never used VBA to scrape data using IE or HTML.

I tried using a number of methods posted within the forum but every time I run into a problem I cannot solve. I suspect the following should do the trick but it breaks 'Object required'


Code:
Const sWebPage = "http://www.bom.gov.au/products/IDN60701/IDN60701.95929.shtml"




Sub Web_Table_Option_One()


Dim xml    As Object
Dim html   As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", sWebPage, False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result

'Breaks on the line below Error 424 Object required. I think my problem is in Item and Tag Names

Set objTable = IE.Document.all.Item("{What should this be}").getElementsByTagName("{What should this be}") (0)

' Can't test the following yet
 For lngTable = 0 To objTable.Length - 1
        For lngRow = 0 To objTable(lngTable).Rows.Length - 1
            For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
            Next lngCol
        Next lngRow
        ActRw = ActRw + objTable(lngTable).Rows.Length + 1
    Next lngTable

    Set xml = Nothing    
    Set html = Nothing
    Set objTable = Nothing
End Sub


The table wanted is t2 (identified from the following behind the webpage)

table id="t2" border="1" cellspacing="0" class="tabledata obs_table"


Small sample of required result (will insert headings later)

Excel 2010
ABCDEFGHIJKLMNO
1
201/11:30pm17.2CALM001011.3-0.4-Partly cloudy10----
301/11:00pm17.2CALM001011.4-0.4--10----
401/10:30pm17.6CALM001011.5-0.4--10----

<colgroup><col style="width: 25pxpx"><col><col><col><col><col><col><col><col><col><col><col><col><col><col><col></colgroup><thead>
</thead><tbody>
</tbody>
Sheet1
I am open to using Internet Explorer instead.
 

Excel Facts

Using Function Arguments with nested formulas
If writing INDEX in Func. Arguments, type MATCH(. Use the mouse to click inside MATCH in the formula bar. Dialog switches to MATCH.
Hi,
Try this:
Rich (BB code):
Sub Web_Table_Option_One()
 
  Const sWebPage = "http://www.bom.gov.au/products/IDN60701/IDN60701.95929.shtml"
 
  Dim a()
  Dim c As Long, cs As Long, r As Long, rs As Long
  Dim oDom As Object, oRow As Object, oCell As Object
 
  ' Copy HTML text of the web page to the oDom for farther parsing
  Set oDom = CreateObject("htmlfile")
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", sWebPage, False
    .Send
    oDom.body.innerHTML = .responseText
  End With
 
  ' Fill a() by HTML Table content
  With oDom.body.document.getElementById("t2")
    rs = .Rows.Length + 1
    ReDim a(1 To rs, 1 To 100)
    For Each oRow In .Rows
      r = r + 1
      For Each oCell In oRow.Cells
        c = c + 1
        a(r, c) = Replace(oCell.innerText, vbCr, vbNullString)
      Next
      If cs < c - 1 Then cs = c - 1
      c = 0
    Next
  End With
  ReDim Preserve a(1 To rs, 1 To cs)
 
  ' Write the extracted data to the Sheet1
  With Sheets("Sheet1").Range("A1")
    .CurrentRegion.ClearContents
    .Resize(rs, cs).Value = a()
    .Resize(rs, cs).EntireColumn.AutoFit
  End With
 
  ' Release the memory of the object variables
  Set oRow = Nothing
  Set oCell = Nothing
  Set oDom = Nothing
 
End Sub
 
Upvote 0
The same but with error trapping of web page request
and 2 rows skipping of the header with merged cells
Rich (BB code):
Sub Web_Table_Option_One()
 
  Const sWebPage = "http://www.bom.gov.au/products/IDN60701/IDN60701.95929.shtml"
 
  Dim a()
  Dim c As Long, cs As Long, r As Long, rs As Long, i As Long
  Dim oDom As Object, oRow As Object, oCell As Object
 
  ' Copy HTML text of the web page to the oDom for farther parsing
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", sWebPage, False
    .Send
    If .readyState = 4 And .Status = 200 Then
      Set oDom = CreateObject("htmlfile")
      oDom.body.innerHTML = .responseText
    Else
      MsgBox "Ready state: " & .readyState & vbLf & "HTTP request status: " & .Status, vbExclamation, "Error"
      Exit Sub
    End If
  End With
 
  ' Fill a() by HTML Table content
  With oDom.body.document.getElementById("t2")
    rs = .Rows.Length - 1
    ReDim a(1 To rs, 1 To 100)
    For Each oRow In .Rows
      i = i + 1
      If i > 2 Then
        r = r + 1
        For Each oCell In oRow.Cells
          c = c + 1
          a(r, c) = Replace(oCell.innerText, vbCr, vbNullString)
        Next
        If cs < c - 1 Then cs = c - 1
        c = 0
      End If
    Next
  End With
  ReDim Preserve a(1 To rs, 1 To cs)
 
  ' Write the extracted data to the Sheet1
  With Sheets("Sheet1").Range("A3").Resize(rs, cs)
    With .CurrentRegion
      If Rows.Count > 2 Then .Resize(.Rows.Count - 2).Offset(2).ClearContents
    End With
    .Value = a()
    .EntireColumn.AutoFit
  End With
 
  ' Release the memory of the object variables
  Set oRow = Nothing
  Set oCell = Nothing
  Set oDom = Nothing
 
End Sub
 
Upvote 0
Please replace this: If cs < c - 1 Then cs = c - 1
by that: If cs < c Then cs = c

The data rows looks like this:
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
1
Date/Time
EDT​
Temp
°C​
Wind​
Pressure​
Rain since
9am
mm​
Weather​
Cloud​
Vis
km​
Sea
Height
m​
Swell​
2
Dir​
Spd
kts​
Gust
kts​
hPa​
Trend​
Height
m​
Period
s​
Dir​
3
01/11:30pm
17.2​
CALM
0​
0​
1011.3​
-
0.4​
-Partly cloudy
10​
----
4
01/11:00pm
17.2​
CALM
0​
0​
1011.4​
-
0.4​
--
10​
----
5
01/10:30pm
17.6​
CALM
0​
0​
1011.5​
-
0.4​
--
10​
----
6
01/10:00pm
17.6​
NNW
1​
4​
1011.8​
-
0.4​
--
10​
----
7
01/09:30pm
17.9​
SSW
3​
5​
1011.7​
-
0.4​
--
10​
----
8
01/09:00pm
18.2​
SW
5​
6​
1011.5​
-
0.4​
Recent precip.Partly cloudy
10​
----
9
01/08:30pm
18.5​
CALM
0​
0​
1011​
-
0.2​
-Partly cloudy
10​
----
10
01/08:00pm
18.9​
WNW
6​
8​
1011.6​
-
0​
-Partly cloudy
10​
----
11
01/07:30pm
18.9​
NNW
3​
6​
1011.2​
-
0​
-Partly cloudy
10​
----
12
01/07:00pm
19​
CALM
0​
0​
1011.5​
-
0​
-Partly cloudy
10​
----

<tbody>
</tbody>
Sheet: Sheet1

<tbody>
</tbody>
 
Last edited:
Upvote 0
Thank you.
You are welcome!
There is also "t3" table on that web page, append its content to the result if needed.
Good luck!
 
Last edited:
Upvote 0
Thank you. I had already tested "t1" and 't3" using your code and eventually plan to sit down and work out how to flexibly append the combination required.

The code seems significantly faster than my data query method that broke. The website was up but data query VBA was not grabbing the data. When tested in a separate workbook it failed at the line (.Refresh... ). The error is forgotten - "424 Object required", I think. When testing both your code and my data query code, data query broke and your code worked. Today data query works again. It occurred to me today that restart PC may have cured the problem but it's too late to test that now. Irrespective, it seems the WB may benefit from a change of method.
 
Upvote 0

Forum statistics

Threads
1,221,448
Messages
6,159,922
Members
451,604
Latest member
SWahl

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top