Web Scraping from list of locations

DEMI224

New Member
Joined
Apr 24, 2012
Messages
12
Let's start by saying that I've never done webscraping. It seems like it should be a fairly easy process, but I need to pull data from multiple websites. The url for my first location is http://waterdata.usgs.gov/nwis/dv?c...07&site_no=371435093134701&referred_module=sw

...except I need to automate changing the site number. I have an Excel file that lists all of the site numbers in a single column, so it seems like I should be able to cycle through those, but I've tried a few different types of loops and can't get anything to work. Here's my most recent (and kind of sad attempt):

Sub web_scrape()
Dim wb As Workbook, fil As String, StaID As String

Worksheets("USGS_Observation_Wells").Activate

For x = ActiveSheet.QueryTables.Count To 1 Step -1
ActiveSheet.QueryTables(x).Delete
Next x

For i = 2 To 456
StaID = Worksheets("USGS_Obersvation_Wells").Range(i, 19)
With ActiveSheet.QueryTables.Add(Connection:="URL;http://waterdata.usgs.gov/nwis/dv?cb_72019=on&format=rdb&begin_date=1949-01-01&end_date=2012-05-07&site_no=" + StaID + "&referred_module=sw", Destination:=Worksheets("USGS_Observation_Wells").Range("U1"))
.Refresh BackgroundQuery:=False
End With
Next i

End Sub


Not only does it not work, but it pastes to the same location in the spreadsheet every time. So part 2 of my question, how can I make it past to the first blank column?
 

Excel Facts

How to show all formulas in Excel?
Press Ctrl+` to show all formulas. Press it again to toggle back to numbers. The grave accent is often under the tilde on US keyboards.
Here's a different route I tried taking - it also errors out.

Sub WebScraping()
Dim oIE As Object
Dim sURL As String
Dim sStaID As String
Dim vTxtInput As Variant
Dim ieDoc As Object
Dim IeTable As Object
Dim ieCell As Object
Dim i As Integer, x As Integer, k As Integer, p As Integer
For i = 2 To 456
sStaID = Worksheets("USGS_Observation_Wells").Range(i, 19)
Set oIE = CreateObject("InternetExplorer.Application")
sURL = "http://waterdata.usgs.gov/nwis/dv?c...n_date=1949-01-01&end_date=2012-05-07&site_no=" + StaID + "&referred_module=sw"
With oIE
.Navigate sURL
.Visible = True
Do While .Busy

Loop
Set vTxtInput = .document.getElementsByName("s")
vTxtInput(0).Value = sStaID
Do While .Busy
Loop
End With
Set ieDoc = oIE.document
For k = 0 To ieDoc.all.Length
Set IeTable = ieDoc.all(k)
For x = 1 To IeTable.Rows.Length
For p = 0 To IeTable.Rows(x).Cells.Length
Set ieCell = IeTable.Rows(x).Cells(p)
ActiveCell.Offset(x, p).Value = ieCell.innerText
Next
Next
Exit For
Next k
oIE.Quit
Set oIE = Nothing
Next i
End Sub


If someone else could please help me test this, here's a short subset of what some of the StaID's are:

<TABLE style="WIDTH: 77pt; BORDER-COLLAPSE: collapse" border=0 cellSpacing=0 cellPadding=0 width=102><COLGROUP><COL style="WIDTH: 77pt; mso-width-source: userset; mso-width-alt: 3730" width=102><TBODY><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; WIDTH: 77pt; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20 width=102>384713091474301</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>384832093192501</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>384917091594401</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>385156092263202</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>385432091343201</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>385718092234201</TD></TR><TR style="HEIGHT: 15pt" height=20><TD style="BORDER-BOTTOM: #f0f0f0; BORDER-LEFT: #f0f0f0; BACKGROUND-COLOR: transparent; HEIGHT: 15pt; BORDER-TOP: #f0f0f0; BORDER-RIGHT: #f0f0f0" class=xl63 height=20>385853092592801</TD></TR></TBODY></TABLE>
 
Upvote 0

Forum statistics

Threads
1,223,903
Messages
6,175,286
Members
452,631
Latest member
a_potato

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top