Web Scraping multiple pages VBA

Jacob45678

New Member
Joined
Sep 17, 2022
Messages
10
Office Version
  1. 2019
Platform
  1. Windows
Hi, I want to web scrape information to an excel worksheet for multiple pages. example from page 1 to 3. I tried a code found online from googling but it sends out an error. Any help is much appreciated. Thanks in advance. :)

I would like to sort the information like the example shown below.
1664880479445.png






The Website

The Code (VBA)

Sub TableScrapeFromURL()
'
Dim ArrayColumn As Long, ArrayRow As Long
Dim TableCell As Object, TableRow As Object
Dim Doc As Object, http As Object
Dim ScrapedTableArray As Variant
'
Set Doc = CreateObject("htmlFile") ' Late bind Microsoft HTML Object Library ... This can
' ' be late bound as long as 'class' is not needed
Set http = CreateObject("msxml2.xmlhttp") ' Late bind Microsoft XML
'
ArrayColumn = 1: ArrayRow = 1 ' Initialize ArrayColumn & ArrayRow to 1
'
With http '
.Open "GET", "Steam Community Market :: Showing all listings", False ' Load website to load
.Send ' Send website to load
Doc.body.innerHtml = .responseText ' save all data received from website
End With
'
With Doc.getelementsbytagname("table")(0) '
ReDim ScrapedTableArray(1 To .Rows.Length, 1 To .Rows(1).Cells.Length) ' Resize ScrapedTableArray
'
For Each TableRow In .Rows ' Loop through each TableRow
For Each TableCell In TableRow.Cells ' Loop through each TableCell in TableRow
ScrapedTableArray(ArrayRow, ArrayColumn) = TableCell.innerText ' Save value into ScrapedTableArray
ArrayColumn = ArrayColumn + 1 ' Increment ArrayColumn
Next ' Loop back
'
ArrayColumn = 1: ArrayRow = ArrayRow + 1 ' Reset ArrayColumn to 1, Increment ArrayRow
Next ' Loop back
End With
'
Range("A1").Resize(UBound(ScrapedTableArray, 1), _
UBound(ScrapedTableArray, 2)) = ScrapedTableArray ' Display ScrapedTableArray to sheet
End Sub
 

Excel Facts

Select all contiguous cells
Pressing Ctrl+* (asterisk) will select the "current region" - all contiguous cells in all directions.
Hi there.

I did my standard test of website resistance in python. I tried to send 400 requests in a short time, but what's come out on 24 request server response
with error <Response [429]>. It meant too many requests and that was tested with randomized headers - 'User-Agent. ( I try to mask that I am a robot, tries to pretend to be a browser)


{'User-Agent': 'Mozilla/5.0 (Linux; Android 12; SM-S906N Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/80.0.3987.119 Mobile Safari/537.36'}

It means you should try inputting some wait function to slow down your Makro [ I do not know how I do not know VBA only Python, but I am here to learn ;) ].
This website look's like quite a good protected against scraping, probably heavy tools like selenium will be necessary [ automated web browser ].

It is common in very popular websites that they are protected but if they are popular enough probably there are already some pre-build tools to scrape them, If I were you I'd check RapidAPI to see if they have anything reasonably priced.

429 error.png
 

Attachments

  • 429 error.png
    429 error.png
    90.6 KB · Views: 15
Upvote 0

Forum statistics

Threads
1,223,214
Messages
6,170,771
Members
452,353
Latest member
strainu

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top