Web scraping slow using chrome driver

SQUIDD

Office Version: 2019, 2016
Platform: Windows
Hi

So my current code is very slow. I know the reason it is slow, but I just don't know how to go about changing it.

I believe it is because I am querying the browser for each and every element.

Is there a way I can copy the whole table and output it to a sheet more quickly?

Thanks
Dave


VBA Code:
Sub GET_DOGS()
    Dim mybrowser As Selenium.ChromeDriver
    Dim ROW_COUNT As Long, a As Long, b As Long

    Set mybrowser = New Selenium.ChromeDriver
    mybrowser.AddArgument "--headless"
    mybrowser.Get "https://greyhoundbet.racingpost.com/#search-dog/dog_id=556694"
    ROW_COUNT = mybrowser.FindElementById("sortableTable").FindElementsByTag("tr").Count
    ' One round trip to the browser per cell: this nested loop is the slow part
    For b = 2 To ROW_COUNT
        For a = 1 To 16
            Cells(b, a) = mybrowser.FindElementByXPath("/html/body/div[3]/div[2]/div[2]/div[2]/div[2]/div[1]/div/div/table[2]/tbody/tr[" & b & "]/td[" & a & "]").Text
        Next a
    Next b
    mybrowser.Quit
End Sub
 

You might try using Power Query. Look at my signature for more info on PQ
 
Hi.

Thanks for your suggestion.

I have looked at Power Query before, but it does not seem to return any data I can use.
 
I have managed a workaround, using the clipboard, for anyone else who may find this useful. Code below.

You also have to set a reference to the Microsoft Forms 2.0 Object Library.

The copy is near immediate :)


VBA Code:
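Sub GET_DOGS()
    Dim clipb As MSForms.DataObject
    Dim mybrowser As Selenium.ChromeDriver
    Dim tableHtml As String

    Application.ScreenUpdating = False
    Set clipb = New MSForms.DataObject
    Set mybrowser = New Selenium.ChromeDriver
    mybrowser.AddArgument "--headless"
    mybrowser.Get "https://greyhoundbet.racingpost.com/#search-dog/dog_id=" & Sheets("RACING IDS").Range("B5").Value
    ' Fetch the whole table's HTML in a single call instead of cell by cell
    tableHtml = mybrowser.FindElementById("sortableTable").Attribute("outerHTML")
    mybrowser.Quit  ' close the browser once the HTML has been captured
    ' Put the HTML on the clipboard and let Excel split it into cells on paste
    clipb.SetText tableHtml
    clipb.PutInClipboard
    Range("A1").Select
    ActiveSheet.PasteSpecial Format:="Unicode Text", link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True
    Application.ScreenUpdating = True
End Sub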
 
Solution
I quite like the solution you arrived at, where you've effectively copied and pasted HTML code (importantly, with the HTML Formatting!). I will try that next, thank you.
But aren't you comparing apples and oranges? In addition to using the MS Forms Object Library, you've also used a very different method of getting the data. Originally you used mybrowser.FindElementByXPath, which goes column by column and row by row. I can believe that would be quite time consuming!
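
If you would rather not go through the clipboard, here is a rough sketch of another way to avoid the cell-by-cell calls: grab the table's outerHTML once (as in your workaround), parse it in memory with a late-bound "htmlfile" document, and write the whole array to the sheet in a single assignment. The names HtmlTableToArray and GET_DOGS_ARRAY are just ones I made up for illustration, and I'm assuming the page has finished rendering the table by the time outerHTML is read. I have not timed it against the clipboard version.

```vba
Option Explicit

' Hypothetical helper: parse the first <table> in an HTML string into a
' 1-based 2-D Variant array. Returns Empty if no table or rows are found.
Function HtmlTableToArray(ByVal tableHtml As String) As Variant
    Dim doc As Object, tbl As Object, tr As Object
    Dim r As Long, c As Long, rowCount As Long, maxCols As Long
    Dim result() As Variant

    Set doc = CreateObject("htmlfile")          ' late-bound MSHTML document
    doc.body.innerHTML = tableHtml
    Set tbl = doc.getElementsByTagName("table").Item(0)
    If tbl Is Nothing Then Exit Function

    rowCount = tbl.Rows.Length
    If rowCount = 0 Then Exit Function
    ' Rows can differ in width, so size the array to the widest row
    For r = 0 To rowCount - 1
        If tbl.Rows.Item(r).Cells.Length > maxCols Then maxCols = tbl.Rows.Item(r).Cells.Length
    Next r
    If maxCols = 0 Then Exit Function

    ReDim result(1 To rowCount, 1 To maxCols)
    For r = 0 To rowCount - 1
        Set tr = tbl.Rows.Item(r)
        For c = 0 To tr.Cells.Length - 1
            result(r + 1, c + 1) = tr.Cells.Item(c).innerText
        Next c
    Next r
    HtmlTableToArray = result
End Function

' Assumes a reference to the Selenium Type Library, as in the code above.
Sub GET_DOGS_ARRAY()
    Dim mybrowser As Selenium.ChromeDriver
    Dim tableHtml As String, data As Variant

    Set mybrowser = New Selenium.ChromeDriver
    mybrowser.AddArgument "--headless"
    mybrowser.Get "https://greyhoundbet.racingpost.com/#search-dog/dog_id=556694"
    ' Single call to the browser for the whole table
    tableHtml = mybrowser.FindElementById("sortableTable").Attribute("outerHTML")
    mybrowser.Quit

    data = HtmlTableToArray(tableHtml)
    If IsEmpty(data) Then Exit Sub
    ' Single write to the sheet instead of one write per cell
    ActiveSheet.Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

The point is the same as in your clipboard version: one call to the browser and one write to the worksheet, instead of one of each per cell. This route needs no MS Forms reference, but unlike the paste it will not carry over any formatting.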
 
SQUIDD said: "I have looked at power query before but it does not seem to have any data I can use."

Sorry, one more thing - I tried @alansidman's proposed PQ solution and it worked perfectly for me - I don't know why it didn't for you. It returned the same data as when I ran your code.
 
Hi Dan

Thanks, I really am not familiar with Power Query, and since I couldn't get any data I assumed perhaps it wasn't compatible with that website.

However, now that you say you have, I think I need to explore this further.

Yes, originally I was using XPath, but that just took so long. That's why I opted to copy the HTML.
 
