Web Scraping with Excel VBA

ToyoMike

New Member
Joined
Jul 5, 2020
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hello,

I have some script below in which I am trying to extract the information listed in the red box into an Excel file using VBA. Ideally, I would really love to get everything under the <div class="kioskDockDoor blueBg" line but I'll take any help I can get. I've tried multiple variations of code with no luck. I should mention that this is my first time trying to "scrape" a website. I am also using Chrome if that helps. Below is the only code that I could find that doesn't give any errors but doesn't give a result either.

Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object

With http
.Open "GET", "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=84", False
.send
html.body.innerHTML = .responseText
End With
For Each source In html.getElementsByClassName(" col-md-6 col-xs-6 rights ")
x = x + 1: Cells(x, 1) = source.getAttribute("kioskLabel")
Cells(x, 2) = source.getAttribute("trailerId")
Next source
End Sub


HTML Class.PNG

Thank you for any help or direction you can provide,

Mike
 

Excel Facts

How to change case of text in Excel?
Use =UPPER() for upper case, =LOWER() for lower case, and =PROPER() for proper case. PROPER won't capitalize second c in Mccartney
Hi Mike,
I'd start with switching on the Direct window in the VBA-editor, so you can dump some results there with Debug.Print. Next: run the code with F8, going step by step, instead of pressing F5 in a "work or die" attempt. So something like this can help you determine what's going wrong:
VBA Code:
Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object

http.Open "GET", "[URL]https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=84[/URL]", False
http.send
html.body.innerHTML = http.responseText

debug.print html.body.innerHTML
'That is check 1, should show some response

For Each source In html.getElementsByClassName("kioskDockDoor blueBg")
   debug.print source.innerHTML
   'etc...

Next source

End Sub
Hope that helps,
Koen
 
Upvote 0
Thank you for your reply Koen. I really appreciate it!
After poking around on the net I found and rewrote some code that is getting some results. I know it's rather crude code but I'm just learning this stuff. Here's what I have so far:

Code:
Sub GetData()
 
    Dim objIE As InternetExplorer
    Dim itemEle As Object
 
    Set objIE = New InternetExplorer
    objIE.Visible = True
 
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=76"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear
    For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
    
        Scac = itemEle.getElementsByClassName("col-md-3 col-xs-3 lefts")(0).getElementsByTagName("div")(1).textContent
        Door = itemEle.getElementsByClassName("col-md-6 col-xs-6 centers")(0).getElementsByTagName("div")(1).textContent
        LoadType = itemEle.getElementsByClassName("col-md-3 col-xs-3 rights")(0).getElementsByTagName("div")(1).textContent
        RouteRun = itemEle.getElementsByClassName("col-md-6 col-xs-6")(0).getElementsByTagName("div")(1).textContent
        Equipment = itemEle.getElementsByClassName("col-md-6 col-xs-6 rights")(0).getElementsByTagName("div")(1).textContent
        Spot = itemEle.getElementsByClassName("col-md-12 col-xs-12 centers")(0).getElementsByTagName("div")(0).textContent
        
        Sheets("Sheet1").Range("B2").Value = Scac
        Sheets("Sheet1").Range("C2").Value = Door
        Sheets("Sheet1").Range("D2").Value = LoadType
        Sheets("Sheet1").Range("E2").Value = RouteRun
        Sheets("Sheet1").Range("F2").Value = Equipment
        Sheets("Sheet1").Range("G2").Value = Spot

    Next
    
err_clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
objIE.Quit
 
End Sub

And here's a more detailed view of what I am trying to retrieve:

YMS Code.PNG


I am trying to retrieve each of the values for "kioskLabel" and the value under it. Also, the same information resides under "dockDoor_59" when expanded and I would like to scrape that as well. Any help would be greatly appreciated. Thank you!
 
Upvote 0
Hi Mike,
that looks like good progress/learning going on :). There are a couple of ways you can achieve that, the main commands you'd need are Sibling & Children. HTML can be seen as a structure, where a level below the item are its children and the item at the same level next to it as a sibling (and up: parent). E.g. for each itemEle loop through all items named kioskLabel and take (for the value) the NextSibling for the value. There are probably different routes, but that would lead to something like this (untested):

VBA Code:
   'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 2
   Set Sht = Sheets("Sheet1")

   For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 1
        ResCol = 1
        For Each d in itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.InnerHTML
             Sht.Cells(ResRw, ResCol + 1).Value = d.NextSibling.InnerHTML
             'For the next result, add 2 to the column
             ResCol = ResCol + 2
        Next d 
        'For the next result, add 1 to the row
        ResRw = ResRw + 1
    Next
 
Upvote 0
That makes a lot of sense Koen. I can see how the "d.InnerHTML" and "d.NextSibling.InnerHTML" are working in your example but here's what I've come accross... First I had to change "Next Sibling" to "NextElementSibling" as I wasn't getting results. Is that okay? Secondly, I get results now but I'm only getting the first 3 occurrences of "kioskLabel" (sibling) out of five. Plus, the 4th "kioskLabel" is coming up as "Shipment" and not "Route Run" as shown above. Should I be using a Child argument instead of Sibling or is it because there is a space between "Route Run" and it's not picking that up?

Capture.PNG
 
Upvote 0
Hi Mike,
that's a tough one as I don't have your source & solution. Question 1: yes, if you get results, that is normally a good sign, sometimes you need different commands/properties than you originally thought, but if it gets the job done it's okay in my book :-). For question 2: did you try the debug.print & F8 solutions that I posted? That should give you a pretty clear idea of what is going on and what bits of html are processed by your code.
Feel free to post your efforts & source (not as an image please, that doesn't help making a test case) so far, I'll try to have a go with it.
Cheers,
Koen
 
Upvote 0
Hi Mike,
that's a tough one as I don't have your source & solution. Question 1: yes, if you get results, that is normally a good sign, sometimes you need different commands/properties than you originally thought, but if it gets the job done it's okay in my book :). For question 2: did you try the debug.print & F8 solutions that I posted? That should give you a pretty clear idea of what is going on and what bits of html are processed by your code.
Feel free to post your efforts & source (not as an image please, that doesn't help making a test case) so far, I'll try to have a go with it.
Cheers,
Koen
I was trying to utilize debug.print but couldn't figure out how to use it properly to get any results. I'll work on it some more at work and see if I can get the debug.print to work. Thank you again Koen. I really appreciate your time! Gonna have to buy you a coffee or beer ?
Cheers mate!
Mike
 
Upvote 0
Koen,
It seems like what I'm trying to achieve would be pretty simple but I'm having a really hard time getting there. The code that you helped with seems like it would do the trick but it's not producing all of the results and I'm not familiar with the debug.print enough to figure out why. Can you please take a look and point me in the right direction? Any help would be greatly appreciated.
- Mike

Left side of the picture is excel while the right side is a snip of the website.
YMS Snip.PNG

Code:
Sub Scrape_YMS()

'User: 020885
'Pass: Ilovealisha@22

Dim objIE As InternetExplorer
    Dim itemEle As Object
 
    Set objIE = New InternetExplorer
    objIE.Visible = True
 
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=76"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 1
   Set Sht = Sheets("Sheet1")

For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 2
        For Each d In itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=75"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 5
   Set Sht = Sheets("Sheet1")

For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 2
        For Each d In itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=84"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 11
   Set Sht = Sheets("Sheet1")

For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 2
        For Each d In itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=85"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 13
   Set Sht = Sheets("Sheet1")

For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 2
        For Each d In itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
    objIE.navigate "https://whelo.4sightsolution.net/dockKiosk/dockKioskList?dockGroupId=108"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
 
    On Error GoTo err_clear

'For the results, set the start row and Set the sheet so you don't have to repeat the name
   ResRw = 15
   Set Sht = Sheets("Sheet1")

For Each itemEle In objIE.document.getElementsByClassName("kioskDockDoor")
         'Every kioskDockDoor goes on a new line, so reset the column to 2
        ResCol = 2
        For Each d In itemEle.getElementsByClassName("kioskLabel")
             'First write the kioskLabel, after that the next Sibling (which should be the trailerId)
             Sht.Cells(ResRw, ResCol).Value = d.innerHTML
             Sht.Cells(ResRw + 1, ResCol).Value = d.nextElementSibling.innerHTML
             'For the next result, add 1 to the column
             ResCol = ResCol + 1
        Next d
        'For the next result, add 2 to the row
        ResRw = ResRw + 2
    Next
    
err_clear:
  If Err <> 0 Then
    Err.Clear
  Resume Next
End If

objIE.Quit

End Sub
 
Upvote 0
Could you paste some of your source? I see an image, but as you can imagine I am very reluctant to type that all myself. Thanks!
 
Upvote 0
The source is the same as the post from July 12th. I'm getting the following values from the "kioskLabel" and it's siblings:
"Scac"/"MCKK", "Dock"/"D-03", "Load Type"/"Drop"... The first 3 work great but then the 4th div element comes over as "Shipment" instead of "Route Run" with no sibling and the 5th comes over as "Equipment #" with no sibling, and 6th div elements don't come over at all. I understand the 6th because it is not under the "kioskLabel" class name but I can't figure out the others. In the code above, I've added the user/pass to access the website so that you can see what it looks like.
 
Upvote 0

Forum statistics

Threads
1,224,926
Messages
6,181,793
Members
453,067
Latest member
cernytomas

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top