Hello Everyone!!
Hope you all are doing well.
Here i am again, to get some advice
I want to scrape a site that have this structure
And this is my excel vba code. Maybe it's a little chaotic, but that's the best I can do!
Ok. This is what i get
And this is what i need
I have two (big and wide) problems
1. I can't extract the league type correctly (column B)
2. I need to clean (on the fly) any other character around the players' names, but I do not know how to do .
I need a little help ( with some explanation ) to do this.
Thank you in advance
Hope you all are doing well.
Here i am again, to get some advice
I want to scrape a site that have this structure
HTML:
-<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<div id="games">
<div id="date-nav-wrapper">
<h1 id="date-nav-title">THIS IS <span>THE SITE</span></h1>
<div id="date-nav">
<div id="dn-left">
<a href="XXX"><span class="dn-left-arrow"></span></a>
</div>
<div id="dn-title">
1 Oct 2020 </div>
<div id="dn-right">
<a href="XXX"><span class="dn-right-arrow"></span></a>
</div>
</div>
</div>
<div>
<div class="league-header">
<span class="XXX"><img class="XXX" src="XXX" alt="XXX">
</span>
<span class="league-name"><span>Soccer League</span></span>
</div>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:30</span>
<span class="players">
<span class="home">BOB CLARENCE (Rou)</span>
<span class="score">3 - 1</span>
<span class="away"><span>KOHNING FRYTZZI(Fin)</span></span>
</span>
<span class="set">
<span class="1">6-5</span><span class="2">5-4</span><span class="3">8-9</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:45</span>
<span class="players">
<span class="home">JOHN B.(Fra)</span>
<span class="score">10 - 2</span>
<span class="away"><span>LILYTH (Lgb)</span></span>
</span>
<span class="set">
<span class="1">4-3</span><span class="2">2-1</span><span class="3">8-3</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">17:30</span>
<span class="players">
<span class="home">ERYK-VANHAN (Spa)</span>
<span class="score">CANC</span>
<span class="away"><span>KLEIN (Ger)</span></span>
</span>
<span class="set">
<span class="1">6-7</span><span class="2">1-3</span><span class="3">3-2</span>
</span>
</a>
<div class="league-header">
<span class="XXX"><img class="XXX" src="XXX" alt="XXX">
</span>
<span class="league-name"><span>Basket League</span></span>
</div>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:30</span>
<span class="players">
<span class="home">MAVERYCK</span>
<span class="score">3 - 3</span>
<span class="away"><span>BUENA DONNA</span></span>
</span>
<span class="set">
<span class="1">6-5</span><span class="2">5-4</span><span class="3">8-9</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:45</span>
<span class="players">
<span class="home">HOLLY MOLLY</span>
<span class="score">3 - 7</span>
<span class="away"><span>UAHAHA</span></span>
</span>
<span class="set">
<span class="1">4-3</span><span class="2">2-1</span><span class="3">8-3</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">17:30</span>
<span class="players">
<span class="home">***DANIEL</span>
<span class="score">8 - 5</span>
<span class="away"><span>ERIKA </span></span>
</span>
<span class="set">
<span class="1">6-7</span><span class="2">1-3</span><span class="3">3-2</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">17:30</span>
<span class="players">
<span class="home">IGLESIAS***</span>
<span class="score">8 - 5</span>
<span class="away"><span> MORICHKO</span></span>
</span>
<span class="set">
<span class="1">3-7</span><span class="2">3-3</span><span class="3">3-2</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">17:30</span>
<span class="players">
<span class="home">MARY</span>
<span class="score">postp</span>
<span class="away"><span>GLEN</span></span>
</span>
<span class="set">
<span class="1">3-7</span><span class="2">3-3</span><span class="3">3-2</span>
</span>
</a>
<div class="league-header">
<span class="XXX"><img class="XXX" src="XXX" alt="XXX">
</span>
<span class="league-name"><span>Hockey League</span></span>
</div>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:30</span>
<span class="players">
<span class="home">MARIANOVICH ALEX</span>
<span class="score">3 - 1</span>
<span class="away"><span>BOB BIG-BOY (Usa)</span></span>
</span>
<span class="set">
<span class="1">6-5</span><span class="2">5-4</span><span class="3">8-9</span>
</span>
</a>
<a class="game" href="I WANT THIS HREF">
<span class="time notime started">16:45</span>
<span class="players">
<span class="home">ANTHONY WELD </span>
<span class="score">10 - 2</span>
<span class="away"><span> MARYANN LOUIS**</span></span>
</span>
<span class="set">
<span class="1">4-3</span><span class="2">2-1</span><span class="3">8-3</span>
</span>
</a>
</div>
</body>
</html>
VBA Code:
Sub Getinfo()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim y As Long
Dim i As Long
Sheets("Sheet3").Select
Cells.Select
Selection.ClearContents
Range("A1").Select
With http
.Open "GET", "C:\SITE STRUCTURE.HTML", False
.send
html.body.innerHTML = .responseText
End With
Dim posts As MSHTML.IHTMLElementCollection
Dim post As MSHTML.IHTMLElement
Set posts = html.getElementsByClassName("game")
Dim homes As MSHTML.IHTMLElementCollection
Dim home As MSHTML.IHTMLElement
Set homes = html.getElementsByTagName("body")
Dim Leagues As MSHTML.IHTMLElementCollection
Dim League As MSHTML.IHTMLElement
Set Leagues = html.getElementsByClassName("league-name")
If Leagues.Length > 0 Then
For y = 0 To Leagues.Length - 1
i = 3
For Each home In homes
'For Each league In leagues
For Each post In posts
'If league.className = "league-name" Then
Sheet3.Range("A" & i).Value = home.getElementsByTagName("div")(4).innerText ' Works fine in this case but i'm not sure if is the right mode to do this
Sheet3.Range("B" & i).Value = Leagues(0).innerText
Sheet3.Range("C" & i).Value = post.Children(0).innerText
'Debug.Print post.innerText
Sheet3.Range("D" & i).Value = post.Children(1).Children(0).innerText
Sheet3.Range("E" & i).Value = post.Children(1).Children(1).innerText
Sheet3.Range("F" & i).Value = post.Children(1).Children(2).innerText
Sheet3.Range("G" & i).Value = post.Children(2).Children(0).innerText
Sheet3.Range("H" & i).Value = post.Children(2).Children(1).innerText
Sheet3.Range("I" & i).Value = post.Children(2).Children(2).innerText
Sheet3.Range("J" & i).Value = post.href
i = i + 1
Next
Next
Next
End If
End Sub
Ok. This is what i get
And this is what i need
I have two (big and wide) problems
1. I can't extract the league type correctly (column B)
2. I need to clean (on the fly) any other character around the players' names, but I do not know how to do .
I need a little help ( with some explanation ) to do this.
Thank you in advance