I am trying to create a macro that pulls (and will update) property auction data from auction.com. I am new to data scraping using vba and for each auction, I need to pull the Property Address (and maintain the hyperlink to the href address in Excel, ideally), Item #, Starting Bid, and auction Start Date. My problem is that I am unsure how to call data from a specific <li> nested in a <ul> but every time I try, it pulls all entries for the property that share an <li> class name. Below is an example of the HTML:
Here is the code that I have written so far; the Property Address (without hyperlink though), Item #, and Starting Bid all pull correctly.
Does anyone know a more efficient way to scrape this data? I keep seeing answers involving jQuery, which I have no experience with. I would really appreciate some guidance. Thanks!
HTML:
<div class="contentDetail searchResult" property-id="193014410"> <div class="layoutSingle left ultra-wide"> <div class="searchResultGalleryData"> <p>Item #:</p> <p class="highlightedData">B133-101</p> <p>Arlington, TX</p> </div> <div class="searchResultImageBox"> <div class="searchResultImage"> <div class="secondaryAlertBanner hidden"></div> <a class="searchResultImageLink searchTrackPDP" href="http://auction.com/Texas/commercial-auction-asset/193014410-7786-2001-and-2005-NE-Green-Oaks-Blvd-Arlington-TX-76006-B133" property-id="193014410" row-index="1"><img class="searchResultImage" src="http://cdn.mlhdocs.com/rcp_files/auctions/B-133/photos/thumbnails/ForestHill-1_bigThumb.jpg" alt=""></a> </div> <div class="primaryAlertBanner listOnly red">Broker co-op available</div> </div> <ul class="searchResultInfo major"><li class="searchResultAddress"><a class="searchTrackPDP" href="http://auction.com/Texas/commercial-auction-asset/193014410-7786-2001-and-2005-NE-Green-Oaks-Blvd-Arlington-TX-76006-B133" row-index="1">2001 And 2005 Ne Green Oaks Blvd<br>Arlington, TX 76006</a></li><li> <span class="searchResultLabel">Asset Type:</span> Commercial </li><li> <span class="searchResultLabel">Property Type:</span> Office </li> <li class="highlight"> <span class="searchResultLabel">Item #:</span> B133-101 </li> </ul> <div class="clear"></div> <div class="primaryAlertBanner galleryOnly red">Broker co-op available</div> </div> <div class="layoutSingle right"> <ul class="searchResultInfo minor"> </ul> <ul class="searchResultInfo major"><li class="highlightLarge"> <span class="searchResultLabel">Starting Bid:</span> $1,250,000 </li> <li class="highlight"> <span class="searchResultLabel">Start Date:</span> <span>07/14/2014</span> </li> <li class="highlight"> <span class="searchResultLabel">Auction Type:</span> Online </li> </ul> </div> <div class="clear"></div> <div class="buttonRow right"> <a class="button small transparent searchPropertySaved" property-id="193014410">Property Saved</a> <a class="button primary small searchSaveProperty" property-id="193014410">Save Now</a> <a class="button blue small searchTrackPDP" href="http://auction.com/Texas/commercial-auction-asset/193014410-7786-2001-and-2005-NE-Green-Oaks-Blvd-Arlington-TX-76006-B133" row-index="1">More Details</a> </div> <div class="galleryDataSection"> <div class="dataItem"> Starting Bid: <span class="highlightedData dataValue">$1,250,000</span> </div> </div> <div class="clear"></div> ******** type="text/html" id="tooltip-193014410"> <div class="propertyHover"> <div class="header"> Item # <span class="itemNo"> B133-101 </span> | <span class="address"> 2001 And 2005 Ne Green Oaks Blvd Arlington, TX 76006 </span> </div> <ul class="propertyFields"> <li><span class="label">County</span><span class="value">Tarrant</span></li> <li><span class="label">Property Type</span><span class="value">Office</span></li> <li><span class="label">Bedrooms</span><span class="value"></span></li> <li><span class="label">Bathrooms</span><span class="value"></span></li> <li><span class="label">Occupancy Status</span><span class="value"></span></li> <li><span class="label">Lot Size (acres)</span><span class="value"></span></li> <li><span class="label">Previously Valued To</span><span class="value">N/A</span></li> </ul> <div class="clear"></div> <div class="propertyImageBox"> <img class="propertyImage" src="http://cdn.mlhdocs.com/rcp_files/auctions/B-133/photos/thumbnails/ForestHill-1_bigThumb.jpg" /> </div> </div> *********></div>
Here is the code that I have written so far; the Property Address (without hyperlink though), Item #, and Starting Bid all pull correctly.
Code:
Sub Commercial_Filter()
ScreenUpdating = False
Set IE = CreateObject("InternetExplorer.Application")
Dim ele As Object
'Dim links As Variant, lnk As Variant
Set sht = Sheets("Daily Log")
RowCount = 1
sht.Range("A" & RowCount) = "Property"
sht.Range("B" & RowCount) = "Item #:"
sht.Range("C" & RowCount) = "Start Date"
sht.Range("D" & RowCount) = "Property Type"
sht.Range("E" & RowCount) = "Sq Feet"
sht.Range("F" & RowCount) = "Start Date"
sht.Range("G" & RowCount) = "Starting Bid"
'<-----------Jump to Commercial Search Results for Retail & Office----------->
With IE
.Visible = False
.Navigate "http://www.auction.com/search?property_type=Retail+Commercial%2COffice+Commercial&limit=48&auction_type=commercial&sort=auction_end_date+asc"
Do While .Busy Or .readyState <> 4
DoEvents
Loop
'<-----------Pull Property Address for Each Entry----------->
For Each ele In .document.all
Select Case ele.className
Case "contentDetail searchResult"
RowCount = RowCount + 1
Case "searchResultAddress"
sht.Range("A" & RowCount) = ele.innertext
End Select
Next
'<-----------Pull Item # for Each Entry----------->
RowCount = 1
For Each ele In .document.all
Select Case ele.className
Case "results-section"
Case "layoutSingle left ultra-wide"
Case "searchResultGalleryData"
Case "highlightedData"
RowCount = RowCount + 1
sht.Range("B" & RowCount) = ele.innertext
End Select
Next
'<-----------Pull Starting Price for Each Entry----------->
RowCount = 1
For Each ele In .document.all
Select Case ele.className
Case "results-section"
Case "contentDetail searchResult"
Case "layoutSingle right"
Case "highlightLarge"
RowCount = RowCount + 1
sht.Range("H" & RowCount) = ele.innertext
sht.Range("G" & RowCount).FormulaR1C1 = "=REPLACE(RC[1],FIND(,RC[1]),14,"""")"
'This is to crudely attempt to remove the text "Starting Bid: " that embeds in each cell
End Select
Next
Columns("G:G").Select
Selection.Copy
Columns("G:G").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Columns("H:H").Select
Application.CutCopyMode = False
Selection.Delete Shift:=xlToLeft
End With
ScreenUpdating = True
End Sub
Does anyone know a more efficient way to scrape this data? I keep seeing answers involving jQuery, which I have no experience with. I would really appreciate some guidance. Thanks!