Excel 2016 VBA - Scrape data from web HTML

alpha2007

New Member
Joined
Jun 20, 2021
Messages
24
Office Version
  1. 2016
Platform
  1. Windows
Hi guys,
I have some VBA coding experience, but I am still an intermediate.

I face the following problem:

The HTML code of the data I want to scrape cannot be accessed by using getElemetbyID, getElementbyName, getElementsbyClassName or getElementsbyTagName
Does anyone know of another way to scrape the needed data?

The respective HTML code is:


VBA Code:
<div class="row-fluid b-margin financials clearfix">
    <div class="span12">
        <div class="row-fluid lgFinancials">
            <div class="span6 specs">
                <p class="price asking help  odd">
                    <span class="title">Asking Price:</span>
                    <b>$99,900
                        <span class="help" title="The total asking price of the business for sale."><i class="fa fa-question-circle"></i></span></b>
                </p>
            </div><div class="span6 specs">
                <p class="price help  ">
                    <span class="title">Cash Flow:</span>
                    <b>$131,138
                        <span class="help" title="Arrived at by &quot;starting with your net (before tax) profit. Then, add back in any payments made to the owner, interest and any depreciation of assets.&quot; For example, if the net profit before taxes was $100,000 and the owner was paid $70,000 then the cash flow is $170,000."><i class="fa fa-question-circle"></i></span></b>
                </p>
            </div>
        </div>
        <div class="row-fluid">
            
            <div class="span6 specs">
                <p class="help  odd">
                    <span class="title">Gross Revenue:</span>
                    <b>$244,969
                        <span class="help" title="All income the business received before any cost-of-sales or expenses have been deducted."><i class="fa fa-question-circle"></i></span></b>
                </p><p class="notDisclosed help  ">
                    <span class="title">EBITDA:</span>
                    <b>N/A
                        <span class="help" title="Earnings Before Interest, Taxes, Depreciation and Amortization"><i class="fa fa-question-circle"></i></span></b>
                </p><p class="help  odd">
                    <span class="title">FF&amp;E:</span>
                    <b>$16,000
                        <span class="help" title="Furniture, fixtures and equipment that will remain with the business, such as desks, office cubicles, decor elements of a restaurant or showroom, computers and office machines, pots and pans, dishes, display cases, manufacturing equipment, etc., depending on the type of business."><i class="fa fa-question-circle"></i></span></b>
                </p>
            </div>

Would be great if I could get a solution here on the forum

Thanks!
Tony
 
JohnnyL, thanks - but doesn't work!

Domenic, thanks! Works perfect!

Domenic, sometimes a webpage based on a URL in list column A:A shows

This listing is no longer available.​

instead of the offer.

The VBA code can't handle it?

Such a URL is


HTML Code:

HTML:
<!-- NoListingFound Start-->
<div class="row-fluid">
    <div class="span8">
        <h1 style="margin: 10px 0;">This listing is no longer available.</h1>
        <h4 style="margin-top: 0;">There are several possible explanations for this:</h4>
        <ul>
            <li>The business has been sold or removed from the market.</li>
            <li>The business was listed by a broker who may no longer be representing the seller.</li>
            <li>The listing expired and was not renewed.</li>
        </ul>
        <p>If you think you have encountered a problem, <a href="/feedback.htm"> Contact Us.</a></p>
        <p>
            <a id="ctl00_ctl00_Content_ContentPlaceHolder1_wideProfile_ctl05_btnBBSHome" Class="btn btn-primary btnBBSHome" href="https://www.bizbuysell.com/">BizBuySell Home</a>
        </p>
    </div>


Is there a code to handle this occurrence and continue with the next URL 1 row down?

Thank you so much!
 
Upvote 0

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
Try...

VBA Code:
        With doc.querySelectorAll("div.span8 > h1,h2")
            If .Length > 0 Then
                If Left(.Item(0).innerText, 36) <> "This listing is no longer available." Then
                    'code to retrieve the data (ie. asking price, cash flow, etc)
                    '
                    '
                End If
            End If
        End With
 
Upvote 0
Domenic, thank you so much! I have been searching the internet for some 3 hours for a solution, but I couldn't find any working. It is very hard to get information and code samples on VBA web scraping that works!

But your solutions just work like Swiss clock work!

Here is my VBA code with your highly valuable input. Do you see any problems with the code?

VBA Code:
Sub webElementsWithoutID() 's()
    
    Dim IE As Object
    Dim doc As HTMLDocument
    Dim ws As Worksheet
    Dim element As Object
    Dim ele As Object
    Dim txt As String
    Dim askingPrice As String
    Dim cashFlow As String
    Dim grossRevenue As String
    Dim ebitda As String
    Dim ffe As String
    
    Set ws = ThisWorkbook.Sheets("data")

    ws.Range("AZ1").Value = "=CountA(A:A)"
    intRows = ws.Range("AZ1").Value

    Set IE = CreateObject("InternetExplorer.application")
    IE.Visible = True

    For rowNo = 1 To intRows
        
        strUrl = ws.Range("A" & rowNo).Text
        IE.navigate strUrl
                
        Do While IE.Busy Or IE.readyState <> 4
        Application.Wait DateAdd("s", 1, Now)
        Loop
                
        Set doc = IE.document
                    
                  
        With doc.querySelectorAll("div.span8 > h1,h2")
            If .Length > 0 Then
                If Left(.Item(0).innerText, 36) <> "This listing is no longer available." Then
              
                    strTitle = doc.getElementsByClassName("bfsTitle")(0).innerText
                    ws.Range("C" & rowNo).Value = strTitle
                    strSubTitle = doc.getElementsByClassName("span8")(0).innerText
                    ws.Range("D" & rowNo).Value = strSubTitle
                      
                    For Each ele In doc.getElementsByClassName("title")
                        txt = ele.parentElement.innerText
                        
                        If Left(txt, 12) = "Asking Price" Then
                            askingPrice = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 9) = "Cash Flow" Then
                            cashFlow = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 13) = "Gross Revenue" Then
                            grossRevenue = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 6) = "EBITDA" Then
                            ebitda = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 4) = "FF&E" Then
                            ffe = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        End If
                    Next ele
                            
                    ws.Range("E" & rowNo).Value = askingPrice
                    ws.Range("F" & rowNo).Value = cashFlow
                    ws.Range("G" & rowNo).Value = grossRevenue
                    ws.Range("H" & rowNo).Value = ffe
                End If
                
                If ws.Range("C" & rowNo).Value = "" Then
                    ws.Range("B" & rowNo).Value = "Not available"
                End If
            
            End If
        End With
    
    Next
    
    IE.Quit
    Set IE = Nothing
    MsgBox "done"

End Sub


I hope you do not mind if I keep on asking questions about VBA web scraping?
Should I open a new thread with a new question?

Thanks again for your wonderful help!
Tony
 
Upvote 0
I've only taken a quick look at your complete code, but it looks fine. However, always test any code thoroughly, just to make sure it does what you want in all circumstances.

And yes, whenever you have a new question, please start a new thread.

Cheers!
 
Upvote 0
Hello Domenic,

I just tried to scrape data from the URL

Tree Service Plus More in Pinellas County, Florida - BizBuySell

and I found that your script did not scrape "N/A" as the correct value of ebitda and did not leave the value empty but scraped the next data of ffe and inserted it in the cell in which the value N/A or empty should be for ebitda

Why is the code doing that? Can it be corrected?

Thanks for checking once more into the code!

VBA Code:
    Dim ele As Object
    Dim txt As String
    Dim askingPrice As String
    Dim cashFlow As String
    Dim grossRevenue As String
    Dim ebitda As String
    Dim ffe As String
    
    For Each ele In doc.getElementsByClassName("title")
        txt = ele.parentElement.innerText
        If Left(txt, 12) = "Asking Price" Then
            askingPrice = Trim(Mid(txt, InStrRev(txt, ":") + 1))
        ElseIf Left(txt, 9) = "Cash Flow" Then
            cashFlow = Trim(Mid(txt, InStrRev(txt, ":") + 1))
        ElseIf Left(txt, 13) = "Gross Revenue" Then
            grossRevenue = Trim(Mid(txt, InStrRev(txt, ":") + 1))
        ElseIf Left(txt, 6) = "EBITDA" Then
            ebitda = Trim(Mid(txt, InStrRev(txt, ":") + 1))
        ElseIf Left(txt, 4) = "FF&E" Then
            ffe = Trim(Mid(txt, InStrRev(txt, ":") + 1))
        End If
    Next ele
 
Upvote 0
Hello Domenic,

I just tried to scrape data from the URL

Tree Service Plus More in Pinellas County, Florida - BizBuySell

and I found that your script did not scrape "N/A" as the correct value of ebitda and did not leave the value empty but scraped the next data of ffe and inserted it in the cell in which the value N/A or empty should be for ebitda

Why is the code doing that? Can it be corrected?
The code is doing that because you told it to do that.

VBA Code:
                    ws.Range("E" & rowNo).Value = askingPrice
                    ws.Range("F" & rowNo).Value = cashFlow
                    ws.Range("G" & rowNo).Value = grossRevenue
                    ws.Range("H" & rowNo).Value = ffe        ' <--------
 
Upvote 0
I tried it with the Tree Service URL and it worked fine, as per attached image. Try posting your complete code so that we can take a look at it.
 

Attachments

  • alpha2007 2021-06-23.png
    alpha2007 2021-06-23.png
    77.9 KB · Views: 24
Upvote 0
Sorry, I just checked it once more and now the data was scraped correctly
I don't know what has happened before?

So it is working just fine!

Domenic, could I use the same kind of code as your code


VBA Code:
                    For Each ele In doc.getElementsByClassName("title")
                                        txt = ele.parentElement.innerText

                        If Left(txt, 12) = "Asking Price" Then
                            askingPrice = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 9) = "Cash Flow" Then
                            cashFlow = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 13) = "Gross Revenue" Then
                            grossRevenue = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 6) = "EBITDA" Then
                            ebitda = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 4) = "FF&E" Then
                            ffe = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 9) = "Inventory" Then
                            inventory = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 11) = "Real Estate" Then
                            realEstate = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 4) = "Rent" Then
                            rent = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        ElseIf Left(txt, 11) = "Established" Then
                            established = Trim(Mid(txt, InStrRev(txt, ":") + 1))
                        End If
                    Next ele


to scrape in another part of the website that starts with "Detailed Information" and has data elements that could change according to the data the seller entered.

The HTML code is

HTML:
<h3>Detailed Information</h3>
<dl id="ctl00_ctl00_Content_ContentPlaceHolder1_wideProfile_listingDetails_dlDetailedInformation" class="listingProfile_details">

<dt><strong>Location:</strong></dt>
<dd>Pinellas County, FL</dd>

<dt><strong>Inventory:</strong></dt>
<dd>Included in asking price</dd>

<dt><strong>Employees:</strong></dt>
<dd>8 FT</dd>

<dt><strong>Furniture, Fixtures, & Equipment (FF&amp;E):</strong></dt>
<dd>Included in asking price</dd>

<dt><strong>Facilities:</strong></dt>
<dd>Business includes over 1 acre of well located property.  The compound includes an efficient layout with multiple structures to protect and house equipment, materials and vehicles.  All work on the property has been properly permitted, is insured and has been inspected regularly.  An environmental study/inspection can be included with the sale.</dd>

<dt><strong>Competition:</strong></dt>
<dd>There are always new workers with a pickup willing to trim trees.  This more than 25+ year old company has the experience, equipment, and credentials to service large corporate and government accounts.  Majority of the team of full time employees have been with the company for many years.  In addition to being well treated and respected, workers are employed year round and offered benefits.</dd>

<dt><strong>Growth &amp; Expansion:</strong></dt>
<dd>Current owner has grown this company from nothing to a company worth over $1 Million.  He is now ready to help a new owner take this well respected company and grow further.  The owner will introduce a new owner to his large base of customers, vendors, suppliers and industry sources.  The team, equipment, and formula for expanding this successful business is included in this offering.</dd>

<dt><strong>Support &amp; Training:</strong></dt>
<dd>Will train for 4 weeks @ no cost
This is a full service tree company.  The company provides ancillary wood related services, which the details will be shared upon confidential discussions about the business.  Contact the broker for details.</dd>

<dt><strong>Reason for Selling:</strong></dt>
<dd>Going to relax.</dd>


The possible data field names I collected from several BizBuySell.com offer web page HTML:

Location
Type
Inventory
Real Estate
Building SF
Building Status
Lease Expiration
Employees
Furniture, Fixtures, & Equipment (FF&amp;E)
Facilities
Competition
Growth &amp; Expansion
Financing
Support &amp; Training
Reason for Selling
Franchise
Home-Based
Business Website

Would the class name be "listingProfile_details" and the code lines as following?

VBA Code:
For Each ele In doc.getElementsByClassName("listingProfile_details")
    txt = ele.parentElement.innerText
   
    If Left(txt, 8) = "Location" Then
        location = Trim(Mid(txt, InStrRev(txt, ":") + 1))
    ElseIf Left(txt, 4) = "Type" Then
        type = Trim(Mid(txt, InStrRev(txt, ":") + 1))
    .
    .
    .
    End If
Next ele
 
Upvote 0
I don't see some of that information available on the URL. In any case, it looks like your requirements have changed, which means that it would require more time to figure out what you want and re-write the code. I would suggest that you start a new thread and hopefully someone will provide you with the help you need.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top