Unable to fetch data from a webpage through web scraping

mittoo

New Member
Joined
Jun 28, 2018
Messages
15
How should I scrape this webpage https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/ and specifically need the ROE figure which is mentioned in the table.

I used following code in excel.

`Sub FetchData()
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/", Destination:=Range( _
"$A$1"))
.Name = "www"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub
`

This is what I was able to fetch from it:

BSEINDIA
Group Websites
BSE STAR MF
BSE Institute Ltd
BSE SME Platform
BSE Startups
BSE Hi-Tech
BSE IPF
CDSL
ICCL
India INX
India ICC
Marketplace Technologies
Members Portal
RGESS
BSE Sammaan
Notices | Media Release | Trading Holidays | BSEPlus | Payments to BSE | Contact us | FeedBack
Select Language
Hindi
Marathi
Gujarati
Group Websites
BSE STAR MF
BSE Institute Ltd
BSE SME Platform
BSE Startups
BSE Hi-Tech
BSE IPF
CDSL
ICCL
India INX
India ICC
Marketplace Technologies
Members Portal
RGESS
BSE Sammaan
BSEINDIA
S&P BSE SENSEX
{{HeaderData[0].ltp}}
{{HeaderData[0].chg}} {{HeaderData[0].perchg}}%
{{HeaderData[0].chg}} {{HeaderData[0].perchg}}%
{{HeaderData[0].dttm}} | OpenPre-openClose
Get Quote
×
BSE RSS
RSS stands for Really Simple Syndication. RSS feed is a XML file that provides summaries, including links to the full versions of the content.It is available through RSS feed reader or through some browsers. It will show you what's new since the last time you checked the feed, without having to visit the website.

BSE Notices RSS

This RSS feed allows you to stay up to date with the latest Notices updated on the site.

How to Subscribe this feed?

You can subscribe to this feed by using any of the following options:

If you have RSS reader, Please Cut and Paste the URL www.bseindia.com/data/xml/notices.xml in your RSS Reader.
Download any online RSS reader of your choice.

SENSEX RSS

This is an RSS feed from the Bombay Stock Exchange website. This RSS feeds allow you to stay up to date with the latest SENSEX values on continuous basis.

Subscribe to this feed

To subscribe to it, you will need a RSS Reader. You can subscribe to this RSS feed, Please Cut and Paste the URL www.bseindia.com/data/xml/sensexrss.xml in your RSS Reader

Close
×
High Lows
High Lows
52 Week High (adjusted) {{HLdata.Fifty2WkHigh_adj|number:2}}{{HLdata.Fifty2WkHigh_adjDt}}

52 Week Low (adjusted) {{HLdata.Fifty2WkLow_adj|number:2}}{{HLdata.Fifty2WkLow_adjDt}}

52 Week High (Unadjusted) {{HLdata.Fifty2WkHigh_unadj}}

52 Week Low (Unadjusted) {{HLdata.Fifty2WkLow_unadj}}

Month H/L {{HLdata.MonthHighLow}}

Week H/L {{HLdata.WeekHighLow}}
Close
×
Notification
{{notif.Disptxt}} {{notif.Date}}
Close
Equity
Derivatives
Debt / Others
SLB
Corp Announcements
Financials
Results
Annual Reports
Meetings
Board Meetings
Shareholders Meetings
Voting Results
Corp Actions
Shareholding Pattern
Unit Holding Pattern
Corporate Governance
Bulk / Block deals
Disclosures
Insider Trading 2015
Insider Trading 1992
SAST
Pledge
SAST - Annual Disclosure
Consolidated Pledge Data
Corp Information
Peer Group
Charting
Research Reports
Notices
Additional Info

Any suggestions/help on this, Need the ROE figure rest of it is not required.

Thanks!
 

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.
The following macro will pop up a message box and display the ROE for the company specified in the URL...

Code:
Option Explicit

Sub GetROE()


    Dim IE As Object
    Dim HTMLDoc As Object
    Dim HTMLTableRow As Object
    Dim strROE As String
    
    Const READYSTATE_COMPLETE As Long = 4
    
    Set IE = CreateObject("InternetExplorer.Application")
    
    With IE
        .Visible = True
        .navigate "https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
    
    Set HTMLDoc = IE.document
    
    strROE = ""
    For Each HTMLTableRow In HTMLDoc.getElementsByTagName("tr")
        If HTMLTableRow.Cells.Length > 0 Then
            If HTMLTableRow.Cells(0).innertext = "ROE" Then
                strROE = HTMLTableRow.Cells(1).innertext
                Exit For
            End If
        End If
    Next HTMLTableRow
    
    If Len(strROE) > 0 Then
        MsgBox "ROE:  " & strROE, vbInformation
    Else
        MsgBox "ROE not found!", vbExclamation
    End If
    
    Set IE = Nothing
    Set HTMLDoc = Nothing
    Set HTMLTableRow  = Nothing
    
End Sub

Hope this helps!
 
Last edited:
Upvote 0
Small tweak so that it waits until the table is built:
Code:
Sub GetROE()

    Dim IE As Object
    Dim HTMLDoc As Object
    Dim tRows As Object
    Dim i As Long
    Dim strROE As String
    
    Const READYSTATE_COMPLETE As Long = 4
    
    Set IE = CreateObject("InternetExplorer.Application")
    
    With IE
        .Visible = True
        .navigate "https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
    
    Set HTMLDoc = IE.document
    Set tRows = HTMLDoc.getElementsByTagName("tr")
    While tRows.Length <= 1: DoEvents: Wend
    
    strROE = ""
    i = 0
    While i < tRows.Length
        If tRows(i).Cells.Length > 0 Then
            If tRows(i).Cells(0).innertext = "ROE" Then
                strROE = tRows(i).Cells(1).innertext
            End If
        End If
        i = i + 1
    Wend
    
    If Len(strROE) > 0 Then
        MsgBox "ROE:  " & strROE, vbInformation
    Else
        MsgBox "ROE not found!", vbExclamation
    End If
    
    Set IE = Nothing
    Set HTMLDoc = Nothing
    
End Sub
 
Upvote 0
Maybe we can try waiting a few seconds before we actually assign the document itself to HTMLDoc. Does this help?

Code:
Sub GetROE()

    Dim IE As Object
    Dim HTMLDoc As Object
    Dim tRows As Object
    Dim startTime As Single
    Dim i As Long
    Dim strROE As String
    
    Const READYSTATE_COMPLETE As Long = 4
    Const MAX_WAIT_SEC As Long = 10
    
    Set IE = CreateObject("InternetExplorer.Application")
    
    With IE
        .Visible = True
        .navigate "https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
    
    startTime = Timer
    
    Do
       DoEvents
    Loop Until Timer - startTime > MAX_WAIT_SEC
    
    Set HTMLDoc = IE.document
    
    Set tRows = HTMLDoc.getElementsByTagName("tr")
       
    strROE = ""
    i = 0
    Do While i < tRows.Length
        If tRows(i).Cells.Length > 0 Then
            If tRows(i).Cells(0).innertext = "ROE" Then
                strROE = tRows(i).Cells(1).innertext
                Exit Do
            End If
        End If
        i = i + 1
    Loop
    
    If Len(strROE) > 0 Then
        MsgBox "ROE:  " & strROE, vbInformation
    Else
        MsgBox "ROE not found!", vbExclamation
    End If
    
    Set IE = Nothing
    Set HTMLDoc = Nothing
    Set tRows = Nothing
    
End Sub
 
Upvote 0
Still same error. I guess the site must be much more secured thats why unable to fetch the data esp this one where they display quote.

As I am able to scrape other web pages but not this one. Thanks anyway
 
Upvote 0
It's not that the site is more secure. I've been able to get the data with all of the previous code. I'm not sure what John may have in mind, but I've amended the code so that it first searches for the div element containing the relevant tables. It loops until it either finds the div element or reaches the time limit, which in this example I've set to 30 seconds. You can change this as desired, maybe longer if you have a slow computer or internet connection. Once it finds the div element, it gets the table rows and searches for the row containing the ROE. Does this help?

Code:
Sub GetROE()

    Dim ie As Object
    Dim htmlDoc As Object
    Dim htmlDiv As Object
    Dim tRows As Object
    Dim startTime As Single
    Dim i As Long
    Dim strROE As String
    
    Const READYSTATE_COMPLETE As Long = 4
    Const MAX_WAIT_SEC As Long = 30
    
    Set ie = CreateObject("InternetExplorer.Application")
    
    With ie
        .Visible = True
        .navigate "https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set htmlDoc = .document
    End With
    
    startTime = Timer
    
    On Error Resume Next
    Do
       DoEvents
       Set htmlDiv = htmlDoc.querySelector("div[ng-init='fnStockTrading()'")
       If Not htmlDiv Is Nothing Then Exit Do
    Loop Until Timer - startTime > MAX_WAIT_SEC
    On Error GoTo 0
    
    If htmlDiv Is Nothing Then
        MsgBox "ROE not found!", vbExclamation
    Else
        Set tRows = htmlDiv.getElementsByTagName("tr")
        strROE = ""
        i = 0
        Do While i < tRows.Length
            If tRows(i).Cells.Length > 0 Then
                If tRows(i).Cells(0).innertext = "ROE" Then
                    strROE = tRows(i).Cells(1).innertext
                    Exit Do
                End If
            End If
            i = i + 1
        Loop
        If Len(strROE) > 0 Then
            MsgBox "ROE:  " & strROE, vbInformation
        Else
            MsgBox "ROE not found!", vbExclamation
        End If
    End If
    
    Set ie = Nothing
    Set htmlDoc = Nothing
    Set htmlDiv = Nothing
    Set tRows = Nothing
    
End Sub
 
Upvote 0

Forum statistics

Threads
1,224,819
Messages
6,181,153
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top