Scrape data from a webPage

silverfang

New Member
Joined
Mar 19, 2020
Messages
11
Office Version
  1. 2016
  2. 2010
Platform
  1. Windows
Hi All,

I'm finding difficulty in scraping the data from this particular website " Pulses Mills / Dal Mill - Page 274 of 274 List - Commodities India "
I need to extract the names and the relevant contact numbers of each and every dealer from pages 1 to 274.

I have done a similar project from another website, but finding it difficult to do it from this one.

VBA Code:
Sub GetInfo()
    Const prefix$ = "https://www.zaubacorp.com/company-list/nic-300/p-"
    Const suffix$ = "-company.html"
    Dim Html As New HTMLDocument, Htmldoc As New HTMLDocument
    Dim newHtml As New HTMLDocument, newUrl$, elem As Object, R&, I&
    Dim Wb As Workbook, ws As Worksheet, adr As Object, P&, pageNum&

    Set Wb = ThisWorkbook
    Set ws = Wb.Worksheets("DataContainer") '----------->create a sheet and name it `DataContainer` in order for the script to write the results in there

    For pageNum = 1 To 3  '---------------------------------> this is where you put the highest number the script will traverse
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", prefix & pageNum & suffix, False
            .send
            Html.body.innerHTML = .responseText
        End With

        With Html.querySelectorAll("#table tbody tr")
            For I = 0 To .Length - 1
                Htmldoc.body.innerHTML = .Item(I).outerHTML
                newUrl = Htmldoc.querySelector("a[href]").getAttribute("href")

                With CreateObject("MSXML2.XMLHTTP")
                    .Open "GET", newUrl, False
                    .send
                    newHtml.body.innerHTML = .responseText
                End With

                R = R + 1: ws.Cells(R, 1) = newHtml.querySelector(".container > h1").innerText

                For Each elem In newHtml.getElementsByTagName("b")
                    If InStr(elem.innerText, "Email ID:") > 0 Then
                        ws.Cells(R, 2) = elem.ParentNode.innerText
                        Exit For
                    End If
                Next elem

                For Each adr In newHtml.getElementsByTagName("b")
                    If InStr(adr.innerText, "Address:") > 0 Then
                        ws.Cells(R, 3) = adr.ParentNode.NextSibling.innerText
                        Exit For
                    End If
                Next adr
            Next I
        End With
    Next pageNum
End Sub
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
I tired to use Power Query to atleast take down the list from pages 1 to 274, but it shows nothing.
 
Upvote 0

Forum statistics

Threads
1,224,818
Messages
6,181,151
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top