silverfang
New Member
- Joined
- Mar 19, 2020
- Messages
- 11
- Office Version
- 2016
- 2010
- Platform
- Windows
Hi All,
I'm finding difficulty in scraping the data from this particular website " Pulses Mills / Dal Mill - Page 274 of 274 List - Commodities India "
I need to extract the names and the relevant contact numbers of each and every dealer from pages 1 to 274.
I have done a similar project from another website, but finding it difficult to do it from this one.
I'm finding difficulty in scraping the data from this particular website " Pulses Mills / Dal Mill - Page 274 of 274 List - Commodities India "
I need to extract the names and the relevant contact numbers of each and every dealer from pages 1 to 274.
I have done a similar project from another website, but finding it difficult to do it from this one.
VBA Code:
Sub GetInfo()
Const prefix$ = "https://www.zaubacorp.com/company-list/nic-300/p-"
Const suffix$ = "-company.html"
Dim Html As New HTMLDocument, Htmldoc As New HTMLDocument
Dim newHtml As New HTMLDocument, newUrl$, elem As Object, R&, I&
Dim Wb As Workbook, ws As Worksheet, adr As Object, P&, pageNum&
Set Wb = ThisWorkbook
Set ws = Wb.Worksheets("DataContainer") '----------->create a sheet and name it `DataContainer` in order for the script to write the results in there
For pageNum = 1 To 3 '---------------------------------> this is where you put the highest number the script will traverse
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", prefix & pageNum & suffix, False
.send
Html.body.innerHTML = .responseText
End With
With Html.querySelectorAll("#table tbody tr")
For I = 0 To .Length - 1
Htmldoc.body.innerHTML = .Item(I).outerHTML
newUrl = Htmldoc.querySelector("a[href]").getAttribute("href")
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", newUrl, False
.send
newHtml.body.innerHTML = .responseText
End With
R = R + 1: ws.Cells(R, 1) = newHtml.querySelector(".container > h1").innerText
For Each elem In newHtml.getElementsByTagName("b")
If InStr(elem.innerText, "Email ID:") > 0 Then
ws.Cells(R, 2) = elem.ParentNode.innerText
Exit For
End If
Next elem
For Each adr In newHtml.getElementsByTagName("b")
If InStr(adr.innerText, "Address:") > 0 Then
ws.Cells(R, 3) = adr.ParentNode.NextSibling.innerText
Exit For
End If
Next adr
Next I
End With
Next pageNum
End Sub