Scrape Specific Website Data into Excel with Auto Link Numbering

blackhat7 (New Member, joined Dec 18, 2018)
Hi guys, I'm not sure the subject line is right.

In short:

I want to scrape the following data from this website:

"Company Name"
"Company Code"
"Labor Office"

Here's the link:
http://eservices.mohre.gov.ae/NewMolGateway/english/Services/EQuotaStatus.aspx?Code=715280

Note that I want to start the code number from 1, e.g.

http://eservices.mohre.gov.ae/NewMolGateway/english/Services/EQuotaStatus.aspx?Code=1


and scrape all the way up to a specific number, e.g.

http://eservices.mohre.gov.ae/NewMolGateway/english/Services/EQuotaStatus.aspx?Code=999999

and have all the data in Excel, where

column A = Company Name
column B = Company Code
column C = Labor Office

with one company per row, and so on. How can I achieve this?
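A minimal VBA sketch of the one-URL-per-code loop described above, assuming the EQuotaStatus.aspx page still answers a plain GET request for each Code value. It fetches each page with MSXML2.XMLHTTP60 and reads its td cells through MSHTML. The names QuotaUrl and ScrapeQuotaRange are hypothetical, and so are the td positions TD_NAME, TD_CODE and TD_OFFICE: inspect the page's HTML and adjust them before trusting the output.

```vba
' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library.
Option Explicit

Private Const BASE_URL As String = _
    "http://eservices.mohre.gov.ae/NewMolGateway/english/Services/EQuotaStatus.aspx?Code="

' Build the page address for one company code.
Public Function QuotaUrl(ByVal code As Long) As String
    QuotaUrl = BASE_URL & CStr(code)
End Function

' Fetch each code's page and write one company per row (headers in row 1).
Public Sub ScrapeQuotaRange(ByVal firstCode As Long, ByVal lastCode As Long)
    ' HYPOTHETICAL 0-based td positions; inspect the real page and adjust.
    Const TD_NAME As Long = 0, TD_CODE As Long = 1, TD_OFFICE As Long = 2

    Dim http As New MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim tds As MSHTML.IHTMLElementCollection
    Dim code As Long, outRow As Long

    ActiveSheet.Range("A1:C1").Value = Array("Company Name", "Company Code", "Labor Office")
    outRow = 2

    For code = firstCode To lastCode
        http.Open "GET", QuotaUrl(code), False      ' synchronous: waits for the reply
        http.send
        If http.Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = http.responseText
            Set tds = doc.getElementsByTagName("td")
            If tds.Length > TD_OFFICE Then          ' skip pages without enough cells
                ActiveSheet.Cells(outRow, 1).Value = tds.Item(TD_NAME).innerText
                ActiveSheet.Cells(outRow, 2).Value = tds.Item(TD_CODE).innerText
                ActiveSheet.Cells(outRow, 3).Value = tds.Item(TD_OFFICE).innerText
                outRow = outRow + 1
            End If
        End If
        DoEvents                                    ' keep Excel responsive between requests
    Next code
End Sub
```

For example, `ScrapeQuotaRange 1, 100` tries the first 100 codes. At roughly one second per request, the full 1 to 999999 range would take days, which is the same limitation discussed later in the thread.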
 
Well, thank you for your efforts.

Then I have two options:

1- Use the web query page, which shows results instantly, but I don't know how to extract the data from it.

Here's the link: https://www.mohre.gov.ae/en/services/enquiry-services.aspx

Choose "Company Information" from the drop-down list and enter the company number.

2- Use another link that requires credentials (username, password and security question); from there the query is also instant.
 

Strooman, I'm waiting for your valuable feedback.

I don't have a solution for the slow processing of the webpages. The two options from your earlier post are not workable, for the reason you give. Option #1 doesn't give us a URL we can work with: after filling in the form we are still on https://www.mohre.gov.ae/en/services/enquiry-services.aspx, which has no parameters we could adjust. Option #2 has the same limitation. I'm sorry I can't help you any further right now, but during the weekend I'll look into whether it's possible to fill in the form and click the Search button via VBA.
 
Here's another approach. This time we don't use XMLHTTP requests but automate Internet Explorer instead. Try this:
Pay special attention to this code line:
For x = 715100 To 715120 '< < < Adjust to your needs
Don't make the gap between the numbers too big: it takes too much processing time when you try to get hundreds of thousands of records.

Code:
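Sub a1081120_Third_Version()
    'Requires references (Tools > References): Microsoft Internet Controls and Microsoft HTML Object Library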
    Dim IE As New SHDocVw.InternetExplorer
    Dim URL As String
    Dim htmlDoc As New MSHTML.HTMLDocument
    Dim htmlOptions As MSHTML.IHTMLElementCollection
    Dim htmlOption As MSHTML.IHTMLElement
    Dim htmlInputs As MSHTML.IHTMLElementCollection
    Dim htmlInput As MSHTML.IHTMLElement
    Dim htmlButtons As MSHTML.IHTMLElementCollection
    Dim htmlButton As MSHTML.IHTMLElement
    Dim htmlTd As MSHTML.IHTMLElementCollection
    Dim tdObj As MSHTML.IHTMLElement
    Dim x As Long, z As Long, lngRow As Long, lngColumn As Long
    
    lngRow = 2
    
    'Open Internet Explorer and navigate to webpage
    IE.Visible = True
    IE.navigate "https://www.mohre.gov.ae/en/services/enquiry-services.aspx"
    
    'Wait till IE has finished loading the page (DoEvents keeps Excel responsive)
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    
    'Get the content of page
    Set htmlDoc = IE.Document
    
    'Cycle through the options of the dropdown
    Set htmlOptions = htmlDoc.getElementsByTagName("option")
    
    'We want Company Information, so select that
    For Each htmlOption In htmlOptions
        If htmlOption.getAttribute("value") = "CI" Then
            htmlOption.Selected = True
            Exit For
        End If
    Next htmlOption
    
    'Very important! Don't make the gap between the numbers too big
    For x = 715100 To 715120 '< < < Adjust to your needs
        lngColumn = 1
        
        'Search the textbox to put ID number in
        Set htmlInputs = htmlDoc.getElementsByTagName("input")
        For Each htmlInput In htmlInputs
            If htmlInput.className = "large nomargin" Then
                htmlInput.Value = x
                Exit For
            End If
        Next htmlInput
        
        Set htmlButtons = htmlDoc.getElementsByTagName("a")
        
        'Find the button to click/search
        For Each htmlButton In htmlButtons
            If htmlButton.className = "search btn fi-search" Then
                htmlButton.Click
                
                'Wait for the data to load
                Application.Wait (Now + TimeValue("0:00:01"))
                Set htmlTd = htmlDoc.getElementsByTagName("td")
                z = 1
                
                'Loop through all td elements . . .
                For Each tdObj In htmlTd
                
                    'td elements 1, 2 and 10 hold the data we want
                    Select Case z
                        Case 1
                            'ID number
                            Cells(lngRow, lngColumn).Value = tdObj.innerText
                            lngColumn = lngColumn + 1
                        Case 2
                            'Company info
                            Cells(lngRow, lngColumn).Value = tdObj.innerText
                            lngColumn = lngColumn + 1
                        Case 10
                            'Labour Office
                            Cells(lngRow, lngColumn).Value = tdObj.innerText
                            lngRow = lngRow + 1
                    End Select
                    z = z + 1
                Next tdObj
                Exit For
            End If
        Next htmlButton
    Next x
    'Quit all instances of Internet Explorer (this also closes any other IE windows you have open)
    Kill_IE
End Sub

Sub Kill_IE()
    Dim objWMI As Object, objProcess As Object, objProcesses As Object
    Set objWMI = GetObject("winmgmts://.")
    Set objProcesses = objWMI.ExecQuery( _
        "SELECT * FROM Win32_Process WHERE Name = 'iexplore.exe'")
    For Each objProcess In objProcesses
        Call objProcess.Terminate
    Next
    Set objProcesses = Nothing: Set objWMI = Nothing
End Sub
 
Dear Strooman, thank you for your time and help. I'm running the code now and hopefully it will work fine.

I will get in touch with you by PM.

--

I'm trying to retrieve around 1,000,000 records.

Will that work okay? And can I abort the process midway and continue it later? Given that each record takes about 1 second, retrieval will take around 12 days.
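A quick check of the 12-day estimate, assuming a steady 1 second per record and the computer running around the clock:

```latex
\[
\frac{1\,000\,000\ \text{records} \times 1\ \text{s/record}}{86\,400\ \text{s/day}} \approx 11.6\ \text{days}
\]
```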
 
I'm trying to retrieve around 1,000,000 records. Will that work okay?

The website is slow to respond, so that will take a lot of time. Of course, you can run the code while you sleep ;o))


And can I abort the process midway and continue it later?

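In that case I would run the code with a limited number of page requests and experiment a bit. For example, start with about 100 requests:
For x = 1 To 100 or For x = 715100 To 715200
You can also press ESC a couple of times while the code is running to interrupt it. To continue later, change the start number of the For loop to the code after the last one you retrieved, and set lngRow to the first empty row so earlier results aren't overwritten (the macro always starts at lngRow = 2).
And DON'T FORGET TO SAVE your work, otherwise you'll have to start all over again if Excel freezes, hangs or quits.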
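One way to make "abort and continue later" safe is a checkpoint: store the last code processed, start the next run right after it, append below the last used row instead of the hard-coded lngRow = 2, save periodically, and trap ESC so the workbook is saved before stopping. This is a minimal sketch under stated assumptions: a sheet named "Progress" (cell A1 holds the last code done), an output sheet named "Sheet1", and a hypothetical ScrapeOneCode stub into which you would move the fill-in/click/td code from the macro above. Application.EnableCancelKey = xlErrorHandler is the Excel setting that turns ESC into trappable error 18.

```vba
' HYPOTHETICAL layout: sheet "Progress" cell A1 stores the last code processed;
' results go to sheet "Sheet1" below the last used row in column A.
Option Explicit

' Code to start from, given the last code processed (an empty cell counts as 0).
Public Function NextStartCode(ByVal lastDone As Variant) As Long
    NextStartCode = CLng(lastDone) + 1
End Function

' HYPOTHETICAL stub: move the fill-in / click / td loop from the macro above in here
' and return True when a row was written.
Private Function ScrapeOneCode(ByVal code As Long, ByVal outRow As Long) As Boolean
    ScrapeOneCode = False
End Function

Public Sub RunResumableBatch(ByVal batchSize As Long)
    Dim progressCell As Range, outSheet As Worksheet
    Dim firstCode As Long, lastCode As Long, x As Long, lngRow As Long

    Set progressCell = ThisWorkbook.Worksheets("Progress").Range("A1")
    Set outSheet = ThisWorkbook.Worksheets("Sheet1")
    firstCode = NextStartCode(progressCell.Value)
    lastCode = firstCode + batchSize - 1
    ' First empty row in column A, so earlier results are not overwritten.
    lngRow = outSheet.Cells(outSheet.Rows.Count, 1).End(xlUp).Row + 1

    ' ESC now raises trappable error 18 instead of breaking into the code.
    Application.EnableCancelKey = xlErrorHandler
    On Error GoTo Interrupted

    For x = firstCode To lastCode
        If ScrapeOneCode(x, lngRow) Then lngRow = lngRow + 1
        progressCell.Value = x                           ' checkpoint after every code
        If (x - firstCode + 1) Mod 500 = 0 Then ThisWorkbook.Save
    Next x

Finish:
    On Error GoTo 0
    Application.EnableCancelKey = xlInterrupt
    ThisWorkbook.Save
    Exit Sub

Interrupted:
    If Err.Number = 18 Then
        MsgBox "Stopped by ESC after code " & progressCell.Value & ". Run again to resume."
    Else
        MsgBox "Error " & Err.Number & ": " & Err.Description
    End If
    Resume Finish
End Sub
```

Running `RunResumableBatch 100` twice in a row would cover codes 1 to 200, with the second run picking up at code 101.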

Given that each record takes about 1 second, retrieval will take around 12 days.

That's veryyyyy long. It depends how important the data is for your purpose. Once I had to retrieve all the posts of a forum, and that took me 3 days (and nights). I let the script run while I was working or sleeping.
 
1- Yes, I'll run it while I'm asleep.
2- BAM! That's what I did. Knowing I'll run the computer for the next 6 hours (6*60*60 = 21600 seconds), I run from 1000 to 21600, then save, and so on.
3- Well, I can wait 12 days, no issues.
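To size each unattended batch from the time available, a tiny helper can do the 6*60*60 arithmetic. This is a minimal sketch: BatchEndCode is a hypothetical name, and it assumes a constant cost per record (about 1 second in this thread), which a slow website will not always deliver.

```vba
Option Explicit

' Last code a batch can reach in the given number of hours,
' assuming every record takes secondsPerRecord seconds.
Public Function BatchEndCode(ByVal firstCode As Long, ByVal hours As Double, _
                             ByVal secondsPerRecord As Double) As Long
    Dim records As Long
    records = Int(hours * 3600 / secondsPerRecord)   ' whole records that fit in the window
    BatchEndCode = firstCode + records - 1
End Function
```

For example, `BatchEndCode(1000, 6, 1)` returns 22599: a 6-hour batch of about 21,600 one-second records starting at code 1000.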
 
Put this in the Select Case statement, just before Case 10:

Code:
Case 8
    'License Number
    Cells(lngRow, lngColumn).Value = tdObj.innerText
    lngColumn = lngColumn + 1
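With Case 8 added, the macro writes four values per row, starting at row 2: td 1 to column A, td 2 to B, td 8 to C and td 10 to D. Here is a minimal sketch of a matching header row for the free row 1. HeaderRow and WriteHeaderRow are hypothetical names, and the labels come from the comments in the code.

```vba
Option Explicit

' Column order once Case 8 is added: Case 1 -> A, Case 2 -> B, Case 8 -> C, Case 10 -> D.
Public Function HeaderRow() As Variant
    HeaderRow = Array("ID number", "Company info", "License Number", "Labour Office")
End Function

' Row 1 is free because the macro starts writing data at lngRow = 2.
Public Sub WriteHeaderRow(ByVal ws As Worksheet)
    ws.Range("A1:D1").Value = HeaderRow()
End Sub
```

For example, run `WriteHeaderRow ActiveSheet` once before starting the macro.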
 
