Skip urls with captcha

Sharid

Well-known Member
Joined
Apr 22, 2007
Messages
1,066
Office Version
  1. 2016
Platform
  1. Windows
I have a list of URLs on sheet2 "Url List" when my code runs it works it way through these URLs and the extracted date is placed in sheet1.

Is there a way of skipping Urls with captcha?

e.g. it times out and moves to next url if no data is extracted with X amount of secconds



If this is possible can the SKIPPED urls on sheet2 "Url List" be ID, e.g. in the next column it places the word "Captcha" This way they can be actioned manually.

Thanks
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple
Just to make it clear, I'm not asking to hack a Captcha.

I just need a code that if the page opens in IE and a captcha is required, then the code just moves to the next URL in the list, otherwise it will get stuck and has to be manually done.
 
Upvote 0
I have this in my code, does any body know if this will do it, I have set it to 15

Code:
'Show Internet Explorer and add delay in seconds if needed
    With ie
        .Visible = True
       [COLOR=#ff0000] Application.Wait Now + TimeValue("0:00:15")[/COLOR]
        
        For x = LBound(arr, 1) To UBound(arr, 1)
            .navigate arr(x, 1)
            While .Busy Or .readyState <> 4: DoEvents: Wend
            
            On Error Resume Next

Thanks
 
Upvote 0
Assuming that your urls start in row 2:


Code:
Private Sub fbStart_Click()
'Set sheet2 URL List and open Internet Explorer
    Dim lr          As Long
    Dim x           As Long
    Dim arr()       As Variant
    Dim wks         As Worksheet
    Dim ie          As Object
    Dim dd(1 To 2)  As String
    Dim Fr          As Long
    
    On Error Resume Next
    Application.ScreenUpdating = False
        
    Set wks = ThisWorkbook.Sheets("Url List")
    With wks
        Fr = .Cells(.Rows.Count, 6).End(xlUp).Offset(1).Row
        lr = .Cells(.Rows.Count, 1).End(xlUp).Row
        .Cells(1, 5).Value = lr
        arr = .Range(.Cells(Fr, 1), .Cells(lr, 1)).Value
    End With
        
    'Show Internet Explorer and add delay in seconds if needed
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        Application.Wait Now + TimeValue("0:00:0")
        
        For x = LBound(arr, 1) To UBound(arr, 1)
            .navigate arr(x, 1)
            wtime = Time
            Do While .Busy Or .readyState <> 4
                DoEvents
                If Time > (wtime + TimeValue("00:00:05")) Then
                    Cells(x + 1, "C").Value = "Captcha"
                    Exit Do
                End If
            Loop
            
            'On Error Resume Next
            'Variable for document or data which need to be extracted out of webpage, change innertext number if same class used
            Dim doc As HTMLDocument
            Set doc = ie.document
            dd(1) = doc.getElementsByClassName("_50f4")(2).innerText
            dd(2) = doc.getElementsByClassName("_50f4")(3).innerText
            
            'On Error Resume Next
            'Paste in this sheet
            With Sheet1
                .Cells(.Rows.Count, "A").End(xlUp).Offset(1, 0).Resize(, 2).Value = dd
            End With
            ' Put no1 in sheet2 column B
            Sheets("Url List").Range("B" & Rows.Count).End(xlUp).Offset(1, 0).Value = 1
             'Deletes duplicates in column A Sheet1
            Columns(1).RemoveDuplicates Columns:=Array(1)
            Columns(2).RemoveDuplicates Columns:=Array(1)
             'Count No1 in sheet2 Column B
            With Worksheets("Url List")
                Lastrow = .Cells(.Rows.Count, "B").End(xlUp).Row
                Sheets("Url List").Range("B1").Value = Lastrow
            End With
            Call Autoclick_Click
        Next x
       .Quit
    End With
    
    'Hide FaceBook Scraper Form
    ScraperForm.Hide
       
End Sub
 
Upvote 0
Thanks for the update, just 1 issue in placing the word "Captcha"

It is placed in Column C of Sheet1 when it SHOULD Go into Column C of Sheet2 as that is where the URLs are at. I have attached the file via a link, this is a FaceBook Scraper if you have the Business Page. https://app.box.com/s/l83a5snl10651p3a1gmfn1e2gnxtmqlz

Another FEW Issue, Rather than put another Thread,

1) Can You or anyone look at how it displays the results in sheet 1 as extracted emails should go into Column A and Urls into Column B. Currently it is not pasting correctly You will see the results.

2) Also, the result do not show until the form is closed,

3) An Auto save at X amounts of results would be good as it has crashed a few times.


Thanks
 
Upvote 0
Change this

Code:
[COLOR=#333333]Cells(x + 1, "C").Value = "Captcha"[/COLOR]

By:
Code:
​[COLOR=#333333]Sheets("Url List").[/COLOR][COLOR=#333333]Cells(x + 1, "C").Value = "Captcha"[/COLOR]
 
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top