Extract all the links of a website

Thank you for your interview. I could ask you to insert it in the dropbox link file along with the other code with the creation of the buttons that activates each code. Then I noticed that activating the code there could be some problems with certificates that are asked that block the macro until I click on the "yes" of the mask that appears. Is it possible to automatically accept and save the data automatically and then restart the macro from the first raw line? Then I would like to learn how to insert the email extraction part included in that code.
I intend to be able to attach here the complete file of the code and of the buttons inside the sheet to activate the macros individually :)
 
Upvote 0

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
I am having no time to work on this…

When you mention accepting macros, are you going to run the code only on your own computer or distribute it to other people?
 
Upvote 0
I am having no time to work on this…

When you mention accepting macros, are you going to run the code only on your own computer or distribute it to other people?
Only on mine.
There will be a translation problem, sorry.
I'll explain what I would like to say:
If I start the macro I would like the processed data to be knew so if there should be problems that block the macro I will not lose the processes processed up to that moment.
Subsequently the macro must be able to resume from the interrupted point
 
Upvote 0
  • The code below makes the URL bold after processing it, and on the next execution will only work on non-bold entries.
  • Where would you like to place the email list?
VBA Code:
Sub CommandButton12()
    Const colUrl As Long = 1      'Must always be the first column
    Const colFacebook As Long = 3 'Must always be the last column of Some platforms
    Const colError As Long = 4    'Must always be the last column

    Dim url As String, http As Object, htmlDoc As Object, nodeAllLinks As Object
    Dim onelink As Object, pageLoadSuccessful As Boolean
    Dim tbl_url_oal As String, tbl_all As String, cr_tbl_urls As Long, lastRowTableUrls&
    Dim cr_table_all(1 To 3) As Long, lastRowTableAll&, addressCounters&(2 To colFacebook)
    Dim checkCounters As Long, cel As Range
   
    tbl_url_oal = "foglio2"             'Name of Sheet
    cr_tbl_urls = 4           'First row for content
    tbl_all = "Sheet1"     'Name of Sheet
    Sheets(tbl_url_oal).Activate
    With New XMLHTTP60
            On Error Resume Next
        For Each cel In Range("A4:A" & Cells(Rows.Count, 1).End(xlUp).Row)
            .Open "GET", cel.Value, False
            .send
            If .Status = 200 Then cel.Offset(, 1).Value = Split(Split(.responseText, "<title>")(1), "</")(0)
        Next
    End With
    For checkCounters = colUrl To colFacebook
        cr_table_all(checkCounters) = 2   'First rows for content
    Next
    Set htmlDoc = CreateObject("htmlfile")
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    'Loop over all URLs in column A in the URL source sheet
    Do While Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Value <> ""
    If Not Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold Then
        If ActiveSheet.Name = tbl_url_oal Then
        Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Activate
        End If
        url = Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1)
        On Error Resume Next
        http.Open "GET", url, False
        http.send
       If Err.Number = 0 Then
            pageLoadSuccessful = True
        End If
        On Error GoTo 0
        If pageLoadSuccessful Then
            htmlDoc.body.innerHTML = http.responseText
            Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
    cr_table_all(1) = Sheets(tbl_all).Range("a" & Rows.Count).End(xlUp).Row + 1
 
For Each onelink In nodeAllLinks
DoEvents
Sheets(tbl_all).Cells(cr_table_all(1), 2) = Right(onelink.href, Len(onelink.href) - InStr(onelink.href, ":"))
Sheets(tbl_all).Cells(cr_table_all(1), 1) = url
cr_table_all(1) = cr_table_all(1) + 1
addressCounters(2) = addressCounters(2) + 1
Next
        Else
          End If
        pageLoadSuccessful = False
        Erase addressCounters
        lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
        For checkCounters = colUrl To colFacebook
            cr_table_all(checkCounters) = lastRowTableAll + 1 'First rows for next page content
        Next
         Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold = True
    End If
        cr_tbl_urls = cr_tbl_urls + 1
Loop
Set http = Nothing: Set htmlDoc = Nothing
Set nodeAllLinks = Nothing
Set onelink = Nothing
Dim r As Range, c
Sheets("Sheet1").Activate
[b1] = "header"
[c1] = [b1]
For c = 4 To Sheets("Foglio2").[3:3].Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
    [c2] = "*" & Sheets("Foglio2").Cells(3, c) & "*"
    Columns("B:B").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=Range("c1:c2"), CopyToRange:=[e1], Unique:=True
    Set r = [e1].CurrentRegion
    Set r = Range(Cells(2, 5), Cells(r.Rows.Count, 5))
    If Len([e2]) Then r.Copy Sheets("Foglio2").Cells(4, c)
    r.Delete
Next
End Sub
 
Upvote 0
  • The code below makes the URL bold after processing it, and on the next execution will only work on non-bold entries.
  • Where would you like to place the email list?
VBA Code:
Sub CommandButton12()
    Const colUrl As Long = 1      'Must always be the first column
    Const colFacebook As Long = 3 'Must always be the last column of Some platforms
    Const colError As Long = 4    'Must always be the last column

    Dim url As String, http As Object, htmlDoc As Object, nodeAllLinks As Object
    Dim onelink As Object, pageLoadSuccessful As Boolean
    Dim tbl_url_oal As String, tbl_all As String, cr_tbl_urls As Long, lastRowTableUrls&
    Dim cr_table_all(1 To 3) As Long, lastRowTableAll&, addressCounters&(2 To colFacebook)
    Dim checkCounters As Long, cel As Range
  
    tbl_url_oal = "foglio2"             'Name of Sheet
    cr_tbl_urls = 4           'First row for content
    tbl_all = "Sheet1"     'Name of Sheet
    Sheets(tbl_url_oal).Activate
    With New XMLHTTP60
            On Error Resume Next
        For Each cel In Range("A4:A" & Cells(Rows.Count, 1).End(xlUp).Row)
            .Open "GET", cel.Value, False
            .send
            If .Status = 200 Then cel.Offset(, 1).Value = Split(Split(.responseText, "<title>")(1), "</")(0)
        Next
    End With
    For checkCounters = colUrl To colFacebook
        cr_table_all(checkCounters) = 2   'First rows for content
    Next
    Set htmlDoc = CreateObject("htmlfile")
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    'Loop over all URLs in column A in the URL source sheet
    Do While Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Value <> ""
    If Not Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold Then
        If ActiveSheet.Name = tbl_url_oal Then
        Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Activate
        End If
        url = Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1)
        On Error Resume Next
        http.Open "GET", url, False
        http.send
       If Err.Number = 0 Then
            pageLoadSuccessful = True
        End If
        On Error GoTo 0
        If pageLoadSuccessful Then
            htmlDoc.body.innerHTML = http.responseText
            Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
    cr_table_all(1) = Sheets(tbl_all).Range("a" & Rows.Count).End(xlUp).Row + 1
 
For Each onelink In nodeAllLinks
DoEvents
Sheets(tbl_all).Cells(cr_table_all(1), 2) = Right(onelink.href, Len(onelink.href) - InStr(onelink.href, ":"))
Sheets(tbl_all).Cells(cr_table_all(1), 1) = url
cr_table_all(1) = cr_table_all(1) + 1
addressCounters(2) = addressCounters(2) + 1
Next
        Else
          End If
        pageLoadSuccessful = False
        Erase addressCounters
        lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
        For checkCounters = colUrl To colFacebook
            cr_table_all(checkCounters) = lastRowTableAll + 1 'First rows for next page content
        Next
         Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold = True
    End If
        cr_tbl_urls = cr_tbl_urls + 1
Loop
Set http = Nothing: Set htmlDoc = Nothing
Set nodeAllLinks = Nothing
Set onelink = Nothing
Dim r As Range, c
Sheets("Sheet1").Activate
[b1] = "header"
[c1] = [b1]
For c = 4 To Sheets("Foglio2").[3:3].Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
    [c2] = "*" & Sheets("Foglio2").Cells(3, c) & "*"
    Columns("B:B").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=Range("c1:c2"), CopyToRange:=[e1], Unique:=True
    Set r = [e1].CurrentRegion
    Set r = Range(Cells(2, 5), Cells(r.Rows.Count, 5))
    If Len([e2]) Then r.Copy Sheets("Foglio2").Cells(4, c)
    r.Delete
Next
End Sub
if you do not create problems, could you insert the code in the file that I have attached with the code to extract the links inside, so you can try it directly with the creation of the buttons?
I wouldn't want to make mistakes if I do it myself ..
 
Upvote 0
Good evening.
I apologize for not seeing the button.
I changed the files in the dropbox originally shared link.
I try to explain myself better.
Opening the folder you will find the file link. In the first sheet there is the code that extrapolates the links by line. It is possible to update the code to ensure that the extraction takes place by line with saving before moving on to the next one and when the macro is started it must start from the first line not yet examined.
In sheet two the code must insert the links of sheet 1 in relation to the word contained in the column header. I need this to find the links that contain the contacts to extract. In this sheet for the link column of the facebook page, if it is not considered useful to copy the link of sheet 1 there is the code contained in the "Regxp Emails2" dropbox folder from which you can copy the criteria used for the extraction dfel link and apply it to the sheet also with separate code.
For the extraction of the site name there is the other macro present in the dropbox folder to add to the code.
finally, for sheet 3, the macro based on the links inserted in sheet 2 extrapolates: emails and telephone numbers and inserts them in the appropriate column. If the macro contained in "Regxp Emails2" for the email extraction part uses the regex code I can insert it here otherwise I have the code that extracts emails and telephone numbers using the regex code

 
Upvote 0
Could you insert a button for each function?
Please clarify how many buttons you want, where they should be placed and what each one is supposed to do.
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,325
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top