# Extract all the links of a website



## stefanoste78 (Aug 19, 2018)

I found this macro:

https://stackoverflow.com/questions/39222082/vba-to-find-text-from-webpages

I could not tell whether it is possible to extract all the links of a single web page, or all the links of an entire website.

I would like to extract all the links from all the pages of a website.

Is that possible?

Thank you


----------



## tonyyy (Aug 19, 2018)

stefanoste78,

You might consider the following...


```
Sub GetAllLinks()
Application.ScreenUpdating = False
Dim ie As Object, AllHyperlinks As Object, hyper_link As Object
Dim LastRow As Long, NextRow As Long
Dim url_name As Range

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
Set url_name = ActiveSheet.Range("A1")
If url_name = "" Or Left(url_name, 4) <> "http" Then
    MsgBox "Please enter a valid url."
    Exit Sub
End If

ie.navigate (url_name)
Do
    DoEvents
Loop Until ie.ReadyState = 4
Set AllHyperlinks = ie.Document.getElementsByTagName("A")

NextRow = 1
With ActiveSheet
    .Columns(2).Clear
    For Each hyper_link In AllHyperlinks
        .Cells(NextRow, 2).Value = hyper_link
        NextRow = NextRow + 1
    Next
    LastRow = .Cells(Rows.Count, 2).End(xlUp).Row
    .Cells(1, 2).EntireColumn.AutoFit
End With

AppActivate "MicroSoft Excel"
Application.ScreenUpdating = True
MsgBox "There are " & LastRow & " links."
End Sub
```

Enter the website address into Range("A1"); the links from that site will populate Column B. Please note you'll need Internet Explorer (does not need to be open) on your Windows pc.

Cheers,

tonyyy


----------



## stefanoste78 (Aug 20, 2018)

Thanks tony.
When I start the macro it seems to be loading; after a few moments it behaves as if it had not run, then reloads, and a dialog appears with this message and a debug option:

Run-time error 5
Invalid procedure call or argument

Debug highlights:
AppActivate "MicroSoft Excel"


I also noticed that it does not extract all the links. For example, with this URL:
https://www.uniba.it

it does not extract this link:
https://www.uniba.it/ateneo/rettorato

Could you place the extracted links on the same row as their source link, and allow other URLs to be entered in column A, so the extraction is applied to more than one site?


----------



## tonyyy (Aug 21, 2018)

> Run-time error 5
> Invalid procedure call or argument
> 
> Debug highlights:
> AppActivate "MicroSoft Excel"



I can't duplicate the error. Are you using a US English version of Excel? On a Windows pc?

You can, I think, simply delete the line "AppActivate "MicroSoft Excel"" - all it does is bring Excel to the forefront of the screen.
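
For what it's worth, `AppActivate` matches by window title, so on a non-English Excel (or a differently titled window) the literal "MicroSoft Excel" raises run-time error 5. A hedged sketch of a safer replacement for that single line, using `Application.Caption` (the actual title of the current Excel window):

```
' Safer alternative to AppActivate "MicroSoft Excel":
' Application.Caption holds the real window title, and any
' remaining mismatch is ignored instead of raising error 5.
On Error Resume Next
AppActivate Application.Caption
On Error GoTo 0
```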

As for extracting the links from https://www.uniba.it - the code found 177 links. I can't say why it didn't find https://www.uniba.it/ateneo/rettorato, other than that the link may not exist on that page.


----------



## stefanoste78 (Aug 21, 2018)

I use Excel 2010 on a laptop in Italy. The page does exist; you can check it yourself.
Could you make the macro extract the links of every website listed in column A, and copy each site's links onto that site's row?
Thank you


----------



## tonyyy (Aug 21, 2018)

```
Sub GetAllLinks2()
Application.ScreenUpdating = False
Dim ie As Object, AllHyperlinks As Object, hyper_link As Object
Dim NextCol As Long
Dim url_name As Range

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
On Error GoTo errHandler

For Each url_name In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    If url_name = "" Or Left(url_name, 4) <> "http" Then
        MsgBox "Please enter a valid url."
        ie.Quit
        Exit Sub
    End If
    
    ie.navigate (url_name)
    Do
        DoEvents
    Loop Until ie.ReadyState = 4
    Set AllHyperlinks = ie.document.getElementsByTagName("A")
    
    NextCol = 2
    With ActiveSheet
        For Each hyper_link In AllHyperlinks
            .Cells(url_name.Row, NextCol).Value = hyper_link
            NextCol = NextCol + 1
        Next
        .Columns.AutoFit
    End With
Next url_name

errHandler:   ' reached on error, and by fall-through after a normal finish
    ie.Quit
    Application.ScreenUpdating = True
End Sub
```


----------



## stefanoste78 (Aug 23, 2018)

Thank you for your contribution.
About the macro I would like to know:
1) does it also work with other versions of Excel or Windows? (regarding your question)
2) does it extract only links under the same path as the website, or external links as well? If not, could you make it extract external links too?


I have also created a similar thread about extracting emails, which you could most likely solve in a moment since it is similar to this one.
If you feel like trying, the link is below:


https://www.mrexcel.com/forum/excel...ting-emails-web-page-links-listed-column.html


Thanks for everything


----------



## stefanoste78 (Sep 2, 2018)

Are there any solutions?


----------



## Fennek (Sep 2, 2018)

Hello,

try this:


```
Sub Read_Linksb()
  Dim URL$, c00$, myImg, Lk, it, i&

  With CreateObject("MSXML2.XMLHTTP")
    URL = "https://www.MrExcel.com"
    .Open "Get", URL, False
    .send
    c00 = .responseText
  End With

  With CreateObject("htmlfile")
    .Body.innerhtml = c00

    'images
    Set myImg = .getElementsByTagName("img")
    For Each Lk In myImg
        'If Left(Lk.src, 4) = "http" Then
            i = i + 1
            Cells(i, 1) = Lk.src
        'End If
    Next Lk

    'links
    For Each it In .Links
        i = i + 1
        Cells(i, 1) = it.href
    Next
  End With

End Sub
```

regards


----------



## stefanoste78 (Sep 2, 2018)

Thank you for stepping in.

I cannot comment on the relative speed of the two macros, but I can say that this one also does not extract all the links.

You can test with this link: https://www.uniba.it
You'll see that the macro does not extract these:
https://www.uniba.it/organizzazione/amm-centrale/rettorato/rettorato
https://www.uniba.it/ateneo/rettorato/rettorato-home


----------






## stefanoste78 (May 2, 2019)

Hello. Is it possible to slightly modify this macro so that, for each link listed in the column, it extracts from the next level down the first link that contains the word given as that column's header, for each header entered?
Thank you


----------



## stefanoste78 (May 3, 2019)

Good evening.
Can you tell me whether the macro can be made to extract, for each URL listed in column A, only the links that contain the string given in the column header (from column B onwards)?
Thank you


----------



## Worf (May 4, 2019)

Hello

Place the desired string on the second row of each column.


```
Sub Read_Linksb()
Dim c00, myimg, i%, Lk, it, lr%, c%
With CreateObject("MSXML2.XMLHTTP")
    .Open "Get", "https://www.MrExcel.com", False
    .Send
    c00 = .responseText
End With
Cells(1, 1) = "List"
With CreateObject("htmlfile")
    .Body.innerHTML = c00
    i = 1
    Set myimg = .getElementsByTagName("img")
    For Each Lk In myimg
        i = i + 1
        Cells(i, 1) = Lk.src
    Next
    For Each it In .Links
        i = i + 1
        Cells(i, 1) = it.href
    Next
End With
For c = 2 To Cells(2, Columns.Count).End(xlToLeft).Column
    Cells(1, c) = Cells(1, 1)
    Cells(2, c) = "*" & Cells(2, c) & "*"
    Range("a1:a" & Range("a" & Rows.Count).End(xlUp).Row).AdvancedFilter _
    xlFilterCopy, Range(Cells(1, c), Cells(2, c)), Cells(3, c), False
Next
End Sub
```


----------



## stefanoste78 (May 5, 2019)

I tried the macro and noticed that the website must be hard-coded in the macro. I would like the macro to automatically extract links for every URL listed in the column. I attach an example.

https://www.dropbox.com/s/x0hvee4ab03on4h/research link.xlsm?dl=0


----------



## Worf (May 5, 2019)

Buonasera


```
Sub Read_Linksb()
Dim c00, myimg, i%, Lk, it, lr%, c%, nu%
i = 1
For nu = 3 To Range("b" & Rows.Count).End(xlUp).Row ' start at B3
    With CreateObject("MSXML2.XMLHTTP")
        .Open "Get", Cells(nu, 2), False
        .Send
        c00 = .responseText
    End With
    Cells(1, 1) = "List"
    With CreateObject("htmlfile")
        .Body.innerHTML = c00
        Set myimg = .getElementsByTagName("img")
        For Each Lk In myimg
            i = i + 1
            Cells(i, 1) = Lk.src
        Next
        For Each it In .Links
            i = i + 1
            Cells(i, 1) = it.href
        Next
    End With
Next
For c = 3 To Cells(2, Columns.Count).End(xlToLeft).Column
    Cells(1, c) = Cells(1, 1)
    Cells(2, c) = "*" & Cells(2, c) & "*"
    Range("a1:a" & Range("a" & Rows.Count).End(xlUp).Row).AdvancedFilter _
    2, Range(Cells(1, c), Cells(2, c)), Cells(3, c), False
Next
End Sub
```


----------



## stefanoste78 (May 6, 2019)

I tried the macro. Essentially it extracts the links like the previous macro did, and then transfers only the links that contain the header string.
Could you keep the links out of column A? If there are many URLs, the sheet would run out of rows. Instead, you could transfer all the extracted links to another sheet, filling the first column, then the second, and so on.
Also, when the macro starts, it could ask for the domain level of the links to transfer (for example, only second- or third-level links).


My final goal is to extract, from those links, the emails of the pages that the column headers refer to.


Thank you


----------



## Worf (May 6, 2019)

I am not sure if I understand what you want. Do you need to explore all site levels?
This would mean accessing all subpages, until exhausting all internal site links.
It should be possible to list all site links, including subpages.


----------



## stefanoste78 (May 6, 2019)

It would be useful to have them all, and then be able to decide which links are inserted into the dedicated column where the given string matches.
So there would be: 1) extraction of all the links;
2) user selection of which links to insert (if the maximum level is 6 and the user chooses 2, only second-level links are inserted into the columns).
Could you do that? The user could simply enter a number from 1 to the maximum.


----------



## stefanoste78 (May 8, 2019)

Hi Worf. Is it difficult to do?


----------



## Worf (May 8, 2019)

I would say it is tricky. I am having a busy week but will work on this as soon as possible...


----------






## stefanoste78 (May 8, 2019)

Worf said:


> I would say it is tricky. I am having a busy week but will work on this as soon as possible...



Thank you


----------



## Worf (May 9, 2019)

This code seems to work, listing the links up to the chosen level. However, it is inefficient because it generates multiple duplicated results. I will improve it, probably creating a collection of unique items.


```
Dim r%, s$
Sub Request2(URL$, Optional Level% = 1)
Dim n&, ob, oreq As New XMLHTTP60, odoc As New HTMLDocument
oreq.Open "GET", URL, False
oreq.Send
odoc.Body.innerHTML = oreq.responseText
r = r + 1
Cells(r, 1) = Level
Set ob = odoc.getElementsByTagName("a")
For n = 0 To ob.Length - 1
    r = r + 1
    Cells(r, 1) = ob.Item(n).href
Next
If Level Then
    For n = 0 To ob.Length - 1
        If ob.Length > 0 Then
            If ob.Item(n).href Like s Then Request2 ob.Item(n).href, Level - 1
        End If
    Next
End If
End Sub


Sub Main2()                             ' run me
r = 0
s = "*salento*"                         ' to avoid external references
On Error Resume Next
Request2 "https://www.unisalento.it", 5
MsgBox "done"
End Sub
```


----------



## stefanoste78 (May 9, 2019)

I don't quite understand, though: it should be enough to extract all the links, as it does now, and then transfer into the table with the headers only those links (second, third, fourth level, etc.) whose level is given by the number of "/" characters in the link.
The problem, perhaps, is making the links continue into column B once all the cells of column A are filled, then into column C, and so on.
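
For clarity, the "level" as described here (the number of "/" characters in the link) can be computed with a one-line expression; a minimal sketch, where `LinkLevel` is just an illustrative name:

```
Function LinkLevel(url As String) As Long
    ' Count "/" characters: length of the string
    ' minus its length with every "/" removed
    LinkLevel = Len(url) - Len(Replace(url, "/", ""))
End Function

' LinkLevel("https://www.uniba.it/ateneo/rettorato") returns 4
```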


----------



## Worf (May 10, 2019)

This is a better version; it loads 331 pages, finding 5101 unique addresses for the chosen example.
I will be back later to finish the project.


```
Dim r&, s$, dict As Object


Sub Request22(URL$, Optional Level% = 1)
Dim n&, ob, oreq As New XMLHTTP60, odoc As New HTMLDocument
oreq.Open "GET", URL, False
On Error Resume Next
oreq.Send
On Error GoTo 0
odoc.Body.innerHTML = oreq.responseText
r = r + 1
Cells(r, 1) = Level
Set ob = odoc.getElementsByTagName("a")
For n = 0 To ob.Length - 1
    r = r + 1
    Cells(r, 1) = ob.Item(n).href
Next
If Level Then
    For n = 0 To ob.Length - 1
        If ob.Length > 0 Then
            If ob.Item(n).href Like s And Not dict.Exists(ob.Item(n).href) Then
                Request22 ob.Item(n).href, Level - 1
                dict.Add ob.Item(n).href, ob.Item(n).href
            End If
        End If
    Next
End If
End Sub


Sub Main22()                             ' run me
Set dict = CreateObject("Scripting.Dictionary")
r = 0
s = "*salento*"                         ' to avoid external references
Request22 "https://www.unisalento.it"
Cells(1, 2) = Cells(1, 1)
Cells(2, 2) = "*"
[a:a].AdvancedFilter xlFilterCopy, Range("b1:b2"), Cells(1, 3), True
MsgBox dict.Count & " recursive calls.", , "done"
End Sub
```


----------



## Worf (May 11, 2019)

The following example shows how to filter the data, considering the level limit at cell E3 and the desired word at cell E4.
The raw data should be on column C, and the filtered result appears on column H.


```
Sub AdvFilt2()
[e1] = ""
[e2].Formula = "=AND(LEN(C2)-LEN(SUBSTITUTE(C2,""/"",""""))<$E$3,NOT(ISERR(FIND($E$4,C2))))"
[c:c].AdvancedFilter xlFilterCopy, [e1:e2], [h1], True
End Sub
```


----------



## stefanoste78 (May 12, 2019)

Should the last code be placed in another module? And with this code, where should the keywords and URLs be entered?


----------



## Worf (May 13, 2019)

I will prepare a consolidated version that extracts all links and filters this list multiple times based on supplied criteria.


----------



## stefanoste78 (May 13, 2019)

Worf said:


> I will prepare a consolidated version that extracts all links and filters this list multiple times based on supplied criteria.



I'd like to show you a macro that could be combined with this one, just to let you take part.
Shall I write my email here? It seems you cannot receive private messages.


----------



## Worf (May 13, 2019)

Features of this version:



Column A = all site links
Column B = filter criteria
Column C = unique links
Column D = first result column
Column E = second result column
 
Place the level limit at the third row (D3, E3…) and the desired word at the fourth row (D4, E4…)


```
Dim r&, s$, dict As Object

Sub Main22()                             ' run me
Set dict = CreateObject("Scripting.Dictionary")
r = 0
s = "*salento*"                         ' to avoid external references
Request22 "https://www.unisalento.it"
Cells(1, 2) = Cells(1, 1)
Cells(2, 2) = "*"
[a:a].AdvancedFilter xlFilterCopy, Range("b1:b2"), Cells(1, 3), True
AdvFilt2
MsgBox dict.Count & " recursive calls.", , "done"
End Sub

Sub Request22(URL$, Optional Level% = 1)
Dim n&, ob, oreq As New XMLHTTP60, odoc As New HTMLDocument
oreq.Open "GET", URL, False
On Error Resume Next
oreq.Send
On Error GoTo 0
odoc.Body.innerHTML = oreq.responseText
r = r + 1
Cells(r, 1) = Level
Set ob = odoc.getElementsByTagName("a")
For n = 0 To ob.Length - 1
    r = r + 1
    Cells(r, 1) = ob.Item(n).href
Next
If Level Then
    For n = 0 To ob.Length - 1
        If ob.Length > 0 Then
            If ob.Item(n).href Like s And Not dict.Exists(ob.Item(n).href) Then
                Request22 ob.Item(n).href, Level - 1
                dict.Add ob.Item(n).href, ob.Item(n).href
            End If
        End If
    Next
End If
End Sub

Sub AdvFilt2()
Dim r As Range, a, c%
Set r = [d2]
For c = 4 To Cells(3, Columns.Count).End(xlToLeft).Column
    a = Split(r.Address, "$")
    r.Offset(-1) = ""
    r.Formula = "=AND(LEN(c2)-LEN(SUBSTITUTE(c2,""/"",""""))<$" & a(1) & _
    "$3,NOT(ISERR(FIND($" & a(1) & "$4,c2))))"
    [c:c].AdvancedFilter xlFilterCopy, Range(r.Offset(-1), r), r.Offset(4), True
    Set r = r.Offset(, 1)
Next
End Sub
```


----------



## Worf (May 13, 2019)

My PM Inbox seems to be receiving...


----------






## stefanoste78 (May 15, 2019)

If I did it right, it freezes...
I wrote you the email in the other post.


----------



## Worf (May 15, 2019)

That code is heavy on the system; it froze on my home computer but completed on my work machine. You need good hardware and a fast Internet connection. Try testing it with a simple website that has fewer pages.
We cannot take our discussions off the forum, as per forum rules…


----------



## stefanoste78 (May 15, 2019)

Okay, I'll try tomorrow


----------



## stefanoste78 (Jun 7, 2019)

Hi Worf. I had some problems.
As soon as I can over the weekend, I will attach an Excel file here with your macro, to clarify where the vertical and horizontal header data should be placed.
See you soon


----------



## stefanoste78 (Jun 9, 2019)

Worf said:


> Features of this version:
> 
> 
> 
> ...



Hello. I arranged the data as you indicated.
I didn't understand what should go in column B.
Even arranged as described, it does not work.
Could you put all the extracted links in sheet 2 of the Excel file, and process each link individually, moving on to the next when the previous one finishes, to avoid freezing?

https://www.dropbox.com/s/q4thl1jt8ky44r8/links .xlsm?dl=0


----------



## Worf (Jun 12, 2019)

I rewrote the main routine, see below. It works with the active sheet.
Columns A, B and C are populated by the code; you only have to enter the level (D3, E3) and the desired word (D4, E4). See the example below; results are displayed on columns D and E.
What exactly do you call a level in this context?
The site address is currently hardcoded in a variable; would you like to input it via worksheet cells?
This example made only 10 recursive calls; I suggest you test with this site before proceeding to larger ones.
As written, this version processes only one address.

```
Dim r&, s$, dict As Object

Sub Main22()                             ' run me
Dim orig$, v
Set dict = CreateObject("Scripting.Dictionary")
r = 1
[a1] = "List"
orig = "https://poestories.com"         ' root site
v = Split(Replace(orig, "//", "/"), "/")
s = "*" & Split(v(1), ".")(0) & "*"
MsgBox "Safe word is " & s              ' to avoid external references
Request22 orig, 3
Cells(1, 2) = Cells(1, 1)
Cells(2, 2) = "*"
[a:a].AdvancedFilter xlFilterCopy, Range("b1:b2"), Cells(1, 3), True
AdvFilt2
MsgBox dict.Count & " recursive calls.", , "done"
End Sub
```

Sheet range on next post...


----------



## Worf (Jun 12, 2019)

*(screenshot of the example sheet range; image not preserved)*

----------



## stefanoste78 (Jul 25, 2022)

Hi guys.
May I ask for changes to this code?
I await your reply.
Thank you


----------



## Worf (Jul 25, 2022)

Hi

If it is a continuation of this subject, we can remain here.

If it would be a new topic, a fresh thread would be the best idea.

What changes do you need?


----------



## stefanoste78 (Jul 26, 2022)

I would like to insert into the file called "links" the code that was posted, I think, in post 6 and that is contained in file 1.
In file 1 the macro extracts all the links of each website and inserts them horizontally.
After doing this in sheet 1 of the links file, I would like sheet 2 to contain the links that fully include the words in row 3, from column D onwards.
Finally, in sheet 3 the macro must automatically create the columns as I set them, referring to the search words I will enter from time to time in sheet 2.
Is it possible to insert a button in each sheet that performs that sheet's operations, as described?
The "Regxp Emails2" file seems to contain code that does the same operation as sheets 1 and 2 for the Facebook page link; I think it also contains code to extract the email address and telephone number.
I am also attaching the code for extracting the name of the site, which I kindly ask you to insert into the code.









(Dropbox attachment: "forum mrexcel (1)", shared via www.dropbox.com)


----------






## stefanoste78 (Jul 28, 2022)

Worf said:


> Hi
> 
> If it is a continuation of this subject, we can remain here.
> 
> ...


Hi Worf. Would you help me with the requests in post number 40?


----------



## Worf (Jul 31, 2022)

Hi

I will work on it as soon as possible...


----------



## stefanoste78 (Jul 31, 2022)

Worf said:


> Hi
> 
> I will work on it as soon as possible...


The code contained in the attached file seems to be hidden. I could only see it because at a certain point it crashed and I clicked the Debug button.
Hope you can help me.
Thank you


----------



## Worf (Jul 31, 2022)

The folder contains four files. Which file are you referring to?


----------



## stefanoste78 (Aug 1, 2022)

Worf said:


> The folder contains four files. Which file are you referring to?


The file to use is: links
Into this file you need to insert the code from the file: Scraping_title
which has to extract the name of the website and insert it into column "b" of sheet 2.
In sheet 1 of the links file you need to insert the code from file 1, to extract all the links of each URL.
I also found another file: Regxp Emails2
This last file has hidden code, which appeared to me when I debugged it after a macro error.
Which of the two macros (file 1 or the Regxp Emails2 file) extracts all the links more effectively and efficiently, in your opinion? You could use the better of the two for this.
The Regxp Emails2 file shows an example of extracting the link of the Facebook page; besides that link, to be inserted in sheet 2 of the links file (column d), I would also like to extract the other links that contain the word given in the column header of sheet 2 (from column d onwards).
For column c I will use the regex code I already have, in the same way I use it to extract the email.


----------



## stefanoste78 (Aug 4, 2022)

Worf said:


> The folder contains four files. Which file are you referring to?


Good morning Worf. Is there any news? Thank you


----------



## Worf (Aug 6, 2022)

Hello

I shall have some code by tomorrow.


----------



## stefanoste78 (Aug 7, 2022)

Worf said:


> Hello
> 
> I shall have some code by tomorrow.


thanks


----------



## Worf (Aug 7, 2022)

The code below:


Lists all links for the sites mentioned at sheet2. The list appears at sheet1.
Writes the site title
Lists the results that match the words inserted at sheet2


```
Private Sub CommandButton1_Click()
 'Columns for both tables
    Const colUrl As Long = 1      'Must always be the first column
    Const colmail As Long = 2     'Must always be the column right before the platform columns
    Const colFacebook As Long = 3 'Must always be the last column of Some platforms
    Const colError As Long = 4    'Must always be the last column

    Dim url As String, http As Object, htmlDoc As Object, nodeAllLinks As Object
    Dim nodeOneLink As Object, pageLoadSuccessful As Boolean
    Dim tbl_url_oal As String, tbl_all As String, currentRowTableUrls As Long, lastRowTableUrls&
    Dim currentRowsTableAll(colUrl To colFacebook) As Long
    Dim lastRowTableAll As Long, addressCounters(colmail To colFacebook) As Long
    Dim checkCounters As Long, cel As Range
   
    tbl_url_oal = "foglio2"             'Name of Sheet
    currentRowTableUrls = 2           'First row for content
    tbl_all = "Sheet1"     'Name of Sheet
    Sheets(tbl_url_oal).Activate
    With New XMLHTTP60
            On Error Resume Next
        For Each cel In Range("A4:A" & Cells(Rows.Count, 1).End(xlUp).Row)
            .Open "GET", cel.Value, False
            .send
            If .Status = 200 Then cel.Offset(, 1).Value = Split(Split(.responseText, "<title>")(1), "</")(0)
        Next
    End With
    For checkCounters = colUrl To colFacebook
        currentRowsTableAll(checkCounters) = 2   'First rows for content
    Next checkCounters
    Set htmlDoc = CreateObject("htmlfile")
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
  
    'Delete all rows except headline in the sheet with all addresses
    lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
    Sheets(tbl_all).Rows(currentRowsTableAll(colUrl) & ":" & lastRowTableAll).Delete Shift:=xlUp
 
    'Loop over all URLs in column A in the URL source sheet
    Do While Sheets(tbl_url_oal).Cells(currentRowTableUrls, 1).Value <> ""
        'Scroll for visual monitoring if the sheet with the URLs is the active sheet
        If ActiveSheet.Name = tbl_url_oal Then
            If currentRowTableUrls > 14 Then
                ActiveWindow.SmallScroll down:=1
            End If
            Sheets(tbl_url_oal).Cells(currentRowTableUrls, 1).Select
        End If
   
        'Get next url from the URL source sheet
        url = Sheets(tbl_url_oal).Cells(currentRowTableUrls, colUrl).Value
        'Try to load the page; temporarily disable error
        'handling in case of a timeout or another error
        On Error Resume Next
        http.Open "GET", url, False
        http.send
   
        'Check if page loading was successful
        If Err.Number = 0 Then
            pageLoadSuccessful = True
        End If
        On Error GoTo 0
   
        If pageLoadSuccessful Then
            'Build html document for DOM operations
            htmlDoc.body.innerHTML = http.responseText
            'Create node list from all links of the page
            Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
            'Walk through all links of the node list
     
    For Each nodeOneLink In nodeAllLinks
             
DoEvents

            'Write mail address to both tables
            Sheets(tbl_url_oal).Cells(currentRowTableUrls, colmail).Value = Right(nodeOneLink.href, Len(nodeOneLink.href) - InStr(nodeOneLink.href, ":"))
            Sheets(tbl_all).Cells(currentRowsTableAll(colmail), colmail).Value = Right(nodeOneLink.href, Len(nodeOneLink.href) - InStr(nodeOneLink.href, ":"))
            'Check if it is a new line in the sheet with all addresses
    If currentRowsTableAll(colmail) >= currentRowsTableAll(colUrl) Then
            'Write URL in the new line of the sheet with all addresses
            Sheets(tbl_all).Cells(currentRowsTableAll(colUrl), colUrl).Value = url
            'Increment url counter
            currentRowsTableAll(colUrl) = currentRowsTableAll(colUrl) + 1
    End If
            'Increment mail counters
             currentRowsTableAll(colmail) = currentRowsTableAll(colmail) + 1
            addressCounters(colmail) = addressCounters(colmail) + 1
     
Next nodeOneLink

        'Check address counters
        For checkCounters = colmail To colFacebook
        'Set comment if more than 1 link were found
        If addressCounters(checkCounters) > 1 Then
        End If
Next checkCounters
        Else
  
        End If
   
        'Prepare for next page
        pageLoadSuccessful = False
        Erase addressCounters
        lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
        For checkCounters = colUrl To colFacebook
            currentRowsTableAll(checkCounters) = lastRowTableAll + 1 'First rows for next page content
        Next checkCounters
        currentRowTableUrls = currentRowTableUrls + 1
    Loop
    'Clean up
Set http = Nothing: Set htmlDoc = Nothing
Set nodeAllLinks = Nothing
Set nodeOneLink = Nothing
Dim r As Range, c
Sheets("Sheet1").Activate
[b1] = "header"
[c1] = [b1]
For c = 4 To Sheets("Foglio2").[3:3].Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
    [c2] = "*" & Sheets("Foglio2").Cells(3, c) & "*"
    Columns("B:B").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=Range("c1:c2"), CopyToRange:=[e1], Unique:=True
    Set r = [e1].CurrentRegion
    Set r = Range(Cells(2, 5), Cells(r.Rows.Count, 5))
    If Len([e2]) Then r.Copy Sheets("Foglio2").Cells(4, c)
    r.Delete
Next
End Sub
```


----------



## stefanoste78 (Aug 8, 2022)

Worf said:


> The code below:
> 
> 
> Lists all links for the sites mentioned at sheet2. The list appears at sheet1.
> ...


Thank you for your help. Could I ask you to insert it into the Dropbox link file together with the other code, creating buttons that activate each piece of code? I also noticed that when the code runs there can be certificate prompts that block the macro until I click "yes" in the dialog that appears. Is it possible to accept automatically, save the data automatically, and then restart the macro from the first unprocessed row? I would also like to learn how to add the email-extraction part included in that code.


----------






## stefanoste78 (Aug 8, 2022)

stefanoste78 said:


> Thank you for your help. Could I ask you to insert it into the Dropbox link file together with the other code, creating buttons that activate each piece of code? I also noticed that when the code runs there can be certificate prompts that block the macro until I click "yes" in the dialog that appears. Is it possible to accept automatically, save the data automatically, and then restart the macro from the first unprocessed row? I would also like to learn how to add the email-extraction part included in that code.


I mean that I would like to attach here the complete file, with the code and with buttons inside the sheets to activate the macros individually.


----------



## Worf (Aug 13, 2022)

I am having no time to work on this…

When you mention accepting macros, are you going to run the code only on your own computer or distribute it to other people?


----------



## stefanoste78 (Aug 13, 2022)

Worf said:


> I am having no time to work on this…
> 
> When you mention accepting macros, are you going to run the code only on your own computer or distribute it to other people?


Only on mine.
Sorry, there may be a translation problem; let me explain what I mean.
When I run the macro, I would like the processed data to be saved as it goes, so that if something blocks the macro I do not lose the work done up to that point.
The macro should then be able to resume from the point where it was interrupted.
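The save-and-resume behaviour described above can be sketched in VBA like this. This is only a minimal sketch: the sheet name "foglio1", the marker column C and the "OK" marker are hypothetical, and `ProcessRow` is a placeholder for the real per-row work.

```
' Minimal sketch of save-and-resume: mark each processed row in column C,
' save the workbook, and on the next run skip rows already marked.
' Sheet name "foglio1" and the "OK" marker are assumptions for illustration.
Sub ProcessWithResume()
    Dim r As Long
    With Sheets("foglio1")
        For r = 2 To .Cells(.Rows.Count, 1).End(xlUp).Row
            If .Cells(r, 3).Value <> "OK" Then     'first row not yet processed
                'ProcessRow .Cells(r, 1).Value     'placeholder for the real work
                .Cells(r, 3).Value = "OK"          'mark the row as done
                ThisWorkbook.Save                  'persist before the next row
            End If
        Next
    End With
End Sub
```

Saving after every row is slow on large lists; saving every N rows is a common compromise.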


----------



## Worf (Aug 17, 2022)

The code below makes the URL bold after processing it, and on the next execution will only work on non-bold entries.
Where would you like to place the email list?


```
Sub CommandButton12()
    ' Requires a reference to "Microsoft XML, v6.0" for the XMLHTTP60 type
    Const colUrl As Long = 1      'Must always be the first column
    Const colFacebook As Long = 3 'Must always be the last of the platform columns
    Const colError As Long = 4    'Must always be the last column

    Dim url As String, http As Object, htmlDoc As Object, nodeAllLinks As Object
    Dim onelink As Object, pageLoadSuccessful As Boolean
    Dim tbl_url_oal As String, tbl_all As String, cr_tbl_urls As Long, lastRowTableUrls As Long
    Dim cr_table_all(1 To 3) As Long, lastRowTableAll As Long, addressCounters(2 To colFacebook) As Long
    Dim checkCounters As Long, cel As Range

    tbl_url_oal = "foglio2"   'Name of URL source sheet
    cr_tbl_urls = 4           'First row for content
    tbl_all = "Sheet1"        'Name of output sheet
    Sheets(tbl_url_oal).Activate

    'Fetch each page title into column B
    With New XMLHTTP60
        On Error Resume Next
        For Each cel In Range("A4:A" & Cells(Rows.Count, 1).End(xlUp).Row)
            .Open "GET", cel.Value, False
            .send
            If .Status = 200 Then cel.Offset(, 1).Value = Split(Split(.responseText, "<title>")(1), "</")(0)
        Next
        On Error GoTo 0
    End With

    For checkCounters = colUrl To colFacebook
        cr_table_all(checkCounters) = 2   'First rows for content
    Next
    Set htmlDoc = CreateObject("htmlfile")
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    'Loop over all URLs in column A of the URL source sheet
    Do While Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Value <> ""
        If Not Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold Then
            If ActiveSheet.Name = tbl_url_oal Then
                Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Activate
            End If
            url = Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1)
            On Error Resume Next
            http.Open "GET", url, False
            http.send
            If Err.Number = 0 Then pageLoadSuccessful = True
            On Error GoTo 0
            If pageLoadSuccessful Then
                htmlDoc.body.innerHTML = http.responseText
                Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
                cr_table_all(1) = Sheets(tbl_all).Range("a" & Rows.Count).End(xlUp).Row + 1
                For Each onelink In nodeAllLinks
                    DoEvents
                    Sheets(tbl_all).Cells(cr_table_all(1), 2) = _
                        Right(onelink.href, Len(onelink.href) - InStr(onelink.href, ":"))
                    Sheets(tbl_all).Cells(cr_table_all(1), 1) = url
                    cr_table_all(1) = cr_table_all(1) + 1
                    addressCounters(2) = addressCounters(2) + 1
                Next
            End If
            pageLoadSuccessful = False
            Erase addressCounters
            lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
            For checkCounters = colUrl To colFacebook
                cr_table_all(checkCounters) = lastRowTableAll + 1 'First rows for next page content
            Next
            'Mark the URL as processed so the next run skips it
            Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold = True
        End If
        cr_tbl_urls = cr_tbl_urls + 1
    Loop
    Set http = Nothing: Set htmlDoc = Nothing
    Set nodeAllLinks = Nothing
    Set onelink = Nothing

    'Distribute the collected links to Foglio2 columns by header keyword
    Dim r As Range, c As Long
    Sheets("Sheet1").Activate
    [b1] = "header"
    [c1] = [b1]
    For c = 4 To Sheets("Foglio2").[3:3].Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
        [c2] = "*" & Sheets("Foglio2").Cells(3, c) & "*"
        Columns("B:B").AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=Range("c1:c2"), _
            CopyToRange:=[e1], Unique:=True
        Set r = [e1].CurrentRegion
        Set r = Range(Cells(2, 5), Cells(r.Rows.Count, 5))
        If Len([e2]) Then r.Copy Sheets("Foglio2").Cells(4, c)
        r.Delete
    Next
End Sub
```


----------



## stefanoste78 (Aug 18, 2022)

Worf said:


> The code below makes the URL bold after processing it, and on the next execution will only work on non-bold entries.
> Where would you like to place the email list?
> 
> 
> ...


If it is not too much trouble, could you insert the code into the file I attached (the one with the link-extraction code inside), so you can try it directly and create the buttons?
I wouldn't want to make mistakes doing it myself.


----------



## Worf (Aug 20, 2022)

Links saturday.xlsm (shared with Dropbox: www.dropbox.com)

This file has a button that runs the code.


----------



## stefanoste78 (Aug 20, 2022)

Worf said:


> Links saturday.xlsm
> 
> 
> Shared with Dropbox
> ...


Hello.
I opened the file and there is no button.
Could you insert a button for each function?


----------



## Peter_SSs (Aug 20, 2022)

stefanoste78 said:


> I opened the file and there is no button.


I don't think that you looked well enough. 
Have a better look around on Foglio2


----------



## stefanoste78 (Aug 21, 2022)

Good evening.
I apologize for not seeing the button.
I have updated the files at the originally shared Dropbox link.
Let me try to explain myself better.
Opening the folder you will find the link file. In the first sheet there is the code that extracts the links, row by row. Could the code be updated so that the extraction happens one row at a time, with a save before moving on to the next row, and so that when the macro is restarted it begins from the first row not yet processed?
In sheet 2 the code should insert the links from sheet 1 according to the word contained in each column header. I need this to find the links that contain the contacts to extract. For the Facebook-page link column of this sheet, if copying the link from sheet 1 is not considered useful, there is the code contained in the "Regxp Emails2" Dropbox folder, from which you can copy the criteria used for the link extraction and apply them to the sheet, possibly as separate code.
For the extraction of the site name there is the other macro in the Dropbox folder to add to the code.
Finally, for sheet 3, the macro should take the links inserted in sheet 2 and extract emails and telephone numbers into the appropriate columns. If the macro contained in "Regxp Emails2" uses regex for the email-extraction part I can insert it here; otherwise I have other code that extracts emails and telephone numbers using regex.









forum mrexcel (1) (shared with Dropbox: www.dropbox.com)

----------



## Worf (Aug 23, 2022)

> Could you insert a button for each function?


Please clarify how many buttons you want, where they should be placed and what each one is supposed to do.


----------



## stefanoste78 (Aug 23, 2022)

Worf said:


> I entered the cells highlighted in gray for you.
> Would it be possible to proceed step by step?
> Initially you might:
> 1) make sure that in sheet 1 the macro (contained in the file and which extracts all the links of the website for each line) does not process all the lines together but that the processing takes place one line at a time with saving and the possibility of blocking the macro. When I start the macro at a later time it must resume from the first unprocessed line.
> ...


----------



## Worf (Aug 26, 2022)

I will work on it during the weekend.


----------



## stefanoste78 (Aug 26, 2022)

Worf said:


> I will work on it during the weekend.


Thanks


----------



## Worf (Aug 27, 2022)

The code below is for the button at sheet 1. It makes the URL bold after processing it, and on the next execution will only work on non-bold entries.


```
Private Sub CommandButton1_Click()
Const colUrl As Long = 1      'Must always be the first column
Dim url$, http As Object, htmlDoc As Object, nodeAllLinks As Object, onelink As Object, loadOK As Boolean, _
tbl_url_oal$, tbl_all As String, cr_tbl_urls As Long, lastRowTableUrls&, cr_table_all(1 To 3) As Long, _
lastRowTableAll&, cc As Long, cel As Range
tbl_url_oal = "foglio1"     'Name of Sheet
cr_tbl_urls = 2           'First row for content
tbl_all = "foglio1"
Sheets(tbl_url_oal).Activate
For cc = colUrl To 3
    cr_table_all(cc) = 2
Next
Set htmlDoc = CreateObject("htmlfile")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Do While Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1) <> ""
    If Not Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold Then
        If ActiveSheet.Name = tbl_url_oal Then Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Activate
        url = Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1)
        On Error Resume Next
        http.Open "GET", url, False
        http.send
        If Err.Number = 0 Then loadOK = True
        On Error GoTo 0
        If loadOK Then
            htmlDoc.body.innerHTML = http.responseText
            Set nodeAllLinks = htmlDoc.getElementsByTagName("a")
            cr_table_all(1) = Sheets(tbl_all).Range("b" & Rows.Count).End(xlUp).Row + 1
            For Each onelink In nodeAllLinks
                DoEvents
                Sheets(tbl_all).Cells(cr_table_all(1), 2) = _
                Right(onelink.href, Len(onelink.href) - InStr(onelink.href, ":"))
                Sheets(tbl_all).Cells(cr_table_all(1), 3) = url
                cr_table_all(1) = cr_table_all(1) + 1
            Next
        End If
        loadOK = False
        lastRowTableAll = Sheets(tbl_all).Cells(Rows.Count, colUrl).End(xlUp).Row
        For cc = colUrl To 3
            cr_table_all(cc) = lastRowTableAll + 1
        Next
        Sheets(tbl_url_oal).Cells(cr_tbl_urls, 1).Font.Bold = True
    End If
    cr_tbl_urls = cr_tbl_urls + 1
Loop
Set http = Nothing: Set htmlDoc = Nothing
Set nodeAllLinks = Nothing
Set onelink = Nothing
End Sub
```


----------



## stefanoste78 (Aug 27, 2022)

Worf said:


> The code below is for the button at sheet 1. It makes the URL bold after processing it, and on the next execution will only work on non-bold entries.
> 
> 
> ```
> ...


Excuse me for asking again: could you upload the file with the code already inside?
Thank you


----------



## Worf (Aug 30, 2022)

I think you should try to create the button yourself, here is the procedure:


1. Ribbon > Developer > Insert > ActiveX command button
2. Draw the button > right-click > View Code > paste the code
3. Ribbon > exit Design Mode > click the button to run the code.
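The button can also be created from code instead of through the ribbon. A minimal sketch using `OLEObjects.Add` follows; the sheet name, caption and position values are assumptions, and the click handler still has to be pasted into the sheet's code module by hand.

```
' Adds an ActiveX command button to a sheet programmatically.
' Sheet name "foglio1" and the position/size values are assumptions.
Sub AddButton()
    With Sheets("foglio1").OLEObjects.Add(ClassType:="Forms.CommandButton.1", _
            Left:=10, Top:=10, Width:=120, Height:=30)
        .Object.Caption = "Extract links"
    End With
End Sub
```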


----------



## Worf (Aug 30, 2022)

This is the code for the second button:


```
Private Sub CommandButton1_Click()
Dim fog As Worksheet, c%, r As Range
Rem code for foglio 2
Set fog = Sheets("foglio1")
fog.[b1] = "header"
fog.[d1] = fog.[b1]
For c = 2 To Me.[2:2].Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
    fog.[d2] = "*" & Me.Cells(2, c) & "*"
    fog.[b:b].AdvancedFilter Action:=xlFilterCopy, CriteriaRange:=fog.Range("d1:d2"), _
    CopyToRange:=fog.[f1], Unique:=True
    Set r = fog.[f1].CurrentRegion
    Set r = fog.Range(fog.Cells(2, 6), fog.Cells(r.Rows.Count, 6))
    If Len(fog.[f2]) Then r.Copy Me.Cells(3, c)
    r.Delete
Next
End Sub
```


----------



## stefanoste78 (Aug 31, 2022)

Worf said:


> This is the code for the second button:
> 
> 
> ```
> ...


thank you


----------



## Worf (Sep 2, 2022)

The third button:


```
Rem foglio 3
Private Sub CommandButton1_Click()
Dim cel As Range
' Requires a reference to "Microsoft XML, v6.0" for the XMLHTTP60 type
With New XMLHTTP60
    On Error Resume Next
    For Each cel In Me.Range("A4:A" & Me.Cells(Rows.Count, 1).End(xlUp).Row)
        .Open "GET", cel.Value, False
        .send
        If .Status = 200 Then cel.Offset(, 1) = _
        Split(Split(.responseText, "<title>")(1), "</")(0)
    Next
End With
On Error GoTo 0
End Sub
```


----------



## Worf (Sep 4, 2022)

The email code:


```
Rem foglio 3
Private Sub CommandButton2_Click()
Dim htmldoc As Object, http, url$, loadok, col As Collection, i%, j%, laste%
laste = 4
Set htmldoc = CreateObject("htmlfile")
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
For i = 4 To Me.Range("a" & Me.Rows.Count).End(xlUp).Row
    url = Me.Cells(i, 1)
    On Error Resume Next
    http.Open "GET", url, False
    http.send
    If Err.Number = 0 Then loadok = True
    On Error GoTo 0
    If loadok Then
        htmldoc.body.innerHTML = http.responseText
        Set col = GMAd(htmldoc)
        For j = 1 To col.Count
            Me.Cells(laste, 5) = col(j)
            laste = laste + 1
        Next
    End If
Next
End Sub

Function GMAd(ByVal hdoc As Object) As Collection
Dim opcol As Collection, emm As Object, mfound As Object, regEx As Object
Set opcol = New Collection
Set regEx = CreateObject("VBScript.RegExp")
With regEx
    .Pattern = "([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|" & _
    "(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)"
    .Global = True
    Set emm = .Execute(hdoc.body.innerHTML)
End With
For Each mfound In emm
    On Error Resume Next
    opcol.Add mfound.Value, Key:=mfound.Value
    On Error GoTo 0
Next
Set GMAd = opcol
End Function
```
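Since telephone numbers were also requested, the same structure as `GMAd` can be reused with a different pattern. This is only a rough sketch: the function name is new and the pattern is an assumption that will need tuning for the number formats actually found on your pages.

```
' Rough sketch: reuse GMAd's structure with a phone-number pattern.
' The pattern below is an assumption; real-world formats vary widely.
Function GetPhones(ByVal hdoc As Object) As Collection
    Dim opcol As New Collection, m As Object, regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Pattern = "\+?\d[\d\s\.\-/]{6,}\d"   'digits with common separators
        .Global = True
        For Each m In .Execute(hdoc.body.innerHTML)
            On Error Resume Next              'collection key rejects duplicates
            opcol.Add m.Value, Key:=m.Value
            On Error GoTo 0
        Next
    End With
    Set GetPhones = opcol
End Function
```

The results could be written next to the email column by a second loop in `CommandButton2_Click`, mirroring the one that writes the emails.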


----------



## stefanoste78 (Sep 4, 2022)

Worf said:


> The email code:
> 
> 
> ```
> ...


I'm trying to figure out how to put all these routines into the file and assign each one a start button.
It will take a bit of time.
Thanks for now


----------



## Worf (Sep 4, 2022)

It is important that you try.
You will not always find people to prepare ready-to-use workbooks...


----------



## stefanoste78 (Sep 5, 2022)

Worf said:


> It is important that you try.
> You will not always find people to prepare ready-to-use workbooks...


Right


----------

