Extract all the links of a website

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
stefanoste78,

You might consider the following...

Code:
Sub GetAllLinks()
Application.ScreenUpdating = False
Dim ie As Object, AllHyperlinks As Object, hyper_link As Object
Dim LastRow As Long, NextRow As Long
Dim url_name As Range

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
Set url_name = ActiveSheet.Range("A1")
If url_name = "" Or Left(url_name, 4) <> "http" Then
    MsgBox "Please enter a valid url."
    Exit Sub
End If

ie.navigate (url_name)
Do
    DoEvents
Loop Until ie.ReadyState = 4
Set AllHyperlinks = ie.Document.getElementsByTagName("A")

NextRow = 1
With ActiveSheet
    .Columns(2).Clear
    For Each hyper_link In AllHyperlinks
        .Cells(NextRow, 2).Value = hyper_link
        NextRow = NextRow + 1
    Next
    LastRow = .Cells(Rows.Count, 2).End(xlUp).Row
    .Cells(1, 2).EntireColumn.AutoFit
End With

AppActivate "MicroSoft Excel"
Application.ScreenUpdating = True
MsgBox "There are " & LastRow & " links."
End Sub

Enter the website address into Range("A1"); the links from that site will populate Column B. Please note you'll need Internet Explorer (does not need to be open) on your Windows pc.

Cheers,

tonyyy
 
Upvote 0
Thanks tony.
when I start the macro it seems that it is loading and after a few moments ago as if it has not been activated and then reload and a mask appears with this message and debug:

run-tine error 5
routine call or argument invalid

debugging:
AppActivate "MicroSoft Excel"


I noticed that it does not extract all the links, in fact if you use this link:
https://www.uniba.it

Do not extrapolate this link:
https://www.uniba.it/ateneo/rettorato

Would you like to insert the links on the same line of the link and be able to insert other links in the "to" column to make the extrapolation process be applied for more links?
 
Upvote 0
run-tine error 5
routine call or argument invalid

debugging:
AppActivate "MicroSoft Excel"

I can't duplicate the error. Are you using a US English version of Excel? On a Windows pc?

You can, I think, simply delete the line "AppActivate "MicroSoft Excel"" - all it does is bring Excel to the forefront of the screen.

As for extracting the links to https://www.uniba.it - the code found 177 links. I can't say why it didn't find https://www.uniba.it/ateneo/rettorato - other than the link may not exist on that page.
 
Upvote 0
I use excel version 2010 on a laptop in Italy. The page exists you can check it too.
Could you make sure that you can extract the links of all the websites listed in column a and make sure that the links are copied in the same row?
Thank you
 
Upvote 0
Code:
Sub GetAllLinks2()
Application.ScreenUpdating = False
Dim ie As Object, AllHyperlinks As Object, hyper_link As Object
Dim NextCol As Long
Dim url_name As Range

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
On Error GoTo errHandler

For Each url_name In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    If url_name = "" Or Left(url_name, 4) <> "http" Then
        MsgBox "Please enter a valid url."
        ie.Quit
        Exit Sub
    End If
    
    ie.navigate (url_name)
    Do
        DoEvents
    Loop Until ie.ReadyState = 4
    Set AllHyperlinks = ie.document.getElementsByTagName("A")
    
    NextCol = 2
    With ActiveSheet
        For Each hyper_link In AllHyperlinks
            .Cells(url_name.Row, NextCol).Value = hyper_link
            NextCol = NextCol + 1
        Next
        .Columns.AutoFit
    End With
Next url_name

Application.ScreenUpdating = True
errHandler:
    ie.Quit
End Sub
 
Upvote 0
Thank you for your contribution.
About the macro I would like to know:
1) also works with other version of excel or windows? (about your question)
2) extracts only the site links that have the same path as the website or even external links? In case you could make sure that you also extract external links?


Then I created a similar post for the extraction of emails that you would most likely be able to solve in a moment as it is similar to this.
If you feel like trying, I am attaching the link below:


https://www.mrexcel.com/forum/excel...ting-emails-web-page-links-listed-column.html


Thanks for all
 
Upvote 0
Hello,

try this:

Code:
Sub Read_Linksb()
  With CreateObject("MSXML2.XMLHTTP")

URL = "https://www.MrExcel.com"
    .Open "Get", URL, False
    .send
    c00 = .responseText
  End With
       
  With CreateObject("htmlfile")
    .Body.innerhtml = c00

'img
 Set myImg = .getElementsByTagName("img")
        For Each Lk In myImg
            'If Left(Lk.src, 4) = "http" Then
                i = i + 1
                Cells(i, 1) = Lk.src
            'End If
        Next Lk

'links
    For Each it In .Links
    i = i + 1
          cells(i, 1) = it.href
    Next
  End With

End Sub

regards
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,178
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top