GET All urls that have a specified keyword

Sharid

Well-known Member
Joined
Apr 22, 2007
Messages
1,066
Office Version
  1. 2016
Platform
  1. Windows
Hi

Is there a way of getting certain pages urls from a website using vba.

My following code gets all the urls from a web site page. I enter the domain in L1 and whatever that domain is, it will go to that page and extract all urls.

The problem is

1) It only extracts from that page
2) Not all the urls are correct, it extracts all links.

Code:
Sub EstraiURLdaWeb()


Dim doc As HTMLDocument
Dim output As Object




Set IE = New InternetExplorer
IE.Visible = False
IE.navigate Range("L1")


Do
    DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE


Set doc = IE.document
Set output = doc.getElementsByTagName("a")


i = 5
For Each link In output
    Range("A" & i).Value = link
    i = i + 1


Next


MsgBox "All Done!"


End Sub

I need it to go to the domain and then extract All urls that have a specified word in them e.g.

If i placed the url https://www.mrexcel.com/ in sheet1 L1 and then I typed a Specified word in Sheet1 A1 like forum or excel-questions. It would check the domain mrexcel for all urls with that keyword and extract the urls

https://www.mrexcel.com/forum/excel-questions/

thanks for having a look
 

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
Code:
i = 5
For Each link In output
   [COLOR=#0000cd] IF link.href like "*keyword*" then '<-put your keyword here[/COLOR]
    Range("A" & i).Value = link
    i = i + 1
   [COLOR=#0000cd]END IF[/COLOR]

End Sub
Check out the VB "like" operator for details.
 
Upvote 0
yky

Thanks for this code, it a step in the right direction. I'm not sure if it me doing something wrong or that is just the way the code works.

What I was after was that when the code is run it searches the given url domain for ALL PAGES that contain the keyword in the url.

So if I was searching ebay, and the keyword was "itm" it would pull up ALL urls with the word "itm" in it

https://www.ebay.co.uk/itm/Hellboy-II-The-Golden-Army-4K-Ultra-HD-Blu-ray-UHD/283442469519?epid=17031383947&hash=item41fe7cda8f%3Ag%3AAoUAAOSw0lhcp%7Ec8&_trkparms=%2526rpp_cid%253D5cf00abba81b5e0a265ce7f1

Since this did not do that and large sites have many pages. I put this bit of code in to limit the page search. But I can not get it to work. It keeps giving me an error "Object required". My code is in orange. I also tried to change the keyword to work from a cell, but could not get it to work Sheet3 A1

Code:
Sub EstraiURLdaWeb()

Dim doc As HTMLDocument
Dim output As Object

Set ie = New InternetExplorer
ie.Visible = True
ie.navigate Range("L1")

Do
    DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

Set doc = ie.document
Set output = doc.getElementsByTagName("a")

i = 5
For Each link In output
    If link.href Like "[COLOR=#0000ff]*itm*[/COLOR]" Then '<-put your keyword here
    Range("A" & i).Value = link
    i = i + 1
   End If

Next
   
        [COLOR=#008000] ' CLICK NEXT PAGE[/COLOR]
         [COLOR=#ff8c00] Do
          If pageNumber >= 5 Then Exit Do 'the first 5 pages
              Set nextPageElement = htmlDoc.getElementById("pages") '
              If nextPageElement Is Nothing Then Exit Do
            
            nextPageElement.Click 'next web page
          Do While ie.Busy Or ie.readyState <> 4
               DoEvents
            Loop
            Application.Wait Now + TimeSerial(0, 0, 5)
            Set htmlDoc = ie.document
            pageNumber = pageNumber + 1
            
        Loop[/COLOR]

MsgBox "All Done!"


End Sub
 
Upvote 0

Forum statistics

Threads
1,223,685
Messages
6,173,828
Members
452,535
Latest member
berdex

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top