Find URLs from Google based on Keywords in excel

gswann

New Member
Joined
Dec 14, 2018
Messages
5
Hi Guys,

I'm working on a data cleanse project at the minute where I'm trying to fill in as many gaps in our data as possible (around 30k records running at 53% blank fields - Dealing with about 6 years of data migrations and lazy employees not focused on data quality)

One of our fields is the website URL and I'd like to find a way to be able to use excel to search on some keywords and then retrieve the URL that would either be the top search result or the URL behind the website button on the right-hand side of the search results.

The data I can use for the keywords can be as little as the business name or the name, the first line of address, postcode to make sure it narrows down to the correct URL. Naturally, if it doesn't have a URL then the search would return something like "No URL".

I found a solution from Kutools (link) but that doesn't seem to work, I've emailed there support team, however, I'm still waiting on a response :-(

It provides a popup box asking me to highlight the search criteria and then it would then return the page title and also it's URL. Their code is:

Sub xmlHttp()
'Updated by Extendoffice 2018/1/30
Dim xRg As Range
Dim url As String
Dim xRtnStr As String
Dim i As Long, xLastRow As Long
Dim xmlHttp As Object, xHtml As Object, xHtmlLink As Object
On Error Resume Next
Set xRg = Application.InputBox("Please select the keywords you will search in Google:", "KuTools for Excel", Selection.Address, , , , , 8)
If xRg Is Nothing Then Exit Sub
Application.ScreenUpdating = False
xLastRow = xRg.Rows.Count
Set xRg = xRg(1)
For i = 0 To xLastRow - 1
url = "https://www.google.co.in/search?q=" & xRg.Offset(i) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set xmlHttp = CreateObject("MSXML2.serverXMLHTTP")
xmlHttp.Open "GET", url, False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
xmlHttp.Send
Set xHtml = CreateObject("htmlfile")
xHtml.body.innerHTML = xmlHttp.responseText
Set xHtmlLink = xHtml.getElementById("rso").getElementsByTagName("H3")(0).getElementsByTagName("a")(0)
xRtnStr = Replace(xHtmlLink.innerHTML, "<EM>", "")
xRtnStr = Replace(xRtnStr, "</EM>", "")
xRg.Offset(i, 1).Value = xRtnStr
xRg.Offset(i, 2).Value = xHtmlLink.href
Next
Application.ScreenUpdating = True
End Sub

Is there something I'm missing, not doing correctly with this code or is there a different workaround that someone else is aware of that will help me complete this task?

Thanks
Gary
 

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.

Forum statistics

Threads
1,224,820
Messages
6,181,159
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top