Hi Guys,
I'm working on a data cleanse project at the minute where I'm trying to fill in as many gaps in our data as possible (around 30k records running at 53% blank fields - Dealing with about 6 years of data migrations and lazy employees not focused on data quality)
One of our fields is the website URL and I'd like to find a way to be able to use excel to search on some keywords and then retrieve the URL that would either be the top search result or the URL behind the website button on the right-hand side of the search results.
The data I can use for the keywords can be as little as the business name or the name, the first line of address, postcode to make sure it narrows down to the correct URL. Naturally, if it doesn't have a URL then the search would return something like "No URL".
I found a solution from Kutools (link) but that doesn't seem to work, I've emailed there support team, however, I'm still waiting on a response
It provides a popup box asking me to highlight the search criteria and then it would then return the page title and also it's URL. Their code is:
Sub xmlHttp()
'Updated by Extendoffice 2018/1/30
Dim xRg As Range
Dim url As String
Dim xRtnStr As String
Dim i As Long, xLastRow As Long
Dim xmlHttp As Object, xHtml As Object, xHtmlLink As Object
On Error Resume Next
Set xRg = Application.InputBox("Please select the keywords you will search in Google:", "KuTools for Excel", Selection.Address, , , , , 8)
If xRg Is Nothing Then Exit Sub
Application.ScreenUpdating = False
xLastRow = xRg.Rows.Count
Set xRg = xRg(1)
For i = 0 To xLastRow - 1
url = "https://www.google.co.in/search?q=" & xRg.Offset(i) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set xmlHttp = CreateObject("MSXML2.serverXMLHTTP")
xmlHttp.Open "GET", url, False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
xmlHttp.Send
Set xHtml = CreateObject("htmlfile")
xHtml.body.innerHTML = xmlHttp.responseText
Set xHtmlLink = xHtml.getElementById("rso").getElementsByTagName("H3")(0).getElementsByTagName("a")(0)
xRtnStr = Replace(xHtmlLink.innerHTML, "<EM>", "")
xRtnStr = Replace(xRtnStr, "</EM>", "")
xRg.Offset(i, 1).Value = xRtnStr
xRg.Offset(i, 2).Value = xHtmlLink.href
Next
Application.ScreenUpdating = True
End Sub
Is there something I'm missing, not doing correctly with this code or is there a different workaround that someone else is aware of that will help me complete this task?
Thanks
Gary
I'm working on a data cleanse project at the minute where I'm trying to fill in as many gaps in our data as possible (around 30k records running at 53% blank fields - Dealing with about 6 years of data migrations and lazy employees not focused on data quality)
One of our fields is the website URL and I'd like to find a way to be able to use excel to search on some keywords and then retrieve the URL that would either be the top search result or the URL behind the website button on the right-hand side of the search results.
The data I can use for the keywords can be as little as the business name or the name, the first line of address, postcode to make sure it narrows down to the correct URL. Naturally, if it doesn't have a URL then the search would return something like "No URL".
I found a solution from Kutools (link) but that doesn't seem to work, I've emailed there support team, however, I'm still waiting on a response
It provides a popup box asking me to highlight the search criteria and then it would then return the page title and also it's URL. Their code is:
Sub xmlHttp()
'Updated by Extendoffice 2018/1/30
Dim xRg As Range
Dim url As String
Dim xRtnStr As String
Dim i As Long, xLastRow As Long
Dim xmlHttp As Object, xHtml As Object, xHtmlLink As Object
On Error Resume Next
Set xRg = Application.InputBox("Please select the keywords you will search in Google:", "KuTools for Excel", Selection.Address, , , , , 8)
If xRg Is Nothing Then Exit Sub
Application.ScreenUpdating = False
xLastRow = xRg.Rows.Count
Set xRg = xRg(1)
For i = 0 To xLastRow - 1
url = "https://www.google.co.in/search?q=" & xRg.Offset(i) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set xmlHttp = CreateObject("MSXML2.serverXMLHTTP")
xmlHttp.Open "GET", url, False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
xmlHttp.Send
Set xHtml = CreateObject("htmlfile")
xHtml.body.innerHTML = xmlHttp.responseText
Set xHtmlLink = xHtml.getElementById("rso").getElementsByTagName("H3")(0).getElementsByTagName("a")(0)
xRtnStr = Replace(xHtmlLink.innerHTML, "<EM>", "")
xRtnStr = Replace(xRtnStr, "</EM>", "")
xRg.Offset(i, 1).Value = xRtnStr
xRg.Offset(i, 2).Value = xHtmlLink.href
Next
Application.ScreenUpdating = True
End Sub
Is there something I'm missing, not doing correctly with this code or is there a different workaround that someone else is aware of that will help me complete this task?
Thanks
Gary