Hi guys,
I would like to ask for help with scraping data from a URL.
I have previously used the following function to extract the article/website title from a URL.
Now, however, I need to extract the date of issue, i.e. when the article was published. Could you please help me adjust the code below to achieve this? I would like it to work for as many URLs as possible so that it can run as a bulk job (just like the title scraper), which is why I am not giving particular examples. Would it be necessary to use Regex to make this as widely applicable as possible?
I am aware that some articles do not have an issue date, and that is fine. I would just like to be able to retrieve it where possible.
Many thanks.
The current function (as described above) is as follows:
Function fgetMetaTitle(ByVal strURL As String) As String
    Dim stPnt As Long, endPnt As Long, x As String
    Dim oXH As Object
    'Get URL's HTML Source
    Set oXH = CreateObject("msxml2.xmlhttp")
    With oXH
        .Open "get", strURL, False
        .send
        x = .responseText
    End With
    Set oXH = Nothing
    'Parse HTML Source for Title (vbTextCompare makes the search case-insensitive)
    fgetMetaTitle = ""
    stPnt = InStr(1, x, "<title>", vbTextCompare)
    If stPnt > 0 Then
        stPnt = stPnt + Len("<title>")
        endPnt = InStr(stPnt, x, "</title>", vbTextCompare)
        'Guard against a missing closing tag, which would make Mid fail
        If endPnt > stPnt Then fgetMetaTitle = Mid(x, stPnt, endPnt - stPnt)
    End If
End Function
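For reference, here is a rough sketch of the kind of function I have in mind. It is not a tested solution. The function name fgetPublishDate and the three patterns are my own assumptions: they cover an Open Graph style meta tag, a JSON-LD "datePublished" field and a <time datetime="..."> element, which are common but by no means universal. It uses late-bound VBScript.RegExp, so no library reference is needed.

```vba
' Hypothetical helper: returns the first publication date string found in the
' page source, or "" when none of the assumed markers is present.
Function fgetPublishDate(ByVal strURL As String) As String
    Dim html As String
    Dim oXH As Object, re As Object, matches As Object
    Dim patterns As Variant, i As Long

    ' Get the URL's HTML source (same approach as fgetMetaTitle)
    Set oXH = CreateObject("msxml2.xmlhttp")
    With oXH
        .Open "get", strURL, False
        .send
        html = .responseText
    End With
    Set oXH = Nothing

    ' Assumed patterns, tried in order; each captures the date text in group 1.
    ' 1) <meta property/name/itemprop="..." content="..."> (content must come second)
    ' 2) JSON-LD: "datePublished": "..."
    ' 3) <time datetime="...">
    patterns = Array( _
        "<meta[^>]+(?:property|name|itemprop)=[""'](?:article:published_time|datePublished|pubdate|date)[""'][^>]*content=[""']([^""']+)[""']", _
        """datePublished""\s*:\s*""([^""]+)""", _
        "<time[^>]+datetime=[""']([^""']+)[""']")

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Global = False

    fgetPublishDate = ""
    For i = LBound(patterns) To UBound(patterns)
        re.Pattern = patterns(i)
        Set matches = re.Execute(html)
        If matches.Count > 0 Then
            fgetPublishDate = matches(0).SubMatches(0)
            Exit For
        End If
    Next i
End Function
```

The result is returned as raw text (often an ISO 8601 value such as a date with a time and offset), not converted to a Date, and pages that fill in the date with JavaScript would not be covered by this approach.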