Email Extracting

Sharid

Well-known Member
Joined
Apr 22, 2007
Messages
1,066
Office Version
  1. 2016
Platform
  1. Windows
My code extracts emails from a website, this bit is fine

It can also extract URLs from a site, the PROBLEM is that it extracts all LINKS and not just the domain name

Also I can only extract one or the other, either emails on their own or Links, when what I want is for it to be something like this, and each new record to go into the next Blank Row.

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Emails
[/TD]
[TD]Urls
[/TD]
[/TR]
[TR]
[TD]Jondoe@gmail.com
[/TD]
[TD]coolsite@mysite.com
[/TD]
[/TR]
[TR]
[TD]janedoe@yahoo.com
[/TD]
[TD]supersite@mysite.com
[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]supersite@mysite.com
[/TD]
[/TR]
[TR]
[TD]Dave@gmail.com
[/TD]
[TD]Coolsite@mysite.com
[/TD]
[/TR]
[TR]
[TD]Joeblogs@hotmail.com
[/TD]
[TD]nicesite@mysite.com
[/TD]
[/TR]
</tbody>[/TABLE]

Code:
Private Sub CommandButton1_Click()

Dim Collection As MSHTML.IHTMLElementCollection
Dim element As Object
Dim i As Integer

i = 1
[COLOR=#008000]'k = 2
[/COLOR]
'Currently it only extracts from 1 url but I can set it to extract from a list
URL = "https://www.dvscommercials.co.uk/"


Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
  XMLHTTP.Open "GET", URL, False
    XMLHTTP.setRequestHeader "Content-type", "text/xml"
      XMLHTTP.send

Set HTML = CreateObject("htmlfile")
  HTML.body.innerHTML = XMLHTTP.responseText
  Set Collection = HTML.getElementsByTagName("a")

For Each element In Collection
 On Error Resume Next

Email = element.href
 If InStr(Email, "@") Then
 [COLOR=#008000] 'If InStr(Email, "@") = 0 Then  [B]'For Extracting All URLs[/B][/COLOR]
  ThisWorkbook.Sheets(1).Cells(i, 1).Value = element.href
 [COLOR=#008000] 'ThisWorkbook.Sheets(1).Cells(k, 2).Value = element.href[/COLOR]
   Email = Replace(Email, "mailto:", "")
   ThisWorkbook.Sheets(1).Cells(i, 1).Value = Email

i = i + 1
End If

[COLOR=#008000]'End If[/COLOR]
Next element

End Sub

Thanks for having a look
 

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
I am really stuck on this one, if anyone can help I would really appreciate it.

Also I have found that it has issues with URLS that are Http it seem to be fine with Https. If someone can just help with the first half that would be great.

Thanks
 
Upvote 0

Forum statistics

Threads
1,223,888
Messages
6,175,212
Members
452,618
Latest member
Tam84

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top