Extract email addresses from multiple pages of a website

mikebuercart

New Member
Joined
Sep 8, 2016
Messages
2
Hi there, i adated a code found on the net in order to retreive the email addresses contained on each page of a certain website : Comune di Ariccia (RM) - Italia: Informazioni
Those numbers before the index.html are a code for identifying these Italian towns and i have them all in my excel sheet.
My code works fine but i found out that many pages don't show an email address for the towns.

I searched the internet and found another website who does have all the email addresses, even the certified email address (PEC). I tried my code with the new website but it can't work as on the other site. The new website address is this : E-Mail e PEC del comune di Altino

As you can see, this last website shows two email addresses, i would like them both, one in the first email colomn and the other in a second email column.
If you analyze the page you find that email addresses aren't in the mailto format and therefore i cannot search them as link objects as i did with the first site. I also have to exclude some addresses in the html code because a couple are of the website offering this service (and i don't need them obviously)- I just need the first email and the PEC email.

Here is the code i wrote for the forst website but i'm stuck and don't know how to modify it for the second website (which would be better) : Thanks for any help.

HTML:
Sub GetEmail()
  Dim ie As Object, WebText As String, Email As String
  
  Dim righe As Integer
  Dim codiceistat As String
  Dim URL As String
  Dim valoreCella As String
 
  
  'conto le righe nel foglio e le memorizzo in variabile RIGHE
  righe = Sheets("Comuni").UsedRange.Rows.Count
  
  'parto dalla riga due in quanto nella 1 ci sono le intestazioni di colonna che non mi interessano
  riga = Sheets("macro").Range("B2").Value
  If riga = "" Then riga = 2
  
  'ripeto n volte quanto sono le righe nel foglio di excel
  Do Until riga > righe
  
    valoreCella = Sheets("Comuni").Range("B" & riga).Value
    If (valoreCella = "") Then
        
        'formo il codice istat da usare nella query web
        codiceistat = Sheets("Comuni").Range("D" & riga).Value
        URL = "http://www.comuni-italiani.it/" & codiceistat & "/index.html"
        
        Set ie = CreateObject("InternetExplorer.Application")
        ie.navigate URL
        While ie.readyState <> 4
          DoEvents
        Wend
        WebText = ie.document.body.innerhtml
        ie.Quit
        Set ie = Nothing
        Email = GetEmailAddress(WebText)
        
        'se trovo l'indirizzo email sottostante non faccio nulla in quanto significa che per tale Comune non ho trovato un indirizzo
        'email e l'automatismo ha trovato quello del sito da cui sto prelevando i dati.
        
        If Email <> "info@comuni-italiani.it" Then
            Sheets("Comuni").Range("B" & riga).Value = Email
            
        End If
     End If
        'incremento di 1 in modo tale da avanzare nelle righe
        riga = riga + 1
        Set ie = Nothing
  Loop
End Sub

'cerco l'indirizzo email e quando lo trovo lo inserisco in una variabile chiamata Email
Function GetEmailAddress(ByVal S As String) As String
  Dim X As Long, AtSign As Long
  Dim Locale As String, Domain As String
  Locale = "[A-Za-z0-9.!#$%&'*/=?^_`{|}~+-]"
  Domain = "[A-Za-z0-9._-]"
  AtSign = InStr(S, "@")
  If AtSign = 0 Then Exit Function
  For X = AtSign To 1 Step -1
    If Not Mid(" " & S, X, 1) Like Locale Then
      S = Mid(S, X)
      If Left(S, 1) = "." Then S = Mid(S, 2)
      Exit For
    End If
  Next
  AtSign = InStr(S, "@")
  For X = AtSign + 1 To Len(S) + 1
    If Not Mid(S & " ", X, 1) Like Domain Then
      S = Left(S, X - 1)
      If Right(S, 1) = "." Then S = Left(S, Len(S) - 1)
      GetEmailAddress = S
      Exit For
    End If
  Next
End Function
 

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"
OK everyone, i managed to do it on my own.

For anyone wanting to do the same thing (and it's very fast in extracting data), go to my website where i published the solution :
Estrazione email comuni italiani in excel 3 dal sito Italia in dettaglio - Maroso Marco



Hi there, i adated a code found on the net in order to retreive the email addresses contained on each page of a certain website : Comune di Ariccia (RM) - Italia: Informazioni
Those numbers before the index.html are a code for identifying these Italian towns and i have them all in my excel sheet.
My code works fine but i found out that many pages don't show an email address for the towns.

I searched the internet and found another website who does have all the email addresses, even the certified email address (PEC). I tried my code with the new website but it can't work as on the other site. The new website address is this : E-Mail e PEC del comune di Altino

As you can see, this last website shows two email addresses, i would like them both, one in the first email colomn and the other in a second email column.
If you analyze the page you find that email addresses aren't in the mailto format and therefore i cannot search them as link objects as i did with the first site. I also have to exclude some addresses in the html code because a couple are of the website offering this service (and i don't need them obviously)- I just need the first email and the PEC email.

Here is the code i wrote for the forst website but i'm stuck and don't know how to modify it for the second website (which would be better) : Thanks for any help.

HTML:
Sub GetEmail()
  Dim ie As Object, WebText As String, Email As String
  
  Dim righe As Integer
  Dim codiceistat As String
  Dim URL As String
  Dim valoreCella As String
 
  
  'conto le righe nel foglio e le memorizzo in variabile RIGHE
  righe = Sheets("Comuni").UsedRange.Rows.Count
  
  'parto dalla riga due in quanto nella 1 ci sono le intestazioni di colonna che non mi interessano
  riga = Sheets("macro").Range("B2").Value
  If riga = "" Then riga = 2
  
  'ripeto n volte quanto sono le righe nel foglio di excel
  Do Until riga > righe
  
    valoreCella = Sheets("Comuni").Range("B" & riga).Value
    If (valoreCella = "") Then
        
        'formo il codice istat da usare nella query web
        codiceistat = Sheets("Comuni").Range("D" & riga).Value
        URL = "http://www.comuni-italiani.it/" & codiceistat & "/index.html"
        
        Set ie = CreateObject("InternetExplorer.Application")
        ie.navigate URL
        While ie.readyState <> 4
          DoEvents
        Wend
        WebText = ie.document.body.innerhtml
        ie.Quit
        Set ie = Nothing
        Email = GetEmailAddress(WebText)
        
        'se trovo l'indirizzo email sottostante non faccio nulla in quanto significa che per tale Comune non ho trovato un indirizzo
        'email e l'automatismo ha trovato quello del sito da cui sto prelevando i dati.
        
        If Email <> "info@comuni-italiani.it" Then
            Sheets("Comuni").Range("B" & riga).Value = Email
            
        End If
     End If
        'incremento di 1 in modo tale da avanzare nelle righe
        riga = riga + 1
        Set ie = Nothing
  Loop
End Sub

'cerco l'indirizzo email e quando lo trovo lo inserisco in una variabile chiamata Email
Function GetEmailAddress(ByVal S As String) As String
  Dim X As Long, AtSign As Long
  Dim Locale As String, Domain As String
  Locale = "[A-Za-z0-9.!#$%&'*/=?^_`{|}~+-]"
  Domain = "[A-Za-z0-9._-]"
  AtSign = InStr(S, "@")
  If AtSign = 0 Then Exit Function
  For X = AtSign To 1 Step -1
    If Not Mid(" " & S, X, 1) Like Locale Then
      S = Mid(S, X)
      If Left(S, 1) = "." Then S = Mid(S, 2)
      Exit For
    End If
  Next
  AtSign = InStr(S, "@")
  For X = AtSign + 1 To Len(S) + 1
    If Not Mid(S & " ", X, 1) Like Domain Then
      S = Left(S, X - 1)
      If Right(S, 1) = "." Then S = Left(S, Len(S) - 1)
      GetEmailAddress = S
      Exit For
    End If
  Next
End Function
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,917
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top