Extracting HTML tables with a specific format from Outlook to Excel with VBA

MrHelpcel

New Member
Joined
Jan 21, 2022
Messages
32
Office Version
  1. 365
  2. 2021
  3. 2019
  4. 2016
  5. 2013
  6. 2011
  7. 2010
  8. 2007
  9. 2003 or older
Platform
  1. Windows
Hello all!

I have a question regarding extraction of HTML tables of a specific format (not your standard rows and columns table) from outlook, and to put the corresponding information in row format on an excel file.

Previously, with the help of Domenic on this forum (which I CANNOT THANK ENOUGH for all the help he provided me with) I managed to extract all the conventional HTML tables from a specific outlook subfolder. This resulted in a huge file with all the data recorded in rows per table (which was exactly what I wanted). Please see the link for reference: Outlook/VBA - Extracting tables from all emails in a specific folder

However, there are some HTML tables (sort of) that still haven't been successfully extracted. The reason I add the "sort of" part, is due to the nature of these tables. In the source, they are categorized as tables, but they are not your standard row and column tables. I will provide a screenshot of it (sorry for all the censoring, I don't want to get in trouble with the company!):

HKgrSd0l.jpg



Is there a way to obtain information from this table to into excel?


Would it be possible to write some VBA code to extract the information from the little "Enter text here" fields? I already have some code that helped me in retrieving the data from the previous tables (the standard row and column tables), perhaps it could be modified to grab this data instead?

VBA Code:
Option Explicit

Sub demo()

    Dim oApp As Outlook.Application
    Dim oMapi As Outlook.MAPIFolder
    Dim oMail As Outlook.MailItem
    Dim oItem As Variant
    Dim oHTML As MSHTML.HTMLDocument
    Dim oTable As MSHTML.HTMLTable
    Dim oTables As MSHTML.IHTMLElementCollection
    Dim nextRow As Long
    Dim x As Long
    Dim y As Long
  
    On Error Resume Next
    Set oApp = GetObject(, "Outlook.Application")
    If oApp Is Nothing Then
        Set oApp = CreateObject("Outlook.Application")
        If oApp Is Nothing Then
            MsgBox "Unable to start Outlook!", vbExclamation, "Outlook"
            Exit Sub
        End If
    End If
    On Error GoTo 0
  
    Set oMapi = oApp.GetNamespace("MAPI").Folders("folder1").Folders("folder2").Folders("folder3").Folders("folder4")
  
    nextRow = Cells(Rows.Count, "A").End(xlUp).Row + 1
  
    For Each oItem In oMapi.Items
        If TypeName(oItem) = "MailItem" Then
            Set oMail = oItem
            Set oHTML = New MSHTML.HTMLDocument
            With oHTML
                .Body.innerHTML = oMail.HTMLBody
                Set oTables = .getElementsByTagName("table")
            End With
            For Each oTable In oTables
                For x = 0 To oTable.Rows.Length - 1
                    For y = 0 To oTable.Rows(x).Cells.Length - 1
                        If y = 1 Then
                            Cells(nextRow, "A").Value = oMail.ReceivedTime
                            Cells(nextRow, "B").Value = oMail.Subject
                            Cells(nextRow, "C").Offset(y - 1, x).Value = oTable.Rows(x).Cells(y).innerText
                        End If
                    Next y
                Next x
                nextRow = nextRow + 1
            Next oTable
            Set oHTML = Nothing
            Set oMail = Nothing
        End If
    Next oItem
  
    Set oApp = Nothing
    Set oMapi = Nothing
    Set oMail = Nothing
    Set oHTML = Nothing
    Set oTable = Nothing
    Set oTables = Nothing
  
End Sub

I would be very thankful for any help that you guys can provide!
 
Here is an example of what I would like extracted. Note this is related to the initial email sent; that is, the one at the very bottom of the html code (with Header 1, Header 2, etc). I would only need the information under Headers 3, 4, 5 and 6 (and would like it to be extracted into an excel file as shown below, I can add the headers from row 1 manually myself, but what I would need is basically the multiple emails of that form to be extracted and placed as such, where each row pertains to data of each different email in the folder).

1644395207065.png
1644395270097.png
 
Upvote 0

Excel Facts

Remove leading & trailing spaces
Save as CSV to remove all leading and trailing spaces. It is faster than using TRIM().
Since there could be one or more 2-column tables, we won't be able to refer to the data in the last table by index number. And, all of the relevant information are within tables that have no id or name, nor is their class name unique. This is making it very difficult to code.

However, I did notice that table width for each table within the last table is 739. Maybe we can use the querySelectorAll method of the HTML Document object to select all tables that have a width of 739, and then loop through indices 2 through 5 to retrieve Headers 3, 4, 5, and 7. See if this works for you.

VBA Code:
    Dim oTables As Object
    Set oTables = oHTML.querySelectorAll("table[width='739']")
   
    Dim nextRow As Long
    nextRow = 2
   
    Dim nextCol As Long
    nextCol = 1
   
    'retrieve data for Headers 3, 4, 5, and 6
    Dim i As Long
    For i = 2 To 5
        Cells(nextRow, nextCol).Value = oTables(i).innerText
        nextCol = nextCol + 1
    Next i
 
Upvote 0
Since there could be one or more 2-column tables, we won't be able to refer to the data in the last table by index number. And, all of the relevant information are within tables that have no id or name, nor is their class name unique. This is making it very difficult to code.

However, I did notice that table width for each table within the last table is 739. Maybe we can use the querySelectorAll method of the HTML Document object to select all tables that have a width of 739, and then loop through indices 2 through 5 to retrieve Headers 3, 4, 5, and 7. See if this works for you.

VBA Code:
    Dim oTables As Object
    Set oTables = oHTML.querySelectorAll("table[width='739']")
  
    Dim nextRow As Long
    nextRow = 2
  
    Dim nextCol As Long
    nextCol = 1
  
    'retrieve data for Headers 3, 4, 5, and 6
    Dim i As Long
    For i = 2 To 5
        Cells(nextRow, nextCol).Value = oTables(i).innerText
        nextCol = nextCol + 1
    Next i
Where would I append this code to in the initial VBA code?
 
Upvote 0
You can place it right after the code that retrieves the data from your two-column tables.
 
Upvote 0
I just got to work. I added it after the initial extraction code, but it doesn't seem to be working. It throws a runtime error 91 on the line Cells(nextRow, nextCol).Value = oTables(i).innerText
 
Upvote 0
After taking a closer look, I see that the data for each two-column table is placed on a separate row, not all on the same row. So it looks like the result that you would expect for your sample email would be as follows, correct?

Data 3Data 4Data 5Data 6Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123
Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 123Data 12345Data 123Data 123

Also, to identify any two-column table, it seems to me that we can check for the specific header in the first cell of each table, if the headers are constant. Are they constant? Or should we be identifying these tables by checking whether they contain two columns?

Lastly, I previously mentioned that we can identify the data from the bottom of the email by checking the width of the table. You'll need to confirm that the width for any and all emails will be the same. Otherwise, the code I gave you won't work. Will the width be the same in all emails?
 
Upvote 0
I will need to check for the width tomorrow at work, since I didnt bring my work laptop with me home today. But yes, the headers are constant for the first tables. But those arent an issue, since they were already extravted thanks to your help in another thread. And basically, the display of data would be in the format of Data 3/Data 4/Data 5 in one row in adjacent cells, and then for the next email with that table, it would go to the row underneath, and so on.

The only thing I need is the last table with Data 3/4/5. The previous two tables are already extracted. I just posted them in the html source code to give you an example of an email thread I may receive, but I am only interested (for the purpose of this thread, since you already managed to save me in a previous thread) in the last table.
 
Upvote 0
Were you able to work it out? Or are you still having problems? If the latter, try posting the exact code that you're using so that we can have a look.

Cheers!
 
Upvote 0

Forum statistics

Threads
1,223,882
Messages
6,175,164
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top