Reading HTML file, but it messes up the text format.

QuietRiot

Well-known Member
Joined
May 18, 2007
Messages
1,079
Office Version
  1. 365
  2. 2021
Platform
  1. Windows
  2. MacOS
This is the code I'm using below and I've tried innerHTML and innerTEXT, but it removes any line breaks and such so it's just a wall of text instead of nicely formatted paragraphs. Any ideas on how I can accomplish keeping the text in the same format between the tags? When I view the file in notepad++ I can see that they are nicely formatted.



Code:
Sub ScrapeData()

    Dim hDoc As MSHTML.HTMLDocument
    Dim hElem As MSHTML.HTMLGenericElement
    Dim sFile As String, lFile As Long
    Dim sHtml As String
    Dim x As Long
    
    x = 1

    'read in the file
    lFile = FreeFile
    sFile = "C:\Users\test\Desktop\test\htmltest.html"
    Open sFile For Input As lFile
    sHtml = Input$(LOF(lFile), lFile)

    'put into an htmldocument object
    Set hDoc = New MSHTML.HTMLDocument
    hDoc.body.innerHTML = sHtml

    'loop through tags
    For Each hElem In hDoc.getElementsByTagName("FONT")
            Cells(x, 1).Formula = hElem.innerText
            x = x + 1
    Next hElem
    
End Sub
 
Last edited:

Excel Facts

Enter current date or time
Ctrl+: enters current time. Ctrl+; enters current date. Use Ctrl+: Ctrl+; Enter for current date & time.

Forum statistics

Threads
1,223,227
Messages
6,170,848
Members
452,361
Latest member
d3ad3y3

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top