Extract data from HTML

dsmt05 (New Member, joined Jan 26, 2017)
We receive many HTML files from the field each day, up to 500, and we need to extract their data so we can analyze it in Excel. I have tried the Data - From Web option in Excel 2016 without luck. Any thoughts or advice? I have examples of the data if you need to look at them.
 

Hi,

after saving the plain HTML to a file, the following code extracts the content between <p> and </p>:


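Code:
Sub iRegex()
    Dim sFile As String, fen As String
    Dim RR As Object, i As Long

    sFile = "c:\temp\html.txt"
    ' Read the whole saved HTML file into one string
    With CreateObject("Scripting.FileSystemObject")
        fen = .OpenTextFile(sFile).ReadAll
        Debug.Print Len(fen)
    End With

    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        ' The pattern in the post was garbled by the forum software.
        ' This is a reconstruction: it matches each <p ...> ... </p> block,
        ' including blocks that span several lines.
        .Pattern = "<p\b[^>]*>[\s\S]*?</p>"
        Set RR = .Execute(fen)
        Debug.Print RR.Count
        ' Write one match per row in column A of the active sheet
        For i = 0 To RR.Count - 1
            Cells(i + 1, 1).Value = RR(i).Value
        Next i
    End With
End Sub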

I deleted some parts of my code, so you might have to debug it.
If you read the content of a website directly into the variable "fen", it should work as well.

regards
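
The remark above about reading a website directly into "fen" could look roughly like the sketch below. It assumes the page is reachable with a plain GET request through MSXML2.XMLHTTP. The function name FetchHtml and the URL in the usage line are placeholders, not part of the original post.

```vba
' Hypothetical helper: returns the raw HTML of a page as a string.
Function FetchHtml(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False   ' False = synchronous request
        .send
        If .Status = 200 Then
            FetchHtml = .responseText
        Else
            ' Surface failed requests instead of returning an empty string
            Err.Raise vbObjectError + 1, "FetchHtml", "HTTP status " & .Status
        End If
    End With
End Function
```

In iRegex, the file-reading block could then be replaced by something like fen = FetchHtml("http://example.com/report.html"), with the rest of the macro unchanged.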

 
You could use HTMLDocument and related classes to parse the HTML and extract the data. Search for VBA IE automation or createDocumentFromUrl for example code.
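
A minimal sketch of that approach, applied to a folder of saved files, might look like this. It uses a late-bound "htmlfile" object (an HTMLDocument created without a reference to the Microsoft HTML Object Library). The folder path, the macro name, and the choice of <p> elements are assumptions for illustration; swap in whatever tag actually holds your data.

```vba
' Sketch: write the text of every <p> element from each .html file
' in a folder to the active sheet (file name in column A, text in column B).
Sub ExtractParagraphsFromFolder()
    Const FOLDER_PATH As String = "c:\temp\html\"   ' hypothetical folder
    Dim fso As Object, doc As Object, node As Object
    Dim fileName As String, html As String
    Dim r As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    r = 1
    fileName = Dir(FOLDER_PATH & "*.html")
    Do While Len(fileName) > 0
        ' ReadAll raises an error on an empty file
        html = fso.OpenTextFile(FOLDER_PATH & fileName).ReadAll

        ' Late-bound HTMLDocument: load the saved markup into it
        Set doc = CreateObject("htmlfile")
        doc.Open
        doc.write html
        doc.Close

        For Each node In doc.getElementsByTagName("p")
            Cells(r, 1).Value = fileName
            Cells(r, 2).Value = node.innerText
            r = r + 1
        Next node

        fileName = Dir   ' move on to the next matching file
    Loop
End Sub
```

Compared with a regular expression, the parsed document copes with attributes, nesting, and line breaks inside tags, and innerText returns the content without the markup.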
 