Structure of parsing 3k html files

mos

New Member
Joined
May 8, 2011
Messages
15
Good morning,

I have about 3-4k html files, each looking like this
iRA1K.png


Code (could not use Code tag, got f ed up by the html code itself)

idix4s.png


I look for something like this:

iV1is.png



NAO, my question:

What methods do you suggest?

I had in mind
  • using import function and the re-order the tables (disadv: gives 1sheet per html file)
  • parse the html files as text, look for String "Company Name" and get following String bracket content somehoe

what else would be good?

will be a bigger task, at least for me, so I wanna check whats best.

thanks, cheers

mos
 
If you can open the files like that in Excel why not do the whole thing there.

You could have a main workbook for the results and code.

The code would open each individual file, extract the data, transfer it to the main workbook, close the file and move onto the next file.
 
Upvote 0

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
Post keep scrambling code when I added the Test Sub.

Had to make it as below

Code:
Sub test()
MsgBox RegExpFind("Your String", "Your String>(.*?)")
End Sub
 
Upvote 0
If you can open the files like that in Excel why not do the whole thing there.

You could have a main workbook for the results and code.

The code would open each individual file, extract the data, transfer it to the main workbook, close the file and move onto the next file.

Ok..I tried the RegExp with the sourcecode but had too many difficulties, so picked up your suggestion.


Looks like this now:

Code:
Sub fileloop()
    Dim MyDir, strPath, searchstring, fullpath As String
    Dim vaFileName As Variant
    Dim i As Integer
    Dim searchstring As String
    fullpath = "file:///Y:/10%20_Operations/07_Task%20List/ _ 
Workfolder%20mos/GCD/files1/"
    Dim importmatrix As Object
        
    MyDir = ActiveWorkbook.Path ' current path
    strPath = MyDir & "\files1" ' files subdir

    Application.DisplayAlerts = False
    
    Set importmatrix = ActiveWorkbook.Sheets("Tabelle2").Range("A1:B45")
    
    With Application.FileSearch
        .NewSearch
        .LookIn = strPath
        .SearchSubFolders = False
        .Filename = ".htm"

        If .Execute > 0 Then
               For i = 1 To .FoundFiles.Count
               
               ImportHTML (fullpath & (.FoundFiles(i)))
               importmatrix = Sheets("Tabelle2").Range("A1:B45")
               
               Sheets("Overview").Cells(i, 1) = Application.WorksheetFunction. _ 
VLookup("Company Name",  importmatrix, 2, False)
               Sheets("Overview").Cells(i, 2) = Application.WorksheetFunction. _ 
VLookup("Email", importmatrix, 2, False)
               
               Next i
             End If
    End With
    Application.DisplayAlerts = True
End Sub


Code:
Sub ImportHTML(filepathx As String)

Sheets("Tabelle2").Select
    With ActiveSheet.QueryTables.Add(Connection:="FINDER;" & filepathx, _
Destination:=Range("A1"))
        .Name = "gsdl_029"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = "4,5,6,7"
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
End Sub

And now I get an Error in

Code:
With ActiveSheet.QueryTables.Add(Connection:="FINDER;" & filepathx, _
Destination:=Range("A1"))
Run-time error '1004'.
Application-defined or operation-defined error.


Is that of the actual import?

Cause I got an alert while importing manually which I hoped to suppress by
DisplayAlerts=False.


Any ideas appreciated, cheers!
 
Upvote 0

Forum statistics

Threads
1,224,616
Messages
6,179,909
Members
452,949
Latest member
beartooth91

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top