Open, SUMIFS, Close, Rename >5k workbooks (~3GB of data)

xd9813

New Member
Joined
Mar 9, 2014
Messages
7
Office Version
  1. 365
  2. 2021
Platform
  1. Windows
  2. Mobile
  3. Web
Objective:
Provide aggregate sales totals based on multiple criteria in Product Description column from >5K workbooks located within different folders & with different layouts, sheet names, and workbook names. However, all column names are similar (not 100% but close). The names of the workbooks are important as they have the contract name, fiscal year, fiscal quarter, & vendor name. The sales totals need to be grouped by based on the following variables:
  1. Contract Name
  2. Fiscal Year
  3. Fiscal Quarter
  4. Vendor Name
  5. Multiple Criteria in Product Description column (no fixed sheet, no fixed range)

Logic Flow:

  1. Open each workbook, get layout info (file name, sheet count, sheet names, product description range, & sales total range). If workbook file name does not follow convention, then rename the workbook to this format: [Contract Name]_FY[2 digit Fiscal Year]_Q[Fiscal Quarter]_[Vendor Name].xlsx
  2. Using the info above develop two-dimensional String arrays with the SUMIFS() function to include product description criteria & the appropriate ranges for total sales for each contract and Fiscal years
  3. Repeat steps 1 & 2 until all >5k workbooks are processed
  4. In a new workbook, type in references from the step 2. SUMIFS() results must be based on Contracts & Fiscal Quarters found in the workbook file names E.g. add total sales from Contract1_FY2013_Vendor1.xlsx, Contract1_FY2013_Vendor2.xlsx,Contract1_FY2013_Vendor3.xlsx, Contract1_FY2013_Vendor4.xlsx, etc.
  5. Close all workbooks
  6. Save this summary workbook

Problem:
Step 3. SUMIFS() only works if workbooks are open, unlike the sum function. Using the SUM(IF()) is not useful as IF() does not accept wildcards and my product description criteria requires wildcards. In addition, when I trye to keep them all open, I get a run-time error 1004 after 100 odd workbooks. I can't close 'em and I can't leave 'em open.

Ask:
I am just seeking guidance with my logic flow. Is there are better logic flow than the one shown above? I should be able to figure out the code from the suggested logic flow.
 
Last edited by a moderator:

Excel Facts

Format cells as currency
Select range and press Ctrl+Shift+4 to format cells as currency. (Shift 4 is the $ sign).
Hi xd9813,

sounds like you got quite a job on your hands. Given the amount of data you're handling, my approach would probably be the following:
From the Excel where you run your macro: either collect the data in that spreadsheet (if you're handling less then 1 million lines) or write your data to an (Access) database. If I understand your logic correctly, you do two loops: a first loop (step 1,2,3) to collect your categories/references? And then you would use your SUMIF to make your calculations over all the workbooks?

In either Excel or Access, the table structure I could see:
tbl_ProcessedFiles
ID - FileName - FileDirectory - Contract Name - Fiscal Year - Fiscal Quarter - Vendor Name - SheetName - SheetRange
1 - Yourfile.xls - \Main\2013\etc. - Nike - 2013 - 4 - Johnson - Sheet1 - A12:D29

tbl_FileData
ID - all the data you want to store...

If you happen to have some examples of files and the information you are looking for, I'd be happy to help you with a table-structure (you could post or PM a link to a dropbox/skydrive/google drive folder).

Cheers,

Koen
 
Upvote 0
I thought about using Access which, probably is the most apt way of doing this. However, due to a slow/old machine, I decided to dump all my file layout, formula, and range information in a .csv file and use that to generate a Pivot in a new workbook. This is where I encounter my first roadblock, receiving a run-time error 1004. Next roadblock is when I try to use a SUMIF with wildcards and am required by Microsoft to keep all excel files open (KB260415 SUMIF, COUNTIF and COUNTBLANK functions return "#VALUE!" Error) in order for it to return a valid sum. And yes I have also considered & will most probably implement running these file I/O's in batches, but I am curious to know if there is better way to do this...

Thank you.
 
Upvote 0
Hi xd9813,

well, keeping 5000 files open for calculations sounds too crazy to be true to start with... (Poor old PC, must be smoking ;).) So just by that reason alone, the "extract data file by file to a database and work with that" sounds more feasible IMHO. Linking formulae to external workbooks: I hardly do it, but if I do it's maximum 2-3 workbooks, otherwise I'll most probably start building a macro to automate importing (and exporting) information.

Cheers,

Koen
 
Upvote 0
So I took your advice and I am opening 1 workbook, getting the data range and sending it to a Access table. Since they all have different column layouts and column headings, I thought about first creating 1 table per spreadsheet. Using a dummy spreadsheet, I tried to send a sample range of 15 columns and 19 rows to access. Here is the code that does this:
Code:
'Required reference: Visual Basic for Applications
'Required reference: Microsoft Excel 14.0 Object Library
'Required reference: Microsoft Forms 2.0 Object Library
'Required reference: Microsoft Office 14.0 Access database engine Object Library


Public Sub ExportRangeToAccess()


    'This subroutine transfers the range in Sheet1 into Access table
    '12,000 Access tables will be created and the same range will be added to those tables
    
    Dim oSelect As Range                                        'data range
    Dim i As Long                                               'row counter
    Dim j As Long                                               'column counter
    Dim k As Long                                               'table counter
    Dim x As Long                                               'field name counter
    Dim sPath As String                                         'db path
    Dim oDAO As DAO.DBEngine                                    'DAO object
    Dim oDB As DAO.Database                                     'DAO object
    Dim oRS As DAO.Recordset                                    'record set object
    Dim tbl As TableDef                                         'access table object
    Dim fld As Field                                            'access table field object
    Const T = "tbl"                                             'table name prefix
    Dim rowCount As Double                                      'data rows
    Dim columnCount As Double                                   'data columns
    Const TBLCNT = 12000                                        'total # of tables
    Const FSLASH = "/"                                          'status bar related
    
    Application.ScreenUpdating = False                          'speed up execution
    ActiveSheet.Activate                                        'send focus to current sheet
    Set oSelect = ActiveSheet.Range("A1:O19")                   'select the data range
    sPath = "C:\Users\xd9813\Desktop\test.accdb"                'initialize the database path var
    Set oDAO = New DAO.DBEngine                                 'allocate memory for DAO objects
    Set oDB = oDAO.OpenDatabase(sPath)                          'open access dbms
    DoEvents                                                    'send control to OS (GC)
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    '   MAIN - START
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    For k = 1 To TBLCNT                                         'start main loop
        Set tbl = oDB.CreateTableDef(T & k)                     'create a new table
        columnCount = oSelect.Columns.Count                     'get data column count
        For x = 1 To columnCount                                'loop to get column headings from data
            Set fld = tbl.CreateField(oSelect.Cells(1, x), _
                                    dbText, 255)                'create field from column heading
            tbl.Fields.Append fld                               'add field to table
        Next x                                                  'move on to next heading
        oDB.TableDefs.Append tbl                                'add table to database
        oDB.TableDefs.Refresh                                   'refresh the connection obj
        Set oRS = oDB.OpenRecordset(T & k)                      'open recordset for adding data to table
        rowCount = oSelect.Rows.Count                           'get data row count
        For i = 2 To rowCount                                   'loop to transfer data into table
            oRS.AddNew                                          'prepare record set to receive data
            For j = 1 To columnCount                            'loop to add row by row
                oRS.Fields(j - 1).Value = oSelect.Cells(i, j)   'add cell value to record set field
            Next j                                              'move on to next column
            oRS.Update                                          'finish adding entire row
        Next i                                                  'move on to next data row
        DoEvents                                                'transfer control to OS (GC)
        Application.StatusBar = k & FSLASH & TBLCNT             'update Excel statusbar of progress
    Next k                                                      'move on to adding next table
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    '   MAIN - END
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    oRS.Close                                                   'close record set object
    oDB.Close                                                   'close database connection
    Set oRS = Nothing                                           'Garbage Collect (GC)
    Set oDB = Nothing                                           'Garbage Collect (GC)
    Set oDAO = Nothing                                          'Garbage Collect (GC)
    Application.StatusBar = False                               'Reset status bar
    Application.ScreenUpdating = True                           'turn on screen updating
    MsgBox "done", vbInformation                                'display notification to user
End Sub

The issue is that after about 4,600 tables, the execution slows down to a crawl. The actual data is much larger. I would appreciate a few optimization options to speed things up.

Thank you.
 
Upvote 0
Hi xd9813,

why are you putting it in so many different tables? I guess Access doesn't like that many tables and that's why it's slowing down significantly. What I would do is put the detail information in one table, like I described a couple of posts ago. What you then would want to do is not simply copy-pasting the Excel information into that Access table, but pre-processing that information into your "fixed format" table, so you order your data at the same time. If you need help with the coding I'd need some examples of the files you're processing and the information you'll need. Hope that pointer helps a bit.

Cheers,

Koen
 
Upvote 0
It was that refresh statement
Code:
oDB.TableDefs.Refresh
and the fact that I was not closing the connection every 1000 CREATE TABLE Statements. Once I fixed those 2 things, the 12000 tables were created in < 3 mins.

Now as far as the creation of 12000 tables is concerned, I am doing that because the column layout of each of the 12000 spreadsheets is unknown. What is known however, are certain keywords in columns. For example, "Name of Purchasing Entity" column could appear as "Buyer Name" or "Purchaser" or "Purchasing Entity".

So in creating 12000 tables my thinking was to have Access determine where the column in located in any given table. Since I need to extract 3 key columns from each spreadsheet (Buyer Name, Product Description, Total Price) I figured I could use wildcards in my select queries to get the needed columns.

I have not figured out the SQL yet but it should be something like INSERT INTO TBL_MAIN VALUES (SELECT *BUYER*, *Description*, *Total Price* FROM TBL1, SELECT *BUYER*, *Description*, *Total Price* FROM TBL2, ...TBL12000) I am sure I could find the SQL online if I searched. I got this concept of letting the DBMS figure out the column's location from SAS.

Correct me if I am wrong but I don't see an alternative to this... if you do then I am open to suggestions.

Thank you.
 
Upvote 0

Forum statistics

Threads
1,223,243
Messages
6,170,971
Members
452,371
Latest member
Frana

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top