Folks,
I am performing some data analysis on a huge (by my standards) amount of data. Basically, the data consists of circa 8 million records (rows), each with about 60 data items (columns). At present, this data is split across circa 10 spreadsheets (each several hundred mb in size), to get around the 1000000 row limitation in excel.
I have already written vba code which performs the analysis satisfactorily on a workbook (once open). Basically, the code loops through record, loads the relevant columns into variables and performs the necessary calculations, before outputting to a 'summary array', which at the end is loaded into a new worksheet.
This works well, but takes about 5 minutes to run. Adding this to the fact that the spreadsheets themselves take a similar period of time to open in the first place, mean that I am looking to make my code more efficient (i.e. quicker to run).
Questions
1 - I believe that it is more efficient to load an entire record (row) into an array and perform the calculations on the array - as opposed to loading individual data items into variables. Is this likely to significantly improve efficiency?
2 - Could I take this one step further and load the entire workbook in to a massive 2D array (e.g. dim RawData(1000000,60)), and then loop through that (in both dimensions) to perform the analysis?
3 - I believe it is possible to extract data from a workbook without actually opening it - assuming that the answer to Questions 1 and 2 are 'yes' (i.e. it is viable to load an entire spreadsheet into a large array), is it feasible (desireable) to do this without actually opening the workbooks?
4 - Finally, if the answer to all of the above questions, could I cycle through the 10 or so workbooks in order (without opening them), ending up with one "humongous" 2D array (8000000 x 60)?
Ultimately, my code works well enough (albeit a bit slowly), so it is not worth spending a lot of time coming up with something very complex. However, if anyone has any thoughts / suggestions of some (relatively) simple code that I might modify to fit my purposes, that would be much appreciated...
Thanks
Bcfaigg
I am performing some data analysis on a huge (by my standards) amount of data. Basically, the data consists of circa 8 million records (rows), each with about 60 data items (columns). At present, this data is split across circa 10 spreadsheets (each several hundred mb in size), to get around the 1000000 row limitation in excel.
I have already written vba code which performs the analysis satisfactorily on a workbook (once open). Basically, the code loops through record, loads the relevant columns into variables and performs the necessary calculations, before outputting to a 'summary array', which at the end is loaded into a new worksheet.
This works well, but takes about 5 minutes to run. Adding this to the fact that the spreadsheets themselves take a similar period of time to open in the first place, mean that I am looking to make my code more efficient (i.e. quicker to run).
Questions
1 - I believe that it is more efficient to load an entire record (row) into an array and perform the calculations on the array - as opposed to loading individual data items into variables. Is this likely to significantly improve efficiency?
2 - Could I take this one step further and load the entire workbook in to a massive 2D array (e.g. dim RawData(1000000,60)), and then loop through that (in both dimensions) to perform the analysis?
3 - I believe it is possible to extract data from a workbook without actually opening it - assuming that the answer to Questions 1 and 2 are 'yes' (i.e. it is viable to load an entire spreadsheet into a large array), is it feasible (desireable) to do this without actually opening the workbooks?
4 - Finally, if the answer to all of the above questions, could I cycle through the 10 or so workbooks in order (without opening them), ending up with one "humongous" 2D array (8000000 x 60)?
Ultimately, my code works well enough (albeit a bit slowly), so it is not worth spending a lot of time coming up with something very complex. However, if anyone has any thoughts / suggestions of some (relatively) simple code that I might modify to fit my purposes, that would be much appreciated...
Thanks
Bcfaigg