I'm in the finance industry, and I need some advice.
I have 100 GB of raw historical option price data in .csv format.
It's essentially 10 years' worth of daily closing prices for every single option traded in North America.
When I received the data, each .csv file contained a single day's worth of historical prices (roughly 12 columns by half a million rows in one worksheet, approx. 15 MB).
I compiled these into ten separate workbooks, one for each of the ten years of data. Each workbook (10 GB) contained 250 worksheets, and each worksheet stored 12 columns by half a million rows.
I found that this became too computationally intensive: my computer (i7-3770, 32 GB DDR3) takes a long time just to open one of those files (roughly 30 minutes).
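For context, one alternative I've been reading about instead of the giant workbooks is loading all the daily CSVs into a single SQLite database file and querying it from there. A minimal sketch of what I mean (assuming every daily file shares the same header row; the table name is just a placeholder):

```python
import csv
import glob
import os
import sqlite3

def load_daily_csvs(csv_folder: str, db_path: str) -> int:
    """Append every daily CSV in csv_folder into one SQLite table.

    Assumes all files share the same header row. Returns the total
    number of rows loaded.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    try:
        for path in sorted(glob.glob(os.path.join(csv_folder, "*.csv"))):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                cols = ", ".join(f'"{c}"' for c in header)
                placeholders = ", ".join("?" for _ in header)
                # Create the table from the first file's header,
                # then keep appending subsequent days into it.
                conn.execute(
                    f"CREATE TABLE IF NOT EXISTS option_prices ({cols})"
                )
                rows = list(reader)
                conn.executemany(
                    f"INSERT INTO option_prices ({cols}) VALUES ({placeholders})",
                    rows,
                )
                total += len(rows)
        conn.commit()
    finally:
        conn.close()
    return total
```

The appeal, as I understand it, is that a query only pulls the rows it needs rather than loading all 100 GB into memory at once, which seems to be exactly where the workbooks fall over.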
I am conducting statistical analysis on these prices, and I wish to build a tool that can backtest my strategies with virtual money. I plan to have one master analytic file that externally references the database I described above.
I want to build this master file such that I can change a variable or two in my model, click the "run" button/macro, and see the overall outcome of my strategy in virtual dollar terms.
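To make the "run button" idea concrete, this is the sort of loop I have in mind, sketched in Python against a hypothetical option_prices table with date, contract, and close columns. The strategy rule (buy anything closing at or below a threshold, sell at the next day's close) and the flat 100-share contract multiplier are toy placeholders, not my actual model:

```python
import sqlite3

def run_backtest(db_path: str, max_entry: float,
                 capital: float = 100_000.0) -> float:
    """Toy backtest: on each day, 'buy' one contract of every option
    closing at or below max_entry and 'sell' it at the next available
    close for the same contract. Returns final virtual capital.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Pair each qualifying close with the same contract's
        # next-day close via a correlated subquery.
        rows = conn.execute(
            """
            SELECT a.close, b.close
            FROM option_prices a
            JOIN option_prices b
              ON b.contract = a.contract
             AND b.date = (SELECT MIN(date) FROM option_prices
                           WHERE contract = a.contract AND date > a.date)
            WHERE a.close <= ?
            """,
            (max_entry,),
        ).fetchall()
    finally:
        conn.close()
    for entry, exit_ in rows:
        capital += (exit_ - entry) * 100  # 100 shares per contract
    return capital
```

The idea is that "changing a variable or two" just means calling the function with different parameters, and the overall outcome comes back as a single virtual-dollar figure.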
This kind of backtesting/historical simulation is vital to my work, but I have hit a couple of technical difficulties.
What is the best way to reference such a database? Should I use Microsoft Access? Should I break the files down into twelve one-month workbooks instead of one one-year workbook?
I only started learning Excel last year, and would appreciate your advice in this matter.
Ben