Can I do this in Excel or are there too many rows ?

Aston01

New Member
Joined
Oct 22, 2013
Messages
4
I currently have 6k+ separate Excel files all formatted exactly the same which combined could very easily have 12 columns and 20+ million rows.

I am interested in being able to do the following:
  • Merge all of the individual files in one common database
  • Append their source file name at the end of each row
  • Use Text2Columns break down the components of the file name
  • Cross reference time/date fields with another file to determine correct value for a field
    • If Date/Time = Date/Time of Data2 then value = Last Price of Data2
  • Create a simple Pivot Table to sort the above data
  • Export as a cvs file.

I have a basic knowledge of Excel, but this is considerably more rows then I am used to working with and I am not used to merging multiple files or cross referencing data sets. I was hoping someone could possibly point me in the direction as to how to structure the above. I am trying to keep it as simple as possible, but due to the sheer number of rows I am not sure whether I would have to do this in Access (which I have no familiarity with) or if I could stick to Excel. Honestly I would gladly pay someone to help me with this, but I wouldn't even know where to begin to look for that person.

Much appreciated
 

Excel Facts

Bring active cell back into view
Start at A1 and select to A9999 while writing a formula, you can't see A1 anymore. Press Ctrl+Backspace to bring active cell into view.
Welcome to the Board!

While we do not allow people to solicit/advertise business for themselves here, MrExcel does offer consulting services (see: Excel Consultant | Excel Consulting | Microsoft Office Consultants).

However, if you are talking about 20+ million rows, I think you have pretty much ruled out Excel. Each spreadsheet in Excel can only hold slightly over 1 million rows. If you start filling up multiple sheets with a million rows, performance will probably be horrible. I would say use Access, but 20+ million rows may push the limits of Access also (Access has a 2 GB size limit). You would probably have to end up using a real database program like SQL, MySQL, or Oracle.
 
Upvote 0
I looked at the file size of the folder containing all of the individual files and it is currently about 600mb so from that starting point it should still be workable from within Access correct? If I did end up having to use Access to accomplish the above what would be the best way to go about working with all of that data? Would I need to dump it all into a common Access DB and create a pivot table from within there or would using Excel as the front end to link to the Access DB be a better way to go?
 
Upvote 0
Just because the size of the folder containing the files is 600 mb does not necessarily mean that will be the size of the Access database once all the data is imported. Databases tend to bloat and need to be managed/maintained (i.e. doing regular "Compact and Repairs" on the database to keep the size down by purging old deleted records). Even if you are able to store all the data in Access, performing certain tasks on your data could be cumbersome, based on your design and what you need to do (you are going to want to figure out where you need to use indexes on your data to improve performance. A lot of this is "trial and error" (try it and see what works). That is the way of a lot of programming sometimes. Of course, you are also limited by the tools we have at your disposal.

I am really not sure what the best course of action would be. So much of it depends on your needs. If you are going to use Access to hold all the data, if possible I would recommend trying to use Access to do the rest too, instead of involving Excel. Are you simply using pivot tables to sort your data? If that is all, you do not need to use Excel to do that. Access actually has better sorting options than Excel (I do not believe there is any limit to the number of fields you can sort by).

I hope this helps you come up with a "plan of attack". By the way, I am going to move this out to the "Discussion" forum, as it really crosses by Excel and Access (and possibly other programs if others have different suggestions).
 
Upvote 0
I don't really need to do a substantial amount of processing of the data. Essentially what I am trying to accomplish is a way to aggregate a lot of individual files into one large DB so that I can some perform some basic filtering and sorting. From there I would just output a simple csv of the results.

In your opinion would this possibly be a I project within the scope of MrExcel's consulting services ?
 
Upvote 0
It can't hurt to reach out to them and check, though I am not sure how much Access work they do.

If all you really need to do is sorting and filtering on large datasets, I would really be leaning towards Access over Excel. Note that if the data becomes too large, you can store the data in a database program like SQL and just use Access as the front-end.
 
Upvote 0
Rather than combining the files, you could set up a procedure that opens each file in turn, extracts/processes/analyses the data from that file and stores whatever you need in memory, before processing the next file.

I've done that before, using Excel to process ~12 million records, with only the final output being written to a worksheet. Even though each of the 150 files I was processing could have been loaded onto a worksheet: (a) there was no need to do so; (b) doing so would have made the final file unnecessarily large; and (c) the actual data processing & analysis would have taken far, far, longer.
 
Upvote 0

Forum statistics

Threads
1,225,656
Messages
6,186,244
Members
453,343
Latest member
hacigultekin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top