Building a large database help

baseball

Board Regular
Joined
Apr 1, 2002
Messages
153
I have play-by-play data from Project Retrosheet for the vast majority of major league baseball games played from 1952 through 2009. The data has been converted to comma-delimited format with text in quotes. This is a lot of data and I need to put it into an Access 2007 database. For example, the data from 2000 through 2006 is in excess of 1.355 million records, each with 97 fields. Because of the sheer amount of data, I separated the data by decades (50s and 60s, 70s, 80s, 90s, 00s) but the total size of those five text files is about 3.5 GB.

I'm running a fairly robust system (Core 2 Duo under Windows 7, 64-bit) but I only have 4 gb of RAM and I am concerned whether my proposed database is going to be too large to run smoothly. And I have no idea how to figure that out other than trial-and-error.

So, my first question is how to handle this.

Next, I need to know the best way to import the data. I know how to import text; I'm more concerned with the tactics of dealing with the field names and the formatting of the records. For example, I don't know when to use integer, double-precision, etc. My preference would be to make a table with the field names (imported) amd formatting to use as a template, then import the delimited files into the template and save it with a new name.

Is that a reasonable way to approach this? Or should I simply copy and paste the field names at the start of each text file?

When importing field names, do they need to be quotes-delimited?

Once I build the database, would I be better off splitting it?

Finally, I will need to run queries that span decades and I have no idea how one writes a query that pulls data from two different tables at once. For example, if I wanted to check on something in Barry Bonds's career (1986-2007), I would need to pull data from three tables. In the past, I would write different queries. then combine the data in Excel. But that is cumbersome. I might add that I don't write SQL but use Query Design in Access.

I should also mention that because of the nature of the data, this is not a relational database. Here is one record. A lot of what I do is based on filtering the data based on various flags (T/F and some numerical values, i.e., a 4 under Hit_Value for home runs.

"ATL198904070","LAN",1,0,0,0,0,"X",0,0,"randw001","R","randw001","R","glavt001","L","glavt001","L","davij001","perrg001","blauj001","gantr001","thoma001","smitl002","murpd001","berrg001","","","","53/G5","T","F",4,1,2,"T","T",0,"F","F",1,"F","F",0,"F","F",5,"G","F","F","5",0,0,"N",0,"N",0,"N",0,0,0,0,"53","","","","F","F","F","F","F","F","F","F","F","","","","T","F","F","F","F","","","","",0,3,0,0,5,0,0,0,0,1

Anyway, thanks for reading and if someone can help me, I would appreciate it.

Cliff
 

Excel Facts

When they said...
When they said you are going to "Excel at life", they meant you "will be doing Excel your whole life".
Hi Clif,

you should edit your post to add some spaces after commas in this list:
"ATL198904070","LAN",1,...
so that the line will wrap

An Access database is limited to 2g so it sounds like you may have to create more than one database to hold the data. Each of these is a back-end (BE).

You will then use a FE (front-end) to link to the back-end tables and possibly use UNION queries to combine the data for viewing.

" I don't know when to use integer, double-precision, etc.... I have no idea how one writes a query that pulls data from two different tables at once. "

oh, I guess the first part of this post looks like Greek to you. Read Access Basics and watch the video tutorials ... links in my siggy <smile>

<smile></smile>
 
Upvote 0
Crystal,

Thank you very much for the links. I didn't know about the 2 GB size-limit. but that probably explains why I wasn't able to append some data to an existing database. I also didn't know about Union queries since they are only done through SQL and I use Access Query Design to write my queries. I do see how they are used, now.

I guess my big problem will be dividing my data into logical, useable entities.

Sorry about the extended lines. They didn't show that way in my text editor.

Cliff
 
Upvote 0

Forum statistics

Threads
1,221,310
Messages
6,159,176
Members
451,543
Latest member
cesymcox

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top