data fields
You asked.
Here are the 97 fields with descriptions. The data formats are immutable. I get them through something called Project Retrosheet, who distributes the Event Files freely, just asking that they are credited when used.
While I haven't included any actual data, I think you can tell from what I have included that only areas that lend themselves to relational tables are those that contain the alphanumeric identifiers for players (combination of letters from last and first names plus numerics to distinguish players with sme last and first names--ramin002 for Manny Ramirez as opposed to ramin001 for Mario Ramirez) and three-letter team identifiers. The rest of the fields are flags. I do have a master roster of players for the period covered (1957-2006) in the form of: retroID,LastName,FirstName. And I obviously have a table for team identifiers.
I do want to try and clarify one question about database size and Access. Suppose that I convert the three data tables (AL, NL, and ML) to backends in Access and they each are pushing 2 GB in size, will the Access frontend be able to handle a query that involves linking the three tables? Or, will Access frontend only care that the amount of data returned by the query is less than 2 GB?
If you still need actual data, we'll have to figure out a way but it's really too much to post here.
Thanks,
Cliff
gameID-----XXXyyyymmddG (where XXX is the 3-letter team identifier, date, then G=0,1,2 to sort out first and second games of doubleheaders)--not unique
vis------3-letter identifier for visiting team (such as BOS for Boston)
inn------integer representing inning (less than, greater than, equals, etc.)
bat------0,1 (flag for home or visiting team batting)
outs-----0,1,2 (outs at time of event)
balls----0,1,2,3
strikes--0,1,2
pitch_seq----alphanumeric of varying length
vis_score----visiting team's score at time of event
home_score---home team's score at time of event
batter------------alphanumeric identifier for batter
batter_hand-------L,R (for batted left-/right-handed)
res_batter--------alphanumeric identifier for batter (NOTE: the batter who starts the plate appearance may not be the one who finishes it--injury, for example)
res_batter_hand---L,R (for batted left-/right-handed)
pitcher-----------alphanumeric identifier for batter
pitcher_hand------L,R (for batted left-/right-handed)
res_pitcher-------alphanumeric identifier for batter (NOTE: ditto res-batter note)
res_pitcher_hand--L,R (for batted left-/right-handed)
catcher--------alphanumeric identifier for defensive player at position
first_base-----""
second_base----"" alphanumeric data of varying lengths
third_base-----""
shortstop------""
left_field-----""
center_field---""
right_field----""
runner_1b------alphanumeric identifier for baserunner on that base at start of event (may be NULL)
runner_2b------""
runner_3b------""
event_text-----alphanumeric data of varying lengths
leadoff_flag---T,F (to identify first batter of inning)
pinchhit_flag--T,F (to identify whether batter is counted as a pinch-hitter)
defensive_position--1-11 (which defensive position batter plays, includes DH and PH)
lineup_position-----1-9, which line-up position batter occupies
event_type----------0-24 (codes for various events)
batter_event_flag---T,F (flag for batter events--hits, etc. as opposed to non-batter events (stolen bases, etc.)
ab_flag-------------T,F (flag to indicate wheter batter is credited with official at bat)
hit_value-----------0,1,2,3,4 (0 for no hit, 4 for home run, etc.)
SH_flag-------------T,F (flag for batter credited with sacrifice hit)
SF_flag-------------T,F (flag for batter credited with sacrifice fly)
outs_on_play--------0,1,2,3 (number of outs recorded on play)
double_play_flag----T,F (flag for double-play recorded)
triple_play_flag----T,F (flag for triple-play recorded)
RBI_on_play---------0,1,2,3,4 (number of runs batted in on play)
wild_pitch_flag-----T,F (flag for pitcher charged with wild pitch)
passed_ball_flag----T,F (flag for catcher charged with passed ball)
fielded_by----------alphanumeric identifier for defensive player who fielded ball
batted_ball_type----F,L,P,G (flags for fly balls, line frives, Pop-uops, ground balls)
bunt_flag-----------T,F (flag indicating whether batter bunted ball)
foul_flag-----------T,F (flag indicating whether ball was in foul or fair territory when fielded)
hit_location--------alphanumeric (corresponds to zones on a map--not always recorded)
num_errors----------number of errors charged on a play
error_1st_player----1-9 position for defensive player who was charged with first error on play (May be NULL)
error_1st_type------T,F (flag for Throw or Fielding, may be NULL)
error_2nd_player----1-9 position for defensive player who was charged with first error on play (May be NULL)
error_2nd_type------T,F (flag for Throw or Fielding, may be NULL)
error_3rd_player----1-9 position for defensive player who was charged with first error on play (May be NULL)
error_3rd_type------T,F (flag for Throw or Fielding, may be NULL)
dest_batter---------0,1,2,3,4 (base batter reached at conclusion of play)
dest_runner_1st-----0,1,2,3,4 (base runner on given the base reached at conclusion of play, 0=out, 4=scored)
dest_runner_2nd-----""
dest_runner_3rd-----""
play_batter---------numeric shorthand describing play made on batter (may be NULL)
play_runner_1st-----numeric shorthand describing play made on runner from the given base (may be NULL)
play_runner_2nd-----""
play_runner_3rd-----""
SB_runner_1st_flag--T,F (flag indicating whether runner on the given base was credited with a stolen base, may be NULL)
SB_runner_2nd_flag--""
SB_runner_3rd_flag--""
CS_runner_1st_flag--T,F (flag indicating whether runner on the given base was charged with a caught stealing, may be NULL)
CS_runner_2nd_flag--""
CS_runner_3rd_flag--""
PO_runner_1st_flag--T,F (flag indicating whether runner on the given base was put out, may be NULL)
PO_runner_2nd_flag--""
PO_runner_3rd_flag--""
Res_pitcher_runner_1st---alphanumeric identifier for pitcher responsible for runner on given base, may be NULL)
Res_pitcher_runner_2nd---""
Res_pitcher_runner_3rd---""
New_Game_Flag---T,F (flag to new game beginning)
End_Game_Flag---T,F (flag to indicate end of game)
Pinch_runner_1st---T,F (flag to indicate whether a pinch runner was used for runner at given base)
Pinch_runner_2nd---T,F (flag to indicate whether a pinch runner was used for runner at given base)
Pinch_runner_3rd---T,F (flag to indicate whether a pinch runner was used for runner at given base)
Runner_removed_pinch-runner_1st---alphanumeric identifier for runner removed from game at given base)
Runner_removed_pinch-runner_2nd---"
Runner_removed_pinch-runner_3rd---"
Batter_removed_pinch-hitter------------alphanumeric identifier of batter removed for pinch-hitter, may be NULL)
Position_batter_removed_pinch-hitter---1-11 (fielding position of batter removed for pinch-hitter)
Fielder_First_Putout---1-9 (fielding position of fielder credited with given putout, may be NULL)
Fielder_Second_Putout--""
Fielder_Third_Putout---""
Fielder_First Assist---1-9 (fielding position of fielder credited with given assist, may be NULL)
Fielder_Second Assist--"
Fielder_Third Assist---"
Fielder_Fourth Assist--"
Fielder_Fifth Assist---"
event_num---sequentil numbering of events in game, reset to 1 when New_Game_Flag=1